Skip to main content
Tech Radar
  • Tech Radar Pro
  • Tech Radar Gaming
Tech Radar Pro TechRadar IT Insights for Business
Subscribe
RSS
(opens in new tab) (opens in new tab) (opens in new tab) (opens in new tab)
Asia
flag of Singapore
Singapore
Europe
flag of Danmark
Danmark
flag of Suomi
Suomi
flag of Norge
Norge
flag of Sverige
Sverige
flag of UK
UK
flag of Italia
Italia
flag of Nederland
Nederland
flag of België (Nederlands)
België (Nederlands)
flag of France
France
flag of Deutschland
Deutschland
flag of España
España
North America
flag of US (English)
US (English)
flag of Canada
Canada
flag of México
México
Australasia
flag of Australia
Australia
flag of New Zealand
New Zealand
Technology Magazines
(opens in new tab)
Technology Magazines (opens in new tab)
Why subscribe?
  • The best tech tutorials and in-depth reviews
  • Try a single issue or save on a subscription
  • Issues delivered straight to your door or device
From$12.99
(opens in new tab)
View (opens in new tab)
  • News
  • Reviews
  • Features
  • Opinions
  • Website builders
  • Web hosting
  • Security
Trending
  • Best standing desk deals
  • Best cloud storage 2023
  • What is Microsoft Teams?
  • Windows 11 for business

When you purchase through links on our site, we may earn an affiliate commission. Here’s how it works.

  1. Home
  2. News
  3. Computing

AWS is the supercomputing heavyweight you didn’t know existed

By Joel Khalili
published 21 March 2022

AWS believes performance is not necessarily the most important metric

cloud
(Image credit: Shutterstock / Valery Brozhinsky)

Many people likely know AWS as the world’s largest provider of cloud computing services. But far fewer people think of the Amazon subsidiary as a supercomputing heavyweight.

Predominantly, that’s because AWS is happy to operate at the less sexy end of the high-performance computing (HPC) spectrum, far from the shiny proof-of-concept systems that decorate the summit of the Top 500 (opens in new tab) rankings.

Instead, the organization is concerned with democratizing access to supercomputing resources, by making available cloud-based services that a large number of companies and academic institutions can access.

Amazon

(Image credit: Amazon)

One of the people responsible for delivering on this objective is Brendan Bouffler, known to some as “Boof”, who in his role as Head of Developer Relations for HPC acts as the intermediary between customers and the AWS engineering team.

As someone with years of experience building supercomputers, he contends that, counter-intuitively, it is often the smaller-scale machines that have the greatest impact, because raw performance is not necessarily the most important metric.

“It is fun to design really big machines, because it’s a complex mathematical problem you have to dismantle,” he told us. “But I always got more joy out of building the smaller systems, because that’s where the largest amount of science is done.”

The epiphany at AWS was that this approach to HPC, whereby productivity takes precedence over performance, could be transplanted effectively into the cloud.

HPC in the cloud

Although large-scale supercomputers like Fugaku, which currently tops the performance rankings, are excellent examples of how far the latest hardware can be pushed, these systems are curiosities first and utilities only second.

As Bouffler explains, the fundamental problem with large on-premise machines is ease of access. A system capable of breaking the exascale barrier would be an impressive feat of engineering, but of diminished practical use if researchers must queue for weeks to use it.

“A lot of people who build supercomputers, myself included, fall into the trap of worrying about squeezing out an additional 1% of performance. It’s laudable on one level, but the obsession means it's easy to miss the low-hanging fruit,” Bouffler told us.

“What’s more important is the cadence of research; that’s actually where the opportunity is for the scientific community.”

Fugaku supercomputer

Fugaku, the world's most powerful supercomputer. (Image credit: RIKEN)

As such, the AWS approach is as much about availability and elasticity as it is performance. With the company’s as-a-service offerings, customers are able to launch their HPC workloads instantaneously in the cloud and scale the allocated resources up or down as required, all but eliminating wastage.

“It’s about creating highly equitable access,” said Bouffler. “If you’ve got the budget and desire to solve a problem, you’ve got the computing resources you need.”

The benefits of such a system have been particularly evident since the start of the pandemic, during which companies like Moderna and AstraZeneca have used AWS instances for the purposes of vaccine development.

According to Bouffler, the world may not have a vaccine today (let alone multiple) without cloud-based HPC, which allowed for research to be kickstarted swiftly and scaled up at a moment’s notice.

“The researchers we worked with wanted flexibility and raw capacity on tap. If you make computing invisible and put the power in the hands of the people with the smart ideas, they can do really powerful things.”

Our data center, our silicon, our rules

Bouffler is the first to admit that the HPC community doesn’t pay a great deal of attention to what’s going on inside AWS. But he insists there is plenty of innovation coming out of the organization.

Historically, for example, cloud-based instances have excelled at running so-called “embarrassingly parallel” workloads that can be divided easily into a high volume of distinct tasks, but perform less well when communication between nodes is required.

Instead of bringing InfiniBand to the cloud, AWS came up with a different way to solve the problem. The company developed a technology called Elastic Fabric Adapter (opens in new tab) (EFA), which supposedly enables application performance on-par with on-premise HPC clusters for complex workloads like machine learning and fluid dynamics simulation.

Unlike InfiniBand, which fires all data packets from A to B down the fastest possible route, EFA spreads the packets thinly across the entire network.

“We had to find a way of running HPC in the cloud, but didn’t want to go and make the cloud look like an HPC cluster. Instead, we decided to redesign the HPC fabric to take advantage of the attributes of the cloud,” Bouffler explained.

“EFA sprays the packets like a swarm across virtually all pathways at once, which yields as good, if not better, performance. The scaling doesn’t stop when the network gets congested, either; the system assumes there is congestion from the outset, so the performance remains flat even as the HPC job gets larger.”

AWS reinvent adam selipsky

AWS CEO Adam Selipsky, on stage at AWS re:Invest 2021. (Image credit: Future / Mike Moore)

In 2018, meanwhile, AWS announced it would begin developing its own custom Arm-based server processor, called Graviton. Although not geared exclusively towards HPC use cases, the Graviton series has opened a number of doors for AWS, because it allowed the company to rip out all the features that weren’t essential to its needs and double down on those that were.

“When you’re designing something as big as a cloud, you have to assume things are going to fail,” said Bouffler. “Generally speaking, removing unnecessary features means you have much closer control over the failure profile, and having control over the silicon has given us a similar advantage.”

“Graviton3 is optimized up the wazoo for our data centers, because we’re the only customer for these things. We know what our conditions are, whereas other manufacturers have to support the most weird and unusual data center configurations.”

At AWS re:Invent last year, attended by TechRadar Pro, the company launched new EC2 instances powered by Graviton3, which is said to deliver up to 25% better compute performance and 60% better power efficiency than the previous generation, at least in some scenarios.

There are also a number of HPC-centric features built into Graviton3, such as 300GB/sec memory bandwidth, that typical enterprise workloads would never stretch to the limit, Bouffler explained. “We’re pushing in every direction for HPC, that’s what we always do.”

The more HPC, the merrier

Asked where AWS will take its HPC services next, Bouffler quoted a favorite saying of Jeff Besoz: “No customer has ever asked for less variety and higher prices”.

Moving forward, then, Bouffler and his team will continue to sound out customers and work to offer a wider variety of instances to address their specific needs, with a wider range of hardware options.

Another focus will be on bringing down the cost of running HPC workloads in the cloud. With this objective in mind, AWS launched a new AMD EPYC Milan-based EC2 instance in January called Hpc6a (opens in new tab), which is two-thirds cheaper than the nearest comparable x86-based equivalent. Bouffler says AWS did “all sorts of nutty things” to help bring down the cost.

It’s not just about academic and scientific use cases, either. AWS is working with a diverse range of companies, from Western Digital to Formula One, to help accelerate product design, and hopes to expand into a deeper range of industries in future.

“We’re getting HPC into every nook and cranny of the economy,” added Bouffler. “And the more the merrier.”

  • Also check out our lists of the best bare metal hosting and best dedicated server hosting services around

Are you a pro? Subscribe to our newsletter

Sign up to the TechRadar Pro newsletter to get all the top news, opinion, features and guidance your business needs to succeed!

By submitting your information you agree to the Terms & Conditions (opens in new tab) and Privacy Policy (opens in new tab) and are aged 16 or over.
Joel Khalili
Joel Khalili
Social Links Navigation
News and Features Editor

Joel Khalili is the News and Features Editor at TechRadar Pro, covering cybersecurity, data privacy, cloud, AI, blockchain, internet infrastructure, 5G, data storage and computing. He's responsible for curating our news content, as well as commissioning and producing features on the technologies that are transforming the way the world does business.

See more Computing news

TechRadar is part of Future US Inc, an international media group and leading digital publisher. Visit our corporate site (opens in new tab).

  • About Us (opens in new tab)
  • Contact Us (opens in new tab)
  • Terms and conditions (opens in new tab)
  • Privacy policy (opens in new tab)
  • Cookies policy (opens in new tab)
  • Advertise with us (opens in new tab)
  • Web notifications (opens in new tab)
  • Accessibility Statement
  • Careers (opens in new tab)

© Future US, Inc. Full 7th Floor, 130 West 42nd Street, New York, NY 10036.