IBM has built a cost-effective AI supercomputer in its cloud


IBM’s answer to the cost-effective supercomputer has been up and running for several months, but only recently has the company disclosed any tangible information about its so-called Vela project.

In a blog post, IBM detailed the research, authored by five of its employees, which tackles the problems with previous supercomputers, namely their lack of readiness for AI tasks.

The company also sheds some light on the hardware decisions it made to tailor the supercomputer model to this emerging type of workload, favoring affordable but powerful components.


IBM's Vela AI supercomputer

The work highlights that “building a [traditional] supercomputer has meant bare metal nodes, high-performance networking hardware… parallel file systems, and other items usually associated with high-performance computing (HPC).” 

These supercomputers can clearly handle heavy AI workloads, including the one designed for OpenAI, the startup behind the popular ChatGPT chatbot. However, a lack of optimization means traditional supercomputers can be underpowered in some areas and overprovisioned in others, leading to unnecessary spending.

While bare metal nodes have long been considered ideal for AI, IBM wanted to explore offering them up inside a virtual machine (VM). The result, according to Big Blue, is close to bare-metal performance with the added flexibility of virtualization.

“Following a significant amount of research and discovery, we devised a way to expose all of the capabilities on the node (GPUs, CPUs, networking, and storage) into the VM so that the virtualization overhead is less than 5%, which is the lowest overhead in the industry that we’re aware of.”
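To put IBM’s figure in context, virtualization overhead is commonly expressed as the share of bare-metal performance lost when the same job runs inside a VM. The Python sketch below assumes that simple definition; IBM has not spelled out its exact methodology here, and the throughput numbers are hypothetical, not IBM measurements.

```python
def virtualization_overhead(bare_metal_throughput: float, vm_throughput: float) -> float:
    """Return the percentage of bare-metal throughput lost when running in a VM.

    Assumes overhead = (bare_metal - vm) / bare_metal * 100, a common
    definition; this is not necessarily how IBM measured Vela.
    """
    if bare_metal_throughput <= 0:
        raise ValueError("bare-metal throughput must be positive")
    return (bare_metal_throughput - vm_throughput) / bare_metal_throughput * 100.0


# Hypothetical numbers for illustration only:
bare_metal = 1000.0  # e.g. training samples per second on bare metal
in_vm = 960.0        # the same job running inside a VM
overhead = virtualization_overhead(bare_metal, in_vm)
print(f"{overhead:.1f}% overhead")  # 4.0% overhead
print(overhead < 5.0)               # True: under the sub-5% bar IBM cites
```

On this definition, staying "less than 5%" means a VM keeps more than 95% of the throughput the same hardware would deliver on bare metal.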

In terms of node design, each Vela node is packed with eight 80GB Nvidia A100 GPUs, 1.5TB of DRAM, and four 3.2TB NVMe storage drives.

The Next Platform estimates that, if IBM entered its supercomputer in the Top500 rankings, it would deliver around 27.9 petaflops of performance, enough to rank 15th on the November 2022 list.

While today’s supercomputers can already handle AI workloads, rapid developments in artificial intelligence, combined with the pressing need for cost efficiency, highlight the need for a machine like Vela.

Craig Hale

With several years’ experience freelancing in tech and automotive circles, Craig’s specific interests lie in technology that is designed to better our lives, including AI and ML, productivity aids, and smart fitness. He is also passionate about cars and the decarbonisation of personal transportation. An avid bargain-hunter, Craig makes sure any deal he finds is top value!