Nvidia and Microsoft are building a supercomputer in the cloud

(Image credit: Shutterstock / Timofeev Vladimir)

NVIDIA and Microsoft are collaborating on a new cloud-based AI-focused supercomputer, which they claim will be "one of the most powerful in the world" when complete.

The new machine will combine the supercomputing infrastructure of Microsoft Azure with NVIDIA GPUs, networking, and AI software. It is set to include ND- and NC-series virtual machines designed specifically for distributed AI training and inference.

The companies claim the project makes Azure the first public cloud to incorporate NVIDIA’s full AI stack. It will add tens of thousands of NVIDIA A100 and H100 GPUs, NVIDIA Quantum-2 400Gb/s InfiniBand networking, and the NVIDIA AI Enterprise software suite to the Azure platform.

How will it be used?

The firms said the new machine will be used to help enterprises train, deploy and scale AI, including large models.

NVIDIA is also set to use Azure’s scalable virtual machine instances to research and accelerate advances in generative AI.

This is an emerging area of AI in which foundation models like Megatron-Turing NLG 530B provide the basis for unsupervised, self-learning algorithms that create new text, code, digital images, video or audio.

The companies will also collaborate to optimize Microsoft’s DeepSpeed deep learning optimization software. NVIDIA’s full stack of AI workflows and software development kits, optimized for Azure, will also be made available to Azure enterprise customers.

“AI technology advances as well as industry adoption are accelerating. The breakthrough of foundation models has triggered a tidal wave of research, fostered new startups and enabled new enterprise applications,” said Manuvir Das, vice president of enterprise computing at NVIDIA.

Microsoft is not the only company looking to NVIDIA to power its latest AI innovations.

Oracle and NVIDIA announced a collaboration at Oracle CloudWorld 2022. It will see tens of thousands of NVIDIA GPUs, such as the A100 and upcoming H100, supporting Oracle Cloud Infrastructure (OCI).

Will McCurdy has been writing about technology for over five years. He has a wide range of specialities including cybersecurity, fintech, cryptocurrencies, blockchain, cloud computing, payments, artificial intelligence, retail technology, and venture capital investment. He has previously written for AltFi, FStech, Retail Systems, and National Technology News and is an experienced podcast and webinar host, as well as an avid long-form feature writer.