Microsoft, OpenAI may have solved a fundamental AI bottleneck


Microsoft and OpenAI have developed a new method for tuning massive AI models, such as GPT-3, that are too expensive to train multiple times.

A blog post published by Microsoft Research describes a technique called µ-Parametrization (or µP). It exploits similarities between the behavior of small- and large-scale AI models to minimize the compute resources required for tuning.

Although you’d need a doctorate to make sense of the specifics, the essential message is this: with µ-Parametrization, it should become cheaper and simpler to develop larger-scale AI models that outperform those available today.

Optimizing AI models

As the blog post explains, one reason large AI models are difficult to train effectively is that we have little insight into how their behavior changes as they scale. As a result, researchers currently expect larger models to be less well tuned than smaller ones.

However, µ-Parametrization offers a route to tuning large-scale models at much lower cost and with much greater efficiency. It capitalizes on the insight that neural networks of varying sizes share the same optimal hyperparameters (HPs) under certain conditions, namely when they are parametrized with µP.

Essentially, this means the results of tuning a small-scale model can be carried over to a much larger one, instead of tuning an entire multi-billion-parameter model directly.
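To give a rough sense of what "parametrizing" a model this way involves, the sketch below computes per-layer Adam learning rates under a simplified µP-style rule. It assumes (as a simplification, not Microsoft's exact recipe) that only hidden weight matrices have their learning rate scaled by `base_width / width`, that input weights and biases keep the base rate, and that the output layer is handled with a `1 / width_mult` output multiplier. The function name and layer groupings are hypothetical.

```python
def mup_adam_lrs(base_lr: float, base_width: int, width: int) -> dict:
    """Per-layer-group settings under a simplified, hypothetical µP-style rule for Adam.

    Assumptions (illustrative only):
      * hidden weight matrices: learning rate scaled by base_width / width
      * input-layer weights and biases: learning rate unchanged
      * output layer: logits multiplied by 1 / width_mult
    """
    width_mult = width / base_width  # how much wider than the tuned proxy
    return {
        "input_weights_lr": base_lr,
        "biases_lr": base_lr,
        "hidden_weights_lr": base_lr / width_mult,
        "output_multiplier": 1.0 / width_mult,
    }


# Tune base_lr once on a narrow proxy (width 256), then reuse it when widening.
proxy = mup_adam_lrs(base_lr=1e-3, base_width=256, width=256)
large = mup_adam_lrs(base_lr=1e-3, base_width=256, width=1024)
print(proxy)
print(large)
```

The point of the design is that `base_lr` is the only number that needs tuning: the width-dependent scaling is applied mechanically, which is what lets a value found on a small model stay near-optimal on a wider one.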

“µP’s principled way of parameterizing the model and selecting the learning rate make it easier for anybody to scale the training of deep neural networks. Such an elegant combination of beautiful theory and practical impact,” said Johannes Gehrke, Lab Director at Microsoft Research.

To put the theory into practice, Microsoft worked with OpenAI to unleash µ-Parametrization on GPT-3, a natural language model whose largest iteration is made up of 175 billion parameters.

“After parameterizing a version of GPT-3 with relative attention in µP, we tuned a small proxy model with 40 million parameters before copying the best hyperparameter combination to the 6.7-billion parameter variant of GPT-3,” Microsoft explained.
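The workflow Microsoft describes (tune a small proxy, then copy the best hyperparameter combination to the large model) can be sketched as below. This is a minimal illustration under stated assumptions: the article does not say how the search was run, so the grid search here is a stand-in, and `toy_proxy_loss` is a made-up function standing in for actually training a small µP proxy and measuring its validation loss.

```python
import itertools


def grid_search(train_and_eval, grid):
    """Return the hyperparameter combination with the lowest validation loss."""
    keys = list(grid)
    best_hps, best_loss = None, float("inf")
    for values in itertools.product(*(grid[k] for k in keys)):
        hps = dict(zip(keys, values))
        loss = train_and_eval(hps)  # train the small proxy with these HPs
        if loss < best_loss:
            best_hps, best_loss = hps, loss
    return best_hps, best_loss


def toy_proxy_loss(hps):
    # Hypothetical stand-in: a real version would train a small µP proxy model
    # and return its validation loss. Minimum is at lr = 1e-3, init_std = 0.02.
    return (hps["lr_exp"] + 3) ** 2 + 100 * (hps["init_std"] - 0.02) ** 2


grid = {"lr_exp": [-4, -3, -2], "init_std": [0.01, 0.02, 0.04]}
best_hps, best_loss = grid_search(toy_proxy_loss, grid)

# "Zero-shot" transfer: reuse the proxy's best HPs unchanged for the large
# model, which is only valid if both models are parametrized in µP.
large_model_hps = dict(best_hps)
print(large_model_hps)
```

Only the small proxy is ever trained repeatedly; the large model is trained once with the copied settings, which is where the compute savings come from.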

The results were striking: the resulting model outperformed the original 6.7-billion-parameter version of GPT-3, while the hyperparameter tuning consumed just 7% of the compute used to pretrain that 6.7-billion-parameter model.

To help other practitioners benefit from these findings, Microsoft has published a PyTorch package designed to help integrate µ-Parametrization into existing models, since implementing it by hand can be finicky in practice.

However, the company acknowledges that much about the scaling of AI models remains to be understood, and has pledged to continue its work to “derive more principled approaches to large-scale machine learning”.

Joel Khalili
News and Features Editor

Joel Khalili is the News and Features Editor at TechRadar Pro, covering cybersecurity, data privacy, cloud, AI, blockchain, internet infrastructure, 5G, data storage and computing. He's responsible for curating our news content, as well as commissioning and producing features on the technologies that are transforming the way the world does business.
