Cerebras Systems, maker of the world’s largest processor, has broken the record for the most complex AI model trained using a single device.
Using one CS-2 system, powered by the company’s wafer-sized chip (WSE-2), Cerebras is now able to train AI models with up to 20 billion parameters thanks to new optimizations at the software level.
The firm says the breakthrough will resolve one of the most frustrating problems for AI engineers: the need to partition large-scale models across thousands of GPUs. The result is an opportunity to drastically cut the time it takes to develop and train new models.
Cerebras brings AI to the masses
In sub-disciplines like natural language processing (NLP), the performance of the model correlates in a linear fashion with the number of parameters. In other words, the larger the model, the better the end result.
Today, developing large-scale AI products typically involves spreading a model across a large number of GPUs or accelerators, either because there are too many parameters to be housed within memory or because compute performance is insufficient to handle training workloads.
“This process is painful, often taking months to complete,” explained Cerebras. “To make matters worse, the process is unique to each network compute cluster pair, so the work is not portable to different compute clusters, or across neural networks. It is entirely bespoke.”
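To give a rough sense of the memory constraint behind that partitioning, the back-of-the-envelope sketch below estimates how much memory a model's weights alone would occupy and how many devices would be needed just to hold them. It is an illustration only, not a description of how Cerebras or any GPU cluster actually allocates memory: the 2 bytes per parameter (16-bit precision) and the 16 GB per-device figure are assumptions chosen for the example, and the function names are hypothetical. Gradients, optimizer state, and activations are ignored, and in practice they raise the real requirement considerably.

```python
import math


def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Memory needed to store the weights alone, in gigabytes (10**9 bytes).

    bytes_per_param=2 assumes 16-bit weights (an assumption for this sketch).
    """
    return num_params * bytes_per_param / 1e9


def min_devices(num_params: int, device_memory_gb: float,
                bytes_per_param: int = 2) -> int:
    """Smallest number of devices whose combined memory can hold the weights."""
    needed_gb = weight_memory_gb(num_params, bytes_per_param)
    return math.ceil(needed_gb / device_memory_gb)


if __name__ == "__main__":
    params = 20_000_000_000  # the 20-billion-parameter figure from the article
    print(f"Weights alone: {weight_memory_gb(params):.1f} GB")
    # 16 GB per device is a hypothetical figure, not a specific product's spec.
    print(f"Devices needed at 16 GB each: {min_devices(params, 16.0)}")
```

Under these assumptions, a 20-billion-parameter model needs 40 GB for its weights before any training state is counted, which is why a model of that size cannot simply be loaded onto one small-memory accelerator.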
Although the most complex models consist of many more than 20 billion parameters, the ability to train relatively large-scale AI models on a single CS-2 device eliminates these bottlenecks for many, accelerating development for existing players and democratizing access for those previously unable to participate in the space.
“Cerebras’ ability to bring large language models to the masses with cost-efficient, easy access opens up an exciting new era in AI. It gives organizations that can’t spend tens of millions an easy and inexpensive on-ramp to major league NLP,” said Dan Olds, Chief Research Officer, Intersect360 Research.
“It will be interesting to see the new applications and discoveries CS-2 customers make as they train GPT-3 and GPT-J class models on massive datasets.”
What’s more, Cerebras hinted that its CS-2 system may be able to handle even larger models in future, with “even trillions of parameters”. Chaining together multiple CS-2 systems, meanwhile, could pave the way for AI networks larger than the human brain.