'A transformative moment': Research shows AI could become the "King of Babel" as LLMs master rare, obscure languages
AI labs are shifting focus toward global language coverage after English saturation
- AI models now perform strongly in obscure languages with minimal training data
- Cross-lingual transfer allows shared patterns to boost rare language performance
- Tokenizer efficiency improvements significantly impact multilingual processing cost and quality
Large language models (LLMs) are closing the global language gap at an unexpected pace, with frontier models now performing well in rare languages that previous generations struggled with.
According to RWS's TrainAI Multilingual LLM Synthetic Data Generation Study, Google's Gemini Pro achieved high-quality scores above 4.5 out of 5 in Kinyarwanda, a language spoken by about 12 million people in Rwanda, Uganda, and the DRC.
"This study signals a transformative moment that's not about replacing human expertise, but about elevating it with the right technology," said Vasagi Kothandapani, CEO of TrainAI by RWS.
How LLMs learn languages with limited training data
Unlike the Biblical "Tower of Babel," where a sudden confusion of tongues halted construction, AI now appears to be dismantling linguistic barriers that once seemed insurmountable.
Tomáš Burkert, Head of Innovation at TrainAI, explained that AI tools often share statistical patterns across languages.
Frontier models do not need massive datasets for each language to produce reliable outputs because cross-lingual transfer allows shared knowledge to compensate for limited training data.
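The mechanism can be pictured with a toy sketch (all vectors and labels below are invented for illustration, not taken from the study): in a shared multilingual embedding space, words with similar meanings land near each other regardless of language, so knowledge attached to a high-resource language can be reused for a low-resource one via nearest-neighbor lookup.

```python
import math

# Toy illustration of cross-lingual transfer. Every vector and label here
# is hypothetical; real models learn such spaces from billions of tokens.
embeddings = {
    ("en", "water"): [0.90, 0.10, 0.00],
    ("en", "fire"):  [0.10, 0.90, 0.00],
    ("sw", "maji"):  [0.88, 0.12, 0.02],  # Swahili "water"
    ("sw", "moto"):  [0.12, 0.88, 0.05],  # Swahili "fire"
}
# Labels exist only for the high-resource language (English).
labels = {("en", "water"): "LIQUID", ("en", "fire"): "HEAT"}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def transfer_label(lang, word):
    """Label an unlabeled word via its nearest labeled neighbor."""
    vec = embeddings[(lang, word)]
    best = max(labels, key=lambda key: cosine(vec, embeddings[key]))
    return labels[best]

print(transfer_label("sw", "maji"))  # LIQUID, inherited from English "water"
```

Because "maji" sits close to "water" in the shared space, the English label transfers with no Swahili training labels at all, which is the essence of cross-lingual transfer.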
The RWS team also documented improvements in tokenizer efficiency, which affects how efficiently models process text in any given language.
These improvements compound with other model advancements into meaningful performance gains for rare and obscure languages.
Burkert's team identified "benchmark drift," where LLM capabilities can unexpectedly shift from one version to the next.
For example, the latest version of GPT fell behind smaller models on several content generation tasks, even though its predecessor had been competitive on those same tasks.
Tokenizer efficiency also varied widely between model generations, with one model proving 3.5 times more cost-effective than another in certain languages.
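The cost impact is simple arithmetic. As a back-of-envelope sketch (volume and pricing below are hypothetical; only the 3.5x ratio comes from the study), a tokenizer that fragments a language into 3.5 times as many tokens costs 3.5 times as much at equal per-token pricing:

```python
# Hypothetical flat API price, USD per 1,000 tokens.
PRICE_PER_1K_TOKENS = 0.01

def monthly_cost(chars_per_month, chars_per_token):
    """Estimate spend from text volume and tokenizer density."""
    tokens = chars_per_month / chars_per_token
    return tokens / 1000 * PRICE_PER_1K_TOKENS

# Same 50M characters of text per month, two hypothetical tokenizers:
efficient = monthly_cost(50_000_000, chars_per_token=3.5)  # dense tokenizer
wasteful  = monthly_cost(50_000_000, chars_per_token=1.0)  # fragments rare-language text
print(f"${efficient:.2f} vs ${wasteful:.2f}")  # ratio matches the study's 3.5x
```

The same gap also degrades quality in practice, since a wasteful tokenizer burns through a model's context window 3.5 times faster on the same document.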
This means enterprises cannot rely on past performance when choosing which model to deploy for multilingual applications.
Until recently, AI labs prioritized performance in English and a handful of major languages. Now that models have improved in those areas, some labs are starting to prioritize global audiences, and experts expect more labs to follow.
Successful enterprise AI strategies require continuous validation built on high-quality, culturally nuanced data rather than public leaderboards.
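In practice, continuous validation means re-running a fixed multilingual evaluation set against every new model release and flagging regressions before deployment. A minimal sketch, with invented scores (using the study's 5-point quality scale):

```python
# Hypothetical per-language quality scores (out of 5) for two model releases.
baseline    = {"kinyarwanda": 4.5, "swahili": 4.2, "english": 4.8}
new_release = {"kinyarwanda": 3.9, "swahili": 4.3, "english": 4.8}

def find_regressions(old, new, tolerance=0.2):
    """Return languages where the new release drops by more than tolerance."""
    return {lang: (old[lang], new[lang])
            for lang in old
            if old[lang] - new[lang] > tolerance}

print(find_regressions(baseline, new_release))
# {'kinyarwanda': (4.5, 3.9)} -> benchmark drift caught before deployment
```

A check like this catches the benchmark drift described above automatically, rather than relying on a leaderboard position from a previous model version.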
That said, a score of 4.5 out of 5 on a synthetic benchmark does not guarantee real-world fluency, and multilingual coverage is not purely altruistic. According to Burkert, AI labs are turning to multilingual data partly because they have likely exhausted high-quality English sources.
Still, by dismantling language barriers, AI proves itself as a true "King of Babel" — not one who built a tower, but one who tore down the walls that divided human speech.
For now, the crown does not fit perfectly, but the direction of travel is clear.

Efosa has been writing about technology for over 7 years, initially driven by curiosity but now fueled by a strong passion for the field. He holds both a Master's and a PhD in sciences, which provided him with a solid foundation in analytical thinking.