The next phase of LLM development: Why the future of sovereign AI will be multilingual by design
AI’s next phase prioritizes linguistic diversity
The first wave of large language models (LLMs) transformed how the world interacts with technology. In just a few years, generative AI moved from experimental labs to boardrooms, powering enterprise copilots, digital assistants, and intelligent automation at scale.
Yet beneath this rapid progress lies a structural limitation that is becoming increasingly visible as AI adoption expands globally: most foundational models are built around an English-first architecture.
President and Head of the Europe business, Tech Mahindra.
For the early phase of generative AI, this design bias was understandable. Much of the publicly available training data on the internet is English-dominated, and early model development was concentrated in regions where English served as the primary interface for digital communication.
However, as enterprises, governments, and societies begin to embed AI deeply into their economies, this structural imbalance presents a fundamental challenge.
The next phase of AI will not be defined merely by larger models or greater compute power. Instead, competitive advantage will increasingly come from architectures designed around linguistic diversity, regional context, and regulatory alignment from day one.
In this emerging landscape, sovereign AI tools that are multilingual by design will shape the future of intelligent infrastructure.
The Structural Limits of English-First AI
Today’s most widely used LLMs can technically operate in dozens of languages. Yet multilingual capability does not necessarily mean multilingual understanding. In many cases, these models translate knowledge from English rather than reasoning natively within different linguistic structures.
This distinction matters.
Language is not simply a medium of communication; it encodes culture, context, social nuance, and local knowledge systems. When models are trained predominantly on English-centric datasets, they risk overlooking large segments of the global digital economy, from regional commerce and governance frameworks to community knowledge and local dialects.
For enterprises operating across global markets, this creates tangible limitations. Customer engagement, financial services, healthcare delivery, and government services often rely on contextual understanding of local language variations. When AI systems struggle to interpret these nuances, the result is reduced accuracy, limited adoption, and diminished trust.
As AI becomes a foundational layer of IT infrastructure, models must move beyond translation toward native linguistic reasoning. This shift represents one of the defining engineering challenges of the next generation of AI systems.
Architecting Multilingual Foundations
Building truly multilingual foundation models requires more than expanding language coverage. It demands a different architectural philosophy.
Training datasets must incorporate diverse linguistic ecosystems, including regional languages and dialects that may not have historically been represented in digital corpora. This involves collaboration across academia, governments, and industry to curate high-quality, ethically sourced datasets that reflect real-world linguistic diversity.
Model architectures themselves must evolve to support efficient representation of multiple linguistic systems. Techniques such as mixture-of-experts architectures, specialized tokenization strategies, and language-specific reasoning pathways are emerging as powerful approaches to enable scalable multilingual intelligence.
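Tokenization is one concrete place where English-first design shows up. A minimal, illustrative sketch below uses raw UTF-8 byte counts as a stand-in for a byte-level tokenizer (the sample words are arbitrary): the same meaning can consume several times more of a model's context window in a non-Latin script, which is one reason specialized tokenization strategies matter.

```python
# Byte-level tokenizers pay a "fertility" penalty on non-Latin scripts:
# one character can occupy several bytes, so equivalent content consumes
# more tokens. UTF-8 byte counts are used here as a rough proxy.

def byte_fertility(word: str) -> float:
    """Bytes consumed per character under UTF-8 (proxy for byte-level tokens)."""
    return len(word.encode("utf-8")) / len(word)

samples = {
    "English": "language",  # Latin script: 1 byte per character
    "German": "Sprache",    # Latin script: 1 byte per character
    "Hindi": "भाषा",         # Devanagari: 3 bytes per character
}

for lang, word in samples.items():
    print(f"{lang:8s} {word!r}: {byte_fertility(word):.1f} bytes/char")
```

Real subword tokenizers are more nuanced than this, but the underlying imbalance (fewer tokens per unit of meaning for well-represented languages) is the same effect the sketch demonstrates.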
Evaluation frameworks must be redesigned. Traditional AI benchmarks often prioritize English-language tasks, which can obscure performance gaps across other languages. New evaluation standards must measure reasoning, contextual understanding, and cultural relevance across multilingual environments.
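One way to make such gaps visible is to aggregate benchmark scores per language and report the worst case alongside the mean, rather than an English-only headline number. A minimal sketch, with made-up language codes and scores:

```python
from statistics import mean

# Hypothetical per-language accuracy on some benchmark (illustrative numbers).
scores = {"en": 0.91, "de": 0.84, "hi": 0.62, "sw": 0.48}

def summarize(scores: dict[str, float]) -> dict[str, float]:
    """Report mean and worst-case accuracy, plus the gap to English.

    A single English-centric score would hide the worst-case entirely.
    """
    worst = min(scores.values())
    return {
        "mean": mean(scores.values()),
        "min": worst,
        "gap_vs_en": scores["en"] - worst,
    }

print(summarize(scores))
```

Reporting the minimum and the gap turns cross-lingual performance into an explicit evaluation target instead of a footnote.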
Taken together, these shifts represent a broader transition in how AI systems are conceived: from global models optimized for a single dominant language to distributed intelligence systems designed for linguistic plurality.
Sovereign AI and the Rise of National AI Ecosystems
Parallel to the architectural evolution of LLMs, governments around the world are increasingly focusing on the concept of sovereign AI.
At its core, sovereign AI refers to a nation’s ability to develop, deploy, and govern AI systems that reflect its own linguistic, cultural, and regulatory context. This includes control over data infrastructure, alignment with national regulatory frameworks, and the cultivation of domestic innovation ecosystems.
Several factors are driving this shift.
AI systems rely heavily on data that may be sensitive or jurisdictionally restricted. Governments and enterprises alike are seeking greater assurance around data residency and governance, particularly in sectors such as finance, healthcare, and public services.
AI is rapidly becoming a strategic capability that influences economic competitiveness, technological sovereignty, and national security. It is here that linguistic representation plays a critical role in ensuring inclusive AI adoption. Nations with diverse linguistic landscapes must ensure that AI systems can serve citizens in their native languages.
As a result, sovereign AI initiatives are emerging across multiple regions, with investments spanning national compute infrastructure, open data ecosystems, and localized AI model development.
Lessons from India’s AI Stack
Among the most compelling examples of this evolution is the growing momentum around India’s digital public infrastructure and AI ecosystem.
India’s digital transformation over the past decade has demonstrated how technology platforms designed with inclusivity at their core can scale to serve hundreds of millions of users.
Initiatives such as digital identity management systems, open financial networks, and interoperable public platforms have created a foundation that enables innovation at population scale.
This model offers important lessons for the future of AI.
Digital infrastructure built around open standards encourages ecosystem participation. When governments, startups, and enterprises collaborate on shared technology frameworks, innovation accelerates far beyond what individual organizations can achieve independently.
Linguistic diversity must be embedded into the design of AI platforms from the outset. India’s vast landscape of languages and dialects requires AI systems capable of operating across multiple linguistic contexts simultaneously.
The success of digital platforms depends on trust. Transparent governance models, data protection frameworks, and inclusive access mechanisms ensure that technology benefits are widely distributed.
As countries across Europe and the United Kingdom develop their own sovereign AI strategies, these principles (open infrastructure, multilingual capability, and collaborative ecosystems) are likely to play an increasingly important role.
The Road Ahead: From Global Models to Global-Local Intelligence
The future of AI will not be shaped solely by the scale of models or the size of training datasets. Instead, the defining advantage will belong to organizations and nations that can design AI systems capable of operating across diverse linguistic, cultural, and regulatory environments.
This requires a shift from viewing AI as a universal technology toward recognizing it as a globally interoperable but locally contextual system.
Multilingual architectures will enable AI to reason within regional contexts rather than merely translating across them. Sovereign AI frameworks will ensure that data governance and infrastructure align with national priorities. And collaborative ecosystems will allow innovation to emerge from multiple regions rather than a handful of technology hubs.
In many ways, this mirrors the evolution of the internet itself. What began as a network built around a few dominant regions eventually became a globally distributed platform supporting billions of users and countless local ecosystems.
AI is now entering a similar phase.
The next generation of large language models will be designed from the ground up to understand the world's languages and contexts. And in doing so, they will unlock a new era of inclusive, sovereign, and globally connected intelligence.