Over the past two years, enterprises rushed to implement artificial intelligence (AI) across their operations. The initial excitement has given way to a harder reality: Most organizations aren’t seeing the return on investment they expected, despite significant time, budget, and executive attention devoted to AI initiatives.

Part of the reason is fundamental. Meta, OpenAI, Anthropic and other major players continue racing to build ever-larger foundation models trained on ever-expanding datasets. For many enterprise use cases, this approach misses the point and diverts focus from the operational specificity companies actually need.

Large language models trained on public data don’t understand a company’s proprietary processes, validated procedures, or documentation. A manufacturer’s quality control protocols. A bank’s risk assessment frameworks. A healthcare system’s clinical decision pathways.

This is the knowledge that determines whether AI delivers value in an organization, and it’s precisely what general-purpose models lack in day-to-day enterprise environments.

Bigger foundation models won’t help. Domain-specific language models, smaller models trained intensively on an enterprise’s data rather than the entire internet, are the future of getting more value from AI.

Stanford researchers believe that we’ve reached a turning point where carefully curated datasets and smaller models are outperforming massive ones.

Analysts also believe this is how companies will gain value from AI going forward. Gartner predicts that by next year, more than 50% of the AI models enterprises use will be domain- or company-specific, up from only 1% in 2023.

How domain-specific models work

Implementation has become more practical than most organizations realize. It starts with a base model, typically one with 50 to 70 billion parameters, where language proficiency reaches a critical threshold.

At this size, the model already understands language structure well enough to serve as a strong foundation for enterprise adaptation.

From there, you train it on your enterprise documentation. The model learns not just to retrieve information from these documents but to reason about them in ways that reflect your operational reality and internal standards.

The training approach combines retrieval-augmented generation (RAG) with fine-tuning. Retrieval lets your system query specific documents at answer time, while fine-tuning shifts the model’s underlying understanding toward your domain.
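To make the retrieval half of that combination concrete, here is a minimal sketch in Python. It assumes a simple word-overlap (cosine) ranking in place of the embedding search a production system would likely use, and the document ids, sample texts, and function names are all hypothetical. Fine-tuning is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k document ids ranked by similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in documents.items()]
    scored.sort(reverse=True)
    # Drop documents that share no words with the query.
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Place the retrieved passages ahead of the question for the model."""
    ids = retrieve(query, documents, k)
    context = "\n".join(f"[{i}] {documents[i]}" for i in ids)
    return f"Answer using only the context below.\n{context}\n\nQuestion: {query}"


# Hypothetical enterprise documents, invented for illustration.
DOCS = {
    "qc-01": "Quality control protocol: inspect every weld seam before coating.",
    "risk-07": "Risk assessment framework: score counterparty exposure quarterly.",
    "hr-02": "Vacation requests need manager approval two weeks ahead.",
}
```

Calling `build_prompt("How do we inspect a weld seam?", DOCS)` would place only the quality control passage in front of the question, which is the part of the pipeline that keeps answers tied to specific documents.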

When subject matter experts correct responses through the interface, those corrections feed back through reinforcement learning. The model improves with each interaction and becomes more aligned with enterprise expectations.
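One way to picture that feedback loop is as a log of expert corrections that is later exported for training. The sketch below assumes the corrections are stored as prompt, chosen answer, and rejected answer records, a shape commonly used for preference-based training. The class, field names, and export format are hypothetical, and the training step itself is out of scope here.

```python
import json
from dataclasses import dataclass


@dataclass
class Correction:
    """One expert review of a model response (hypothetical schema)."""
    prompt: str
    model_answer: str
    expert_answer: str
    reviewer: str


def to_preference_record(c):
    """Treat the expert's answer as preferred over the model's original."""
    return {"prompt": c.prompt, "chosen": c.expert_answer, "rejected": c.model_answer}


def export_jsonl(corrections):
    """Serialize corrections as JSON Lines, skipping unchanged answers."""
    lines = [
        json.dumps(to_preference_record(c))
        for c in corrections
        # An unchanged answer carries no preference signal.
        if c.expert_answer.strip() != c.model_answer.strip()
    ]
    return "\n".join(lines)
```

Filtering out unchanged answers keeps the training set limited to cases where an expert actually disagreed with the model.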

But the enterprise model is just the starting point. From there, organizations can create persona-based models. Business analysts get one version. Engineers get another. Testers get their own. Each persona model builds on the enterprise foundation but specializes further for specific roles and recurring responsibilities.

The final layer is individual customization. Each person can train their version of the model on their specific workflows, priorities, and working style. Think of it as a hyper-personalized assistant that understands both your company’s operations and how you personally work within them.

The feedback you provide continues refining the model to match your needs and improve relevance over time.
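The layering described above can be pictured as a lookup in which the most specific layer wins. The sketch below models each layer as a plain dictionary of settings. In a real system each layer would more likely be a set of trained model weights or adapters, so treat the keys, adapter names, and persona names as hypothetical placeholders.

```python
from collections import ChainMap

# Enterprise layer: defaults shared by everyone (hypothetical values).
ENTERPRISE = {"glossary": "company-wide", "tone": "formal", "adapter": "enterprise-v1"}

# Persona layer: role-specific overrides on top of the enterprise layer.
PERSONAS = {
    "business_analyst": {"adapter": "analyst-v1", "focus": "requirements"},
    "engineer": {"adapter": "engineer-v1", "focus": "design docs"},
    "tester": {"adapter": "tester-v1", "focus": "test plans"},
}


def resolve_profile(persona, individual=None):
    """Merge the layers so that individual beats persona beats enterprise."""
    if persona not in PERSONAS:
        raise KeyError(f"unknown persona: {persona}")
    # ChainMap searches its mappings left to right, so the first hit wins.
    return dict(ChainMap(individual or {}, PERSONAS[persona], ENTERPRISE))
```

For example, `resolve_profile("engineer", {"tone": "concise"})` keeps the enterprise glossary, picks up the engineer adapter and focus, and applies the individual’s preferred tone.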

This three-layer approach (enterprise, persona, individual) only works because these models run on smaller footprints. When each training run costs $10,000 to $20,000, customizing a model for every individual is economically impossible. Smaller models trained on curated enterprise data change that equation and make sustained iteration feasible.
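A quick back-of-the-envelope calculation shows why the per-run cost matters. The $10,000 to $20,000 range comes from the figures above. The headcount of 500 and the four refreshes per person per year are assumptions chosen purely for illustration.

```python
def annual_cost(cost_per_run, people, runs_per_person_per_year):
    """Total yearly spend if every person gets individually trained models."""
    return cost_per_run * people * runs_per_person_per_year


# Assumed for illustration: 500 people, each model refreshed 4 times a year.
PEOPLE = 500
RUNS_PER_YEAR = 4

low = annual_cost(10_000, PEOPLE, RUNS_PER_YEAR)
high = annual_cost(20_000, PEOPLE, RUNS_PER_YEAR)
print(f"Individual customization: ${low:,} to ${high:,} per year")
```

Under those assumptions the bill lands between $20,000,000 and $40,000,000 a year, which is why individual customization only becomes realistic once the cost of each run drops sharply.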

Where to start

The path forward starts with understanding what proprietary data you actually have. Look at your repositories, your knowledge bases, the technical materials that live behind your firewall.

Then identify use cases where accuracy creates immediate value: areas where generic responses create operational risk, or where precision directly affects outcomes and measurable performance.

There’s a reason to prioritize building on your own data rather than fine-tuning someone else’s model with your requirements added afterward.

When you train on your specific documentation from the ground up, the model’s understanding reflects your operational reality rather than trying to retrofit generic knowledge or assumptions that may not apply.

The foundation layer matters more than most organizations realize. You can’t skip from basic prompting to autonomous agents and expect reliable results. Agentic AI frameworks like AutoGPT and LangChain depend entirely on the underlying models’ knowledge.

If those base models lack domain expertise, the autonomous agents built on top won’t have it either. Trust in AI decision-making requires that the underlying intelligence understands what it’s operating on and the context in which decisions are made.

Start with narrow implementations. Test against clear metrics. Scale based on measurable results rather than aspirational roadmaps.
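Testing against clear metrics can start very simply. The sketch below assumes exact-match accuracy on a small set of expert-written question and answer pairs, with a hypothetical ten-point improvement over the general-purpose baseline as the bar for scaling. The function names and threshold are illustrative, and real evaluations often need more forgiving scoring than exact match.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def accuracy(answer_fn, test_cases):
    """Fraction of test cases where the model's answer matches the expected one."""
    if not test_cases:
        raise ValueError("need at least one test case")
    hits = sum(normalize(answer_fn(question)) == normalize(expected)
               for question, expected in test_cases)
    return hits / len(test_cases)


def ready_to_scale(score, baseline, min_gain=0.10):
    """Scale only if the domain model beats the baseline by min_gain or more."""
    return score - baseline >= min_gain
```

Here `answer_fn` stands in for whatever call returns a model’s answer to a question, so the same test set can score both the general-purpose baseline and the domain-specific model.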

This year, we’re likely to see a separation between enterprises that invested in models trained on their actual operations and those that continued pursuing general-purpose solutions.

The distinction won’t be about who has access to the biggest models. It will be about who built AI systems that understand their specific business and can support real operational decisions.

The expensive lesson many organizations are learning is that breadth doesn’t equal depth. For enterprise applications where accuracy and domain knowledge determine whether AI delivers value, smaller and smarter consistently outperforms bigger and broader.

