Why most AI programs stall, and what it will take to scale them
Why enterprise AI stalls at pilot stage and why context, not models, unlocks scale
They say a lot can happen in a week in politics. So imagine how much can change in a year within the world of AI! Even 12 months ago, Gen AI still wasn’t mainstream. But by the end of 2025, roughly one in six people worldwide were using AI tools.
Last year marked a particular leap forward for enterprise AI, with countless firms proving they could build AI initiatives. Most have now built a portfolio of pilots, proofs of concept, and internal demos that look impressive.
President and CPO at Neo4j.
Yet as budgets tighten, boards and executives are questioning why so many systems aren’t being adopted for everyday business use. Many CEOs still struggle to point to clear revenue gains or cost reductions from AI investments, despite huge spending on the technology.
This is not because AI has stalled, though: models are stronger, cheaper, and easier to deploy than ever. The problem sits in the messy space between the experimentation and production phases – the area we call ‘AI pilot purgatory’.
The rise of ‘pilot purgatory’
Many organizations are now trapped in this phase of AI rollout, unable to escape the same, never-ending loop. Yes, small teams can easily spin up agents that work in a sandbox. But when those agents are asked to scale across departments, integrate with live systems, or stand up to audit and risk scrutiny, they often fall apart.
Typically, the project simply slows down, then stalls. And it’s difficult to pinpoint why. Often, it’s because ownership is unclear, and confidence drains away until nobody wants to sponsor the move to production.
Reframing agents’ failings around context, not models
These projects fail because too much attention is paid to model choice and prompt design, and not enough to what we call ‘context’.
Think about human decision-making, for example. It depends on shared understanding: who is responsible for what, which workplace-based rules apply, how similar cases were handled before, and why a particular judgement was made.
Right now, most AI systems are deployed without access to that connective tissue, and agents are particularly exposed. As they take on more autonomy, most users expect them to behave less like tools and more like junior colleagues - colleagues that can justify decisions, cite policy, and adapt as rules change. If they can’t, then they quickly become a liability.
Logs and dashboards don’t solve this problem. They record what happened but strip it of meaning. A timestamped action tells you little about intent, after all. When regulators, auditors, or customers ask why a decision was made, logs are ultimately technical traces that serve as a poor substitute for a coherent account.
Why traceability beats explainability
Instead of treating context as a pile of documents to retrieve information from, some organizations are beginning to model it explicitly with context graphs: connected maps of decision history that make organizational judgements searchable, traceable, and reusable.
The idea is to link people, policies, systems, decisions, and outcomes into a connected structure that evolves over time.
Crucially, it captures decision traces: what happened, which policies were applied, what exceptions were made, and the reasoning behind the outcome. That goes far beyond what traditional tools and systems record, because in reality many consequential decisions happen outside of these applications, such as in email or team messaging.
For example, a policy might state that discounts above a certain threshold require senior sign-off. Yet the actual conversation that led to an exception, such as the emails explaining the strategic rationale, the team messaging thread where colleagues validated the reasoning, and the informal approvals that followed, may not be logged in any application.
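To make the discount example concrete, here is a minimal sketch of how such a decision trace might be stored as a small connected structure. It is an illustration in plain Python, not a description of any particular product: the `ContextGraph` class, the node identifiers, and the relation names (`GOVERNED_BY`, `APPROVED_BY`, `JUSTIFIED_BY`) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ContextGraph:
    """Hypothetical in-memory graph: nodes by id, edges as (source, relation, target)."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, kind, **props):
        self.nodes[node_id] = {"kind": kind, **props}

    def link(self, source, relation, target):
        # Refuse dangling edges so every trace can be followed end to end.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError(f"unknown node in edge {source} -> {target}")
        self.edges.append((source, relation, target))

    def trace(self, decision_id):
        """Group everything directly attached to a decision by relation."""
        grouped = {}
        for source, relation, target in self.edges:
            if source == decision_id:
                grouped.setdefault(relation, []).append(target)
        return grouped


graph = ContextGraph()

# Illustrative nodes only: the rule, the people, and the informal evidence.
graph.add_node("policy:discount-signoff", "Policy",
               text="Discounts above the threshold require senior sign-off")
graph.add_node("person:senior-approver", "Person")
graph.add_node("evidence:email-rationale", "Evidence", channel="email")
graph.add_node("evidence:chat-thread", "Evidence", channel="team messaging")
graph.add_node("decision:discount-exception", "Decision",
               outcome="exception granted")

# The decision is connected to the rule it touched, who approved it, and why.
graph.link("decision:discount-exception", "GOVERNED_BY", "policy:discount-signoff")
graph.link("decision:discount-exception", "APPROVED_BY", "person:senior-approver")
graph.link("decision:discount-exception", "JUSTIFIED_BY", "evidence:email-rationale")
graph.link("decision:discount-exception", "JUSTIFIED_BY", "evidence:chat-thread")

trace = graph.trace("decision:discount-exception")
for relation, targets in trace.items():
    print(relation, "->", targets)
```

The point of the sketch is that the email and the messaging thread sit alongside the formal policy as first-class parts of the record, so asking why the exception was granted returns the rationale as well as the rule.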
While formal systems encode rules and policies, they rarely capture the human reasoning, negotiation, and judgement that underpin real organizational decision making. Decision traces fill that gap, surfacing not just what happened, but why. Seen this way, explainability stops being an abstract promise and becomes traceable.
You can see which rules applied, which data was considered, which system or person approved the action, and how similar cases were resolved - whether that resolution happened in a board meeting or an internal thread. And that makes governance operational rather than theoretical, because controls and accountability sit inside the same structure as the decisions themselves.
It’s an immensely valuable approach that addresses organizational amnesia: the knowledge lost when teams change, policies shift, and systems are replaced. Without a shared memory, each new project has to start from scratch, relearning the same lessons and potentially repeating the same mistakes.
But a connected context layer allows learning to accumulate. Agents can inherit institutional knowledge rather than improvising each time, making it far easier to spot patterns that humans miss, exceptions that point to a policy gap, or outlier decisions that all trace back to the same broken step in a process.
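As a rough illustration of spotting exceptions that point to a policy gap, the sketch below counts how often decisions under each policy ended in an exception and flags policies where that rate is high. The records, field names, and thresholds (`min_decisions`, `min_rate`) are invented for this example and would need tuning against real decision history.

```python
from collections import Counter

# Illustrative records only; not real data.
decisions = [
    {"id": "d1", "policy": "discount-signoff", "exception": True},
    {"id": "d2", "policy": "discount-signoff", "exception": True},
    {"id": "d3", "policy": "discount-signoff", "exception": False},
    {"id": "d4", "policy": "refund-window", "exception": False},
    {"id": "d5", "policy": "refund-window", "exception": False},
    {"id": "d6", "policy": "discount-signoff", "exception": True},
]


def policies_with_frequent_exceptions(records, min_decisions=3, min_rate=0.5):
    """Return {policy: exception_rate} for policies that are often overridden."""
    totals = Counter(r["policy"] for r in records)
    exceptions = Counter(r["policy"] for r in records if r["exception"])
    flagged = {}
    for policy, total in totals.items():
        rate = exceptions[policy] / total
        # Require enough decisions that a single exception does not trigger a flag.
        if total >= min_decisions and rate >= min_rate:
            flagged[policy] = rate
    return flagged


flagged = policies_with_frequent_exceptions(decisions)
print(flagged)  # {'discount-signoff': 0.75}
```

A policy that is overridden in most of the cases it governs is a candidate for review: either the rule no longer matches how the business operates, or the exceptions themselves need scrutiny.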
What scaling AI actually demands
Scaling AI is less of a technical upgrade than an organizational one. It forces uncomfortable questions about data quality, ownership, permissions, and accountability, as well as clarity on who sets policy, who can override it, and how exceptions are handled.
Put simply, these are not problems a better model can solve. To move past pilot purgatory, organizations tend to start small, within specific decision domains, rather than in grand AI programs.
They map the rules, actors, and outcomes involved, then let agents operate within that bounded context. As trust grows, the scope expands. Over time, what emerges is not just an AI system, but a living map of how the organization works.
The next phase of enterprise AI will not be won by those chasing gains in model performance. It will be shaped by those who invest in building and maintaining context graphs that preserve institutional memory.
This article was produced as part of TechRadar Pro Perspectives, our channel to feature the best and brightest minds in the technology industry today.
The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/pro/perspectives-how-to-submit