Data variety: the silent killer of AI — and how to conquer it


The relationship between data and AI is inherently symbiotic: better data enables better AI, and better AI allows for more sophisticated data processing. This virtuous cycle should accelerate enterprise AI adoption, yet most organizations find themselves stuck before it even begins.

The culprit isn't computational power or model sophistication — it's data variety. While enterprises rush to deploy large language models and agentic systems, they're discovering that the messy, inconsistent, and wildly diverse nature of their data creates an insurmountable bottleneck.

The statistics tell a sobering story. While 94% of data and AI leaders say interest in AI is leading to a greater focus on data, 75% of surveyed leaders find AI adoption challenging, with 69% saying most AI projects don't make it into live operational use. Of companies that reported cost reductions from AI, most had savings of less than 10 percent, while those with revenue increases mostly reported gains of less than 5 percent.

Yes, we've all heard about data volume and velocity. But it's not the size or speed that trips up AI projects — it's the fact that data is messy, diverse, and wildly inconsistent across systems, formats, and structures within organizations and among external partners. With data volumes expected to increase more than tenfold from 2020 to 2030, this challenge is rapidly intensifying.

Saket Saurabh

CEO and co-founder of Nexla.

What Makes Data Variety So Challenging?

Enterprise data variety shows up across multiple, compounding layers that create exponential complexity. Every SaaS application, database, file system, and partner platform speaks a different language, requiring dozens or sometimes hundreds of unique connectors just to establish basic connectivity.

Each connector has to handle data arriving in countless forms: structured formats like CSV and JSON, semi-structured content like XML and spreadsheets, and fully unstructured materials including PDFs, contracts, images, and emails. Each requires context-sensitive parsing to extract usable information.
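This per-format dispatch can be sketched in a few lines. The sketch below is illustrative, not any vendor's connector: it normalizes JSON and CSV payloads into a common record shape and refuses to guess at formats it has no parser for.

```python
import csv
import io
import json

# Hypothetical sketch: a connector front-end that dispatches on content
# type. Each format needs its own context-sensitive parser before records
# can flow into a common shape (here, a list of dicts).

def parse_payload(payload: str, content_type: str) -> list[dict]:
    """Normalize one inbound payload into a list of records."""
    if content_type == "application/json":
        data = json.loads(payload)
        return data if isinstance(data, list) else [data]
    if content_type == "text/csv":
        return list(csv.DictReader(io.StringIO(payload)))
    # Unstructured formats (PDFs, contracts, emails, images) would need
    # their own extraction step; flag them rather than guess.
    raise ValueError(f"no parser registered for {content_type}")

print(parse_payload('[{"id": 1}]', "application/json"))
print(parse_payload("id,name\n1,Ada\n", "text/csv"))
```

In practice each branch hides substantial complexity (encodings, nested structures, malformed rows), which is why connector counts grow so quickly.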

Even when dealing with the same business concepts, different systems use entirely different definitions and schemas. "Customer ID" in your CRM may bear no resemblance to "Account Number" in your billing software. Meanwhile, APIs evolve, vendors update fields, and data formats change mid-stream, making integration a constant maintenance challenge rather than a one-time effort.
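Reconciling those divergent schemas usually means mapping each system's field names onto one canonical definition. A minimal sketch, with illustrative field names only:

```python
# Hypothetical sketch: the same business concept ("customer") arrives
# under different names from the CRM and the billing system. Mapping
# tables translate each source into one canonical schema.

CRM_TO_CANONICAL = {"customer_id": "customer_key", "full_name": "name"}
BILLING_TO_CANONICAL = {"account_number": "customer_key", "acct_name": "name"}

def to_canonical(record: dict, mapping: dict) -> dict:
    """Rename source fields to the canonical schema, dropping
    anything the mapping does not cover."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}

crm_row = {"customer_id": "C-42", "full_name": "Ada Lovelace"}
billing_row = {"account_number": "C-42", "acct_name": "Ada Lovelace", "balance": 0}

print(to_canonical(crm_row, CRM_TO_CANONICAL))
print(to_canonical(billing_row, BILLING_TO_CANONICAL))
# Both rows now share the same keys and can be joined on customer_key.
```

The maintenance burden the article describes lands exactly here: every vendor field rename or API version bump silently invalidates one of these mapping tables.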

External data compounds this complexity. While internal systems can be tightly controlled, external data sources from partners, suppliers, regulators, and customers introduce constant variability. New data providers mean new schemas, and existing ones may change without warning.

Why AI Alone Can't Solve the Problem

It's tempting to believe that AI, especially large language models, can simply be pointed at a data system, allowing AI-powered code generation to ingest raw data and figure it all out. In reality, there are multiple layers of technical challenges to solve when building truly enterprise-grade, reliable, and scalable integrations. Moreover, the fact that many systems aren't even well documented makes testing and maintaining integrations incredibly hard for both humans and AI.

The combined human and AI effort, however, is very promising. It starts with taking advantage of the fact that AI excels at pattern recognition, suggesting schema mappings, and parsing unstructured content. But the foundational work of orchestration, reliable connectors, business logic implementation, and governance requires engineering discipline that pure AI cannot deliver alone.

Finally comes the people and process factor. Data and AI leaders consistently agree that cultural and change management challenges are the primary barrier to becoming data- and AI-driven, suggesting that technology alone is insufficient for success.

The Emerging Solution: Agentic Integration Architecture

The path forward isn't pure AI or pure software engineering — it's their thoughtful combination. We need AI-powered software abstractions that allow systems to adapt to variety rather than fight it.

At each layer of the data stack, AI assists while software engineering principles enforce durability, reliability, and governance.

Virtual data products represent a particularly powerful abstraction in this hybrid approach. By creating consistent, reusable interfaces that act as contracts between data producers and consumers, organizations can decouple physical data location and format from actual usage. This abstraction layer enables seamless collaboration while supporting diverse data formats without complex coding or integration barriers.
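One way to picture that contract is as a declared schema sitting between producer and consumer. The sketch below is a simplified illustration (all names are hypothetical, not Nexla's implementation): the physical source behind the product can be swapped without consumers noticing, while the schema check enforces the contract.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a "virtual data product": the declared schema
# is the contract; the fetch callable is the pluggable physical source.

@dataclass
class DataProduct:
    name: str
    schema: dict                 # field name -> expected type
    fetch: Callable[[], list]    # swappable physical source

    def read(self) -> list:
        rows = self.fetch()
        for row in rows:
            for col, typ in self.schema.items():
                if not isinstance(row.get(col), typ):
                    raise TypeError(f"{self.name}: {col} violates contract")
        return rows

# Producer side: today the source is an in-memory stub; tomorrow the
# fetch callable could hit a warehouse or an API with no consumer change.
orders = DataProduct(
    name="orders",
    schema={"order_id": str, "amount": float},
    fetch=lambda: [{"order_id": "O-1", "amount": 19.99}],
)

print(orders.read())  # consumers only ever see the contract shape
```

The decoupling is the point: consumers bind to the schema, not to the system that happens to hold the data today.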

Modern platforms now support multi-speed data processing, allowing data pipelines to be defined once but operate across different processing engines and latencies. This flexibility ensures that real-time, batch, and streaming workloads can coexist within the same architectural framework.
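"Defined once, run anywhere" reduces to keeping the transformation logic pure and engine-agnostic, then wrapping it in different execution modes. A hedged sketch (batch list processing versus record-at-a-time streaming stand in for real engines here):

```python
# Hypothetical sketch: one pipeline definition, two execution modes.

def enrich(record: dict) -> dict:
    """The single pipeline definition: pure, engine-agnostic logic."""
    return {**record, "amount_cents": int(record["amount"] * 100)}

def run_batch(records: list[dict]) -> list[dict]:
    """Batch engine: process a full dataset at once."""
    return [enrich(r) for r in records]

def run_streaming(source):
    """Streaming engine: process records as they arrive.
    In practice, `source` could be a message-queue consumer."""
    for record in source:
        yield enrich(record)

data = [{"amount": 1.5}, {"amount": 2.0}]
print(run_batch(data))
print(list(run_streaming(iter(data))))  # same results, different latency model
```

Because `enrich` carries no engine-specific code, the same business logic serves real-time, batch, and streaming workloads within one framework.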

Perhaps most importantly, successful implementations maintain human-in-the-loop collaboration where AI assists but humans validate critical decisions around schema inference, semantic mapping, and business logic. For example, newer standards like MCP and A2A are making it possible for AI to discover and recommend integrations or flows.

Data products that support MCP enable AI to discover the right data and actions, and then make recommendations for end-to-end integration. But engineers are still needed to establish governance, security, and guardrails against errors in AI-based planning to ensure that business needs are met.
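One common shape for that guardrail is a confidence gate: AI-proposed mappings above a threshold are auto-applied, everything else queues for engineer review. The threshold and field names below are illustrative assumptions, not a prescribed policy:

```python
# Hypothetical sketch of a human-in-the-loop gate for AI-suggested
# schema mappings. High-confidence suggestions are auto-applied;
# the rest wait for an engineer's validation.

AUTO_APPLY_THRESHOLD = 0.95  # illustrative policy choice

def triage(suggestions: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split AI mapping suggestions into auto-applied vs. review queue."""
    applied, review_queue = [], []
    for s in suggestions:
        bucket = applied if s["confidence"] >= AUTO_APPLY_THRESHOLD else review_queue
        bucket.append(s)
    return applied, review_queue

proposed = [
    {"source": "cust_id", "target": "customer_key", "confidence": 0.99},
    {"source": "acct_no", "target": "customer_key", "confidence": 0.71},
]
applied, queued = triage(proposed)
print(f"auto-applied: {len(applied)}, needs review: {len(queued)}")
```

The mechanism is simple, but it encodes the division of labor the article argues for: AI does the pattern recognition, humans retain authority over low-confidence or high-stakes decisions.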

Maintaining quality as new model versions arrive is another key guardrail that engineers will build. This approach keeps integrations reliable while dramatically improving speed and scalability.

The Strategic Payoff

Enterprises that solve data variety challenges don't just reduce integration headaches — they unlock genuine competitive advantages. AI project cycles shrink from months to weeks when teams spend less time preparing data and more time using it.

Integration costs and times drop dramatically when reusable data products eliminate redundant connector development. When combined with the latest standards, they enable AI to help deliver more of the integration work.

Most significantly, model performance improves substantially thanks to higher-quality inputs, while teams can focus on innovation rather than data plumbing.

As PwC notes in their 2025 AI predictions, "A shrewd strategy will instead emphasize what can set you apart — how you leverage AI with your institutional knowledge and proprietary data". The companies that engineer for data variety early, using thoughtful combinations of AI, software engineering, and domain expertise, will find themselves with sustainable competitive moats.

The New Competitive Reality

As AI models become increasingly commoditized and accessible, the real differentiator won't be better models — it'll be better data systems. With more than 80 percent of organizations not yet seeing a tangible enterprise-level impact from generative AI, and most companies not even using half of their data, those that solve the data variety challenge will pull ahead decisively.

Ultimately, AI-ready data isn't about having more data — it's about having the right data, in the right shape, at the right time. The AI race won't be won in model labs. It will be won in the trenches of data integration, where variety is tamed via a rich collaboration between intelligent engineering and AI rather than magically solved by AI.


This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro

