Why messy data will make your company’s AI bill much higher than expected


For all the talk about AI infrastructure, chips, and the staggering amount of electricity now required to support large-scale model training and inference, a quieter part of the story rarely gets the same attention inside enterprises: the state of the data those systems are actually running on.

The International Energy Agency projects that electricity generation to supply data centers will grow from 460 TWh in 2024 to more than 1,000 TWh in 2030 and 1,300 TWh by 2035 in its base case, underscoring how quickly the energy demands around AI are rising.

Paul Wnek

Founder and CEO of Coalescence Cloud.

In the United States, the pressure is already visible. The U.S. Department of Energy says data centers consumed about 4.4% of total U.S. electricity in 2023 and are expected to consume approximately 6.7% to 12% by 2028.


Those numbers are important, but they can also make the problem feel distant, almost as if AI sustainability is something that happens only at the hyperscaler level.

In reality, a meaningful part of the cost and waste associated with AI starts much closer to home, inside the CRM, the PSA platform, the finance system, the spreadsheet someone still keeps on the side because they do not trust the main dashboard, and the duplicate records no one has made time to clean up.

Digital load

Far fewer companies are asking a more immediate question about their own environments: how much unnecessary digital load are they creating simply because their data is messy?

AI does not arrive inside an enterprise and begin operating on some idealized set of perfectly structured information. It inherits whatever is already there.

If the customer record exists in five places, if revenue is defined slightly differently by sales and finance, if project data is incomplete, if teams are still relying on manual workarounds because systems do not reconcile, then AI will operate inside that reality.

AI will not correct those weaknesses on its own. More often, it will make them more visible and more expensive, because every unnecessary workflow, every redundant query, every round of human rechecking, and every extra cycle spent trying to validate an output consumes more storage, more processing, and more employee time.

Healthy data is not just data that happens to be clean on a given day. It is data that is understood, governed, maintained, and aligned across the business in a way that allows people to trust it.

IBM’s recent work on poor data quality makes the business side of this clear: 43% of chief operations officers cite data quality as their top issue, and more than a quarter of organizations estimate they lose over $5 million annually due to poor data quality.

Enterprise technology leaders need to be more cognizant than ever of what happens when those same data issues are layered into AI environments that are already computationally intensive.

Poor data quality has always been expensive. What changes with AI is the speed and scale at which that expense compounds.

A broken process that used to frustrate a team now has the potential to create repeated load across multiple systems and models, while also eroding trust in the outputs that were supposed to make work easier.

Sustainability

The sustainability conversation around AI has to become more operational. It cannot live only at the level of energy procurement, carbon goals, or infrastructure investment. Those issues matter, but so does the everyday reality of what enterprises are asking their systems to do.

If a company is running AI tools on top of fragmented records, disconnected workflows, and low-confidence reporting, then a portion of the environmental burden tied to that AI is self-inflicted. The organization is spending more compute to get to answers that should have been easier to reach in the first place.

There is also a governance issue sitting underneath all of this, because a surprising number of organizations are still moving faster on deployment than they are on ownership and accountability.

Data reliability continues to show up as one of the biggest barriers to useful AI adoption, and that makes sense.

If no one can clearly explain where key data comes from, who owns it, how it is maintained, or why definitions differ across systems, then the company has already created the conditions for unnecessary waste before the model ever enters production.

In that environment, AI becomes another layer of complexity laid over a foundation that was already unstable.

The organizations that get better results tend to be the ones that take the more disciplined path, which often looks less exciting from the outside.

They reduce duplication. They align system logic. They decide what metrics actually mean and make sure those definitions hold across teams. They simplify workflows before they automate them. They fix ownership before they expand access.
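The first of those steps, reducing duplication, can be sketched in a few lines. This is a hypothetical illustration only: the field names, the matching rule (normalized email), and the "keep the most recently updated record" policy are all invented for the example, and real customer-matching logic is usually far more involved.

```python
# Hypothetical sketch: collapse duplicate customer records before they
# feed downstream systems. Field names and matching rules are invented.

def normalize_key(record):
    """Build a matching key from a lowercased, whitespace-stripped email."""
    return record["email"].strip().lower()

def deduplicate(records):
    """Keep one record per key, preferring the most recently updated copy."""
    best = {}
    for rec in records:
        key = normalize_key(rec)
        if key not in best or rec["updated"] > best[key]["updated"]:
            best[key] = rec
    return list(best.values())

customers = [
    {"email": "Ana@Example.com ", "name": "Ana",    "updated": "2023-01-10"},
    {"email": "ana@example.com",  "name": "Ana R.", "updated": "2024-06-01"},
    {"email": "bob@example.com",  "name": "Bob",    "updated": "2024-02-14"},
]

clean = deduplicate(customers)  # two records survive; the stale "Ana" row is dropped
```

The point is less the code than the decision it encodes: someone has to own the matching rule and the survivorship policy, and that ownership question is exactly the governance work described above.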

That kind of work rarely gets framed as AI strategy, but in practice it is often what separates AI programs that become useful from the ones that quietly create more overhead than value.

Healthier underlying systems

Once the underlying systems are healthier, AI starts to do what leaders hoped it would do in the first place. Forecasts become more reliable because the inputs are stable. Customer data becomes more actionable because teams are not arguing over whether it is current.

Automation begins to remove work instead of generating extra review cycles. At that point, efficiency improves in a way that matters both economically and operationally, and by extension, from a sustainability perspective too, because the organization is no longer burning resources to compensate for preventable disorder.

For companies trying to make sense of AI’s growing cost, that is the right place to start. Before asking how to power more models, it is worth asking how much unnecessary digital load is already being created by unhealthy data.

Before treating sustainability as something outside the enterprise stack, it is worth recognizing that cleaner systems use resources more intelligently.

And before assuming that AI’s environmental impact is only an infrastructure problem, leaders should look closely at the condition of the data their own business is feeding into it every day.

Cleaner data will not solve every challenge associated with AI infrastructure and energy constraints, but it does make enterprise systems more efficient, more trustworthy, and more sustainable in ways that are immediate and measurable.

That is a much better place to begin than simply assuming more compute will solve what better operational discipline could have prevented.


This article was produced as part of TechRadar Pro Perspectives, our channel to feature the best and brightest minds in the technology industry today.

The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing, find out more here: https://www.techradar.com/pro/perspectives-how-to-submit

