Companies are hoarding expensive AI GPUs and leaving most of that costly compute power sitting idle while bills quietly spiral upward


Most AI GPUs run at shockingly low utilization across production systems
Companies are paying for twenty times more GPU capacity than needed
Overprovisioning is rising sharply instead of improving year after year

Companies across the tech industry are racing to buy massive amounts of AI infrastructure, but most of it does barely any useful work at all.

A report from Cast AI, based on tens of thousands of Kubernetes clusters across AWS, Azure, and GCP, found that average GPU utilization sits at just 5%.

Many teams deploy sophisticated AI tools to manage their applications, yet those same tools are not used to optimize the underlying infrastructure.

The numbers are getting worse, not better

Organizations pay for roughly 20x more GPU capacity than their workloads actually use at any given moment.
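As a rough check on that arithmetic, the 20x figure is simply the reciprocal of 5% average utilization. The sketch below applies the same calculation to the CPU and memory averages reported in the same study (8% and 20%). The function name `capacity_multiple` is made up for this illustration, and it assumes average utilization can be treated as a steady fraction of paid capacity, ignoring peaks and headroom.

```python
def capacity_multiple(utilization: float) -> float:
    """Return how many times more capacity is paid for than is actually used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be a fraction in (0, 1]")
    return 1.0 / utilization


# Average utilization figures quoted from the Cast AI report.
REPORTED_UTILIZATION = {"gpu": 0.05, "cpu": 0.08, "memory": 0.20}

# Rounded to one decimal place to keep the output readable.
multiples = {
    name: round(capacity_multiple(value), 1)
    for name, value in REPORTED_UTILIZATION.items()
}

for name, multiple in multiples.items():
    print(f"{name}: paying for about {multiple}x the capacity in use")
```

By this measure the GPU gap is the widest of the three, which matters because GPUs are also the most expensive resource to leave idle.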

The numbers come from direct measurements of production clusters and millions of compute resources before any optimization was applied.

"This is the third year we've published this report. The numbers are worse," said Laurent Gil, co-founder and President of Cast AI. "CPU utilization fell to 8%, down from 10%. Memory dropped from 23% to 20%."

The report also measured something called overprovisioning, which is the gap between what workloads actually need and what teams allocate to them.

CPU overprovisioning rose from 40% to 69% year over year, while memory overprovisioning now stands at 79%.

This means organizations reserve nearly twice as many CPU resources and four times as much memory as their workloads actually consume.

In short, organizations pay for infrastructure that their workloads do not even request, and the trend is accelerating instead of improving.

The situation gets even more expensive when comparing CPU and GPU costs directly. A CPU core sitting idle costs only cents per hour, but a GPU sitting idle costs dollars per hour.
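To make the cents-versus-dollars gap concrete, here is a minimal sketch of the monthly spend attributable to idle capacity. The hourly prices are placeholder assumptions chosen only to match the "cents" and "dollars" orders of magnitude, not figures from the report or from any provider's price list. The utilization values are the report's averages, and `idle_cost_per_month` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def idle_cost_per_month(hourly_price: float, utilization: float, units: int = 1) -> float:
    """Estimate monthly spend on the idle share of a resource."""
    if not 0 <= utilization <= 1:
        raise ValueError("utilization must be a fraction in [0, 1]")
    idle_fraction = 1.0 - utilization
    return hourly_price * idle_fraction * units * HOURS_PER_MONTH


# ASSUMED prices for illustration only; real prices vary by provider and instance type.
ASSUMED_CPU_CORE_PRICE = 0.04  # USD per core-hour ("cents per hour")
ASSUMED_GPU_PRICE = 4.00       # USD per GPU-hour ("dollars per hour")

# Utilization averages taken from the report: 8% for CPU, 5% for GPU.
cpu_idle = idle_cost_per_month(ASSUMED_CPU_CORE_PRICE, utilization=0.08)
gpu_idle = idle_cost_per_month(ASSUMED_GPU_PRICE, utilization=0.05)

print(f"Idle cost per CPU core per month: ${cpu_idle:,.2f}")
print(f"Idle cost per GPU per month:      ${gpu_idle:,.2f}")
```

Under these assumed prices a single idle GPU costs roughly a hundred times more per month than a single idle CPU core, so the same overprovisioning habit carries a far larger bill on GPU clusters.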

For the first time since EC2 launched in 2006, GPU prices are rising instead of falling.

In January 2026, AWS raised H200 Capacity Block prices by 15%, citing supply and demand, which broke a two-decade precedent.

"At 5% utilization, the math doesn't work," the report states. The hoarding instinct makes sense because lead times are long, yet that same hoarding feeds the scarcity loop that drives prices even higher.

Not every cluster performs this badly. One organization hit 49% utilization on H200s and 30% on H100s, well above the 5% average.

The difference comes down to automation rather than luck or better hardware. The tools to fix this already exist, including automated rightsizing, GPU sharing or time slicing, and Spot management.

Most teams never get there because overprovisioning feels safer than running out of capacity, but that safety comes at a steep price.

The teams that closed the gap stopped treating resource efficiency as a manual, one-time task and started treating it as an automated, continuous process.

Cast AI's data suggests, however, that most companies are still willing to keep paying inflated bills rather than change their habits.


Efosa has been writing about technology for over 7 years, initially driven by curiosity but now fueled by a strong passion for the field. He holds both a Master's and a PhD in sciences, which provided him with a solid foundation in analytical thinking.
