Cooling just became the most strategic choice in AI infrastructure
AI cooling becomes critical to data center performance and scalability
For most of the last forty years, data center performance gains came from one place: smaller transistors. Moore's Law and Dennard scaling did the work.
Each new generation of silicon delivered more performance at the same or lower power, and thermal was a maintenance problem, not a performance limiter.
Cooling sat in the background. Operators measured it through PUE, optimized for it where convenient, and otherwise treated it as overhead.
That world is over.
Co-Founder and CEO of Ferveret.
Dennard scaling broke years ago, transistor efficiency gains are leveling off, and AI accelerator TDPs have climbed from 700 watts in the H100 generation to over 1,400 watts in current Blackwell deployments, with NVIDIA's upcoming Rubin platform expected to push further.
Thermal is no longer something that happens after the architectural decisions. It is now the binding constraint on how much performance a chip can sustain, and it is becoming one of the most strategic choices an AI data center operator can make.
Why this matters now
The macro numbers show the scale of the problem. Data centers already consume about 4.5 percent of total U.S. electricity, a share projected to reach as much as 12 percent by 2028. McKinsey estimates that global data center spending could approach $7 trillion by 2030 and that data center power demand could reach 220 gigawatts in the same window.
None of that capacity arrives quickly. New transmission lines and substations now take five to ten years to permit and build, which means operators cannot simply order more power when they need to scale.
The result is intense pressure to extract maximum performance from the power operators already have under contract. That pressure is reshaping how the industry thinks about cooling.
Cooling is no longer just an afterthought
For years, cooling was measured as an efficiency loss, captured through metrics like Power Usage Effectiveness (PUE) that quantified how much energy was burned on overhead before reaching the IT load. Today, the more meaningful metric is how much useful compute you extract per unit of power. NVIDIA's Jensen Huang now describes this as "performance per watt" or "tokens per watt" for AI workloads, and cooling plays a direct role in both halves of that equation.
Direct-to-chip liquid cooling has become the new baseline because it removes heat far more effectively than air. But even direct-to-chip is being pushed to its limit by 1,000+ watt accelerators, and most current deployments still require facility water at roughly 30 degrees Celsius or cooler to stay within the ASHRAE W2 and W3 envelopes. In warm climates, that means chillers running for much of the year.
Better thermal management improves both sides of the tokens-per-watt equation. It reduces facility overhead, so more of the contracted power reaches the rack, and it lets chips use more of their thermal headroom, sustaining higher performance for longer.
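The two halves can be written down directly. Because PUE is total facility power divided by IT power, tokens per facility watt equals tokens per IT watt divided by PUE: lowering PUE moves more contracted power to the racks, and sustaining higher chip performance raises tokens per IT watt. The sketch below is a minimal illustration; the 10 MW contract and the PUE values of 1.4 and 1.2 are hypothetical inputs, not figures from the article.

```python
def it_power_mw(facility_mw: float, pue: float) -> float:
    """Power reaching IT equipment, using PUE = total facility power / IT power."""
    return facility_mw / pue


def tokens_per_facility_watt(tokens_per_it_watt: float, pue: float) -> float:
    """Facility-level efficiency: chip-level efficiency divided by facility overhead (PUE)."""
    return tokens_per_it_watt / pue


if __name__ == "__main__":
    facility_mw = 10.0  # hypothetical contracted power
    for pue in (1.4, 1.2):  # hypothetical before/after PUE values
        print(f"PUE {pue}: {it_power_mw(facility_mw, pue):.2f} MW reaches the racks")
```

With these assumed values, cutting PUE from 1.4 to 1.2 moves roughly 1.19 MW more of the same 10 MW contract to IT equipment, before any chip-side gain is counted.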
Those gains compound. A recent UCLA study has shown that combining a 17 percent improvement in facility efficiency with a 15 percent gain in server-level performance per watt from better thermal management yields roughly 35 percent more tokens per watt within the same power envelope (1.17 × 1.15 ≈ 1.35). In a 10 megawatt facility, that is more than a megawatt of additional usable compute, with no additional grid commitment.
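The compounding is multiplicative rather than additive, which is why 17 and 15 percent combine to about 35 percent instead of 32. A minimal sketch, assuming the two improvements are independent; the function names are illustrative, and "MW-equivalent" here means the extra power the old design would need to match the new output.

```python
def combined_gain(facility_gain: float, server_gain: float) -> float:
    """Independent multiplicative improvements compound: (1 + a) * (1 + b) - 1."""
    return (1.0 + facility_gain) * (1.0 + server_gain) - 1.0


def equivalent_extra_mw(facility_mw: float, gain: float) -> float:
    """Extra power the old design would need to match the improved output."""
    return facility_mw * gain


if __name__ == "__main__":
    g = combined_gain(0.17, 0.15)  # figures quoted in the article
    print(f"Combined tokens-per-watt gain: {g:.0%}")
    print(f"10 MW facility: ~{equivalent_extra_mw(10.0, g):.1f} MW-equivalent")
```

Under this framing, the compounded gain corresponds to roughly 3.5 MW of capacity at the old efficiency, which is consistent with the conservative "more than a megawatt" figure.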
At GTC 2026, NVIDIA CEO Jensen Huang made this argument explicitly. He told the audience that beyond the silicon roadmap, infrastructure-level optimization across power and cooling represents another factor of two in performance still on the table. "There's no question in my mind there's a factor of two in here, and a factor of two at the scale we're talking about is gigantic," he said.
That gain does not come from a smaller transistor. It comes from rethinking how power and thermal energy move through the rack. A recent UCLA study suggests that at least one third of that infrastructure-level gain is attributable specifically to cooling. Cooling is no longer a support function. It is a primary lever for performance.
Water is becoming a hard constraint
Power is not the only pressure point. Water is emerging as an equally critical and often more immediate constraint on data center expansion. Traditional cooling architectures often rely on evaporative processes that consume vast amounts of water. According to the Environmental and Energy Study Institute, large data centers may use up to 5 million gallons per day, comparable to the daily water use of a town of 10,000 to 50,000 people.
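Evaporative systems consume water because evaporation is how they reject heat: every kilogram evaporated carries away its latent heat. The sketch below estimates an idealized daily figure for a given heat load. It assumes a latent heat of about 2.43 MJ/kg (water near 30 °C), that all heat leaves by evaporation, and that 1 kg of water is about 1 liter; the 10 and 100 MW loads are hypothetical.

```python
LATENT_HEAT_J_PER_KG = 2.43e6  # assumed latent heat of vaporization near 30 °C
LITERS_PER_US_GALLON = 3.785411784
SECONDS_PER_DAY = 86_400


def evaporated_gallons_per_day(heat_load_mw: float) -> float:
    """Idealized water evaporated if the entire heat load is rejected by evaporation.

    Ignores blowdown, drift, and sensible heat transfer; treats 1 kg of water as 1 liter.
    """
    kg_per_s = heat_load_mw * 1e6 / LATENT_HEAT_J_PER_KG
    liters_per_day = kg_per_s * SECONDS_PER_DAY
    return liters_per_day / LITERS_PER_US_GALLON


if __name__ == "__main__":
    for mw in (10, 100):  # hypothetical heat loads
        print(f"{mw} MW: ~{evaporated_gallons_per_day(mw):,.0f} gallons/day")
```

Real cooling towers also lose water to blowdown and drift, so actual consumption runs above this idealized estimate.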
This is drawing scrutiny from regulators and communities in already water-stressed areas. The result is longer permitting cycles, higher project risk, and in some cases new developments paused entirely. States and municipalities are also implementing stricter reporting requirements and adjusting electricity rate structures specifically for data centers.
Operators now have to factor water alongside power into site selection. Facilities that minimize energy waste and reduce or eliminate water consumption are better positioned to navigate this environment.
The shift toward next-generation cooling
In response, the industry is entering a new phase of cooling innovation. Air cooling is no longer sufficient for high-density AI workloads. Liquid cooling has become the baseline, but within liquid cooling, not all approaches deliver the same efficiency or scalability.
The next wave of innovation focuses on improving heat transfer at the source: removing thermal energy more effectively at the chip level while reducing system-wide overhead. Some of these approaches draw on heat transfer techniques refined in other high-density power industries such as nuclear power generation, where the challenge of moving large amounts of thermal energy from a constrained physical space has been studied for decades.
The goal is straightforward. Better cooling enables higher rack densities, allows operation at higher facility water temperatures, and reduces or eliminates reliance on water-intensive heat rejection. Just as importantly, the next generation of cooling architectures is being designed to integrate with existing data center footprints, so operators can evolve their infrastructure rather than rebuild it from scratch.
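One way to see why better heat transfer at the chip permits warmer water and denser racks is a lumped thermal-resistance model: chip temperature equals supply water temperature plus chip power times an effective thermal resistance R_th (°C/W) covering the cold plate, interface material, and coolant temperature rise. This is a simplification, and the 85 °C limit and R_th values below are hypothetical, chosen only to illustrate the trade-off.

```python
def max_supply_temp_c(chip_power_w: float, r_th_c_per_w: float, t_limit_c: float) -> float:
    """Warmest supply water that keeps the chip at or below t_limit_c.

    Lumped model: T_chip = T_supply + P * R_th.
    """
    return t_limit_c - chip_power_w * r_th_c_per_w


def max_chip_power_w(t_supply_c: float, r_th_c_per_w: float, t_limit_c: float) -> float:
    """Highest sustainable chip power at a given supply water temperature."""
    return (t_limit_c - t_supply_c) / r_th_c_per_w


if __name__ == "__main__":
    t_limit = 85.0  # hypothetical chip temperature limit, °C
    for r_th in (0.035, 0.025):  # hypothetical "baseline" vs "better" cold plates
        print(
            f"R_th {r_th} °C/W: 1,400 W chip needs supply <= "
            f"{max_supply_temp_c(1400, r_th, t_limit):.1f} °C; "
            f"at 45 °C supply, max power {max_chip_power_w(45, r_th, t_limit):.0f} W"
        )
```

In this model, lowering R_th raises both the allowable supply temperature and the power each chip can sustain, which is the same pair of benefits described above.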
NVIDIA's Vera Rubin platform, announced at CES 2026, was a clear signal of where this is heading. Vera Rubin is designed for 45-degree-Celsius supply water, which means dry coolers can handle most of the heat rejection year-round and mechanical chillers become optional in most climates. That is a fundamental shift in how cooling infrastructure will be designed for the next decade.
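The link between supply temperature and chillers can be sketched with a simple rule: a dry cooler can only deliver water some "approach" above the outdoor dry-bulb temperature, so any hour where ambient plus approach exceeds the setpoint needs mechanical help. The sketch below compares 30 °C and 45 °C setpoints. The sinusoidal weather year and the 8 °C approach are hypothetical stand-ins, not real climate or equipment data.

```python
import math


def synthetic_hourly_ambient_c(mean_c=20.0, seasonal_amp_c=10.0, daily_amp_c=6.0):
    """Hypothetical year of hourly dry-bulb temperatures (sinusoidal, not real weather)."""
    return [
        mean_c
        + seasonal_amp_c * math.sin(2 * math.pi * h / 8760)
        + daily_amp_c * math.sin(2 * math.pi * h / 24)
        for h in range(8760)
    ]


def dry_cooler_fraction(ambient_c, supply_setpoint_c, approach_c=8.0):
    """Share of hours a dry cooler alone can hold the supply setpoint.

    Assumes the dry cooler can cool water to ambient + approach_c; warmer hours
    would need chiller assistance in this simplified model.
    """
    ok = sum(1 for t in ambient_c if t + approach_c <= supply_setpoint_c)
    return ok / len(ambient_c)


if __name__ == "__main__":
    temps = synthetic_hourly_ambient_c()
    for setpoint in (30.0, 45.0):
        share = dry_cooler_fraction(temps, setpoint)
        print(f"{setpoint:.0f} °C supply: dry cooler alone in {share:.0%} of hours")
```

Real site studies would use measured hourly weather data and manufacturer approach curves, but the direction of the result does not depend on those details: a higher setpoint widens the set of hours a dry cooler can cover on its own.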
A defining moment for data center design
The data center industry is at an inflection point. AI compute demand is accelerating, and every resource needed to support it (power, water, and physical space) is becoming harder to secure. Cooling sits at the intersection of all three.
It determines how efficiently power is used, how much water is consumed, and ultimately, where infrastructure can be deployed. The operators that recognize this now will have a sustained advantage. Deciding how to keep data centers cool under AI workload pressure has become one of the most strategic choices in modern infrastructure.
This article was produced as part of TechRadar Pro Perspectives, our channel to feature the best and brightest minds in the technology industry today.
The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/pro/perspectives-how-to-submit