Inference pushes AI out of the data center
How AI is evolving away from data centers
In the early 2000s, the architects of the internet faced a problem that sounds strikingly modern: how do you build a system that handles massive, unpredictable demand without breaking when any single part fails?
Their answer was peer-to-peer networking. Rather than routing everything through central servers, P2P systems distributed load across thousands of individual nodes: no single point of failure, intelligence closer to the user, and resilience baked into the architecture rather than bolted on top.
It was a successful solution. P2P networks proved faster, more resilient, and more scalable than anything centralized IT infrastructure could match for distributed workloads.
Then, as the cloud computing era took hold, the hyperscale model became the dominant infrastructure logic of the last fifteen years. Its premise — aggregate everything into the largest possible data centers, optimize for unit cost, centralize without limit — made sense for many workloads.
But AI inference, the phase of AI that is now exploding in enterprise environments, operates on exactly the same principles that made P2P compelling in the first place.
Understanding why
Understanding why requires separating two phases of AI that are often conflated. Training a large model is a one-time, compute-intensive process. It runs well on centralized, aggregated infrastructure, and the hyperscale logic holds there. Inference is different.
Inference is every time the model is actually used: a fraud detection system flagging a transaction, a predictive maintenance system identifying a fault on the factory floor, a logistics platform recalculating routes in real time. These decisions happen continuously, in milliseconds, at the point where operations actually run.
Routing inference workloads to a distant hyperscale facility introduces latency that is simply incompatible with many of these use cases. A surgical assistance system cannot wait for a round trip to a data center in another region. Neither can an industrial safety system, an autonomous inspection drone, or a real-time customer service agent running on retail floor infrastructure.
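As a rough illustration of the arithmetic, the short Python sketch below compares an on-site edge node with a distant regional data center against a real-time latency budget. The distances, budget, and model time are hypothetical figures chosen purely for illustration, not measurements from any real deployment.

```python
# Back-of-envelope latency estimate: on-site edge node vs. distant cloud region.
# Every figure here is an illustrative assumption, not a measurement.

SPEED_IN_FIBER_KM_PER_MS = 200  # light in optical fiber covers roughly 200 km per ms

def network_round_trip_ms(distance_km: float, routing_overhead: float = 1.5) -> float:
    """Estimate round-trip network time for a one-way distance.
    routing_overhead loosely accounts for indirect fiber paths and switching hops."""
    return 2 * distance_km * routing_overhead / SPEED_IN_FIBER_KM_PER_MS

LATENCY_BUDGET_MS = 20    # hypothetical budget for a real-time decision
MODEL_INFERENCE_MS = 12   # hypothetical time to run the model itself

for label, distance_km in [("on-site edge node", 1), ("regional data center", 1500)]:
    total_ms = network_round_trip_ms(distance_km) + MODEL_INFERENCE_MS
    verdict = "within budget" if total_ms <= LATENCY_BUDGET_MS else "over budget"
    print(f"{label}: ~{total_ms:.1f} ms end to end ({verdict})")
```

With these assumed numbers, the model itself fits comfortably in the budget; it is the round trip to a distant facility that blows it.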
McKinsey projects that global data center demand will more than triple by 2030, driven overwhelmingly by inference rather than training. The infrastructure serving that demand needs to be built around what inference actually requires: compute close to where the decision happens.
P2P systems’ answer was to stop treating distribution as a problem and start treating it as the architecture. BitTorrent did not try to solve file transfer by building faster central servers; instead, it distributed the problem across thousands of nodes: each one close to a user, each one handling local demand locally.
When individual nodes dropped off, the system degraded at the margin. No central server going down took the whole network with it. The architecture assumed failure and built around it, outperforming centralized alternatives on speed, resilience, and scale simultaneously.
Edge computing
Edge computing applies the same logic to AI infrastructure. Smaller, modular compute facilities positioned close to where data is generated and consumed distribute the inference workload the way P2P distributed file transfer. Each site handles local decisions locally. The network as a whole becomes more resilient because no single facility carries the entire load.
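In scheduling terms, that looks something like the sketch below: a hypothetical edge-first router that prefers the nearest healthy site and degrades gracefully when any single node drops out. The site names, health flags, and latencies are invented for illustration.

```python
# Minimal sketch of edge-first request routing with graceful degradation.
# Site names, health flags, and latencies are invented for illustration.

from dataclasses import dataclass

@dataclass
class InferenceSite:
    name: str
    healthy: bool
    est_latency_ms: float

def pick_site(sites: list[InferenceSite]) -> InferenceSite:
    """Prefer the lowest-latency healthy site. Losing any one node only
    removes it from the candidate list; it does not take down the service."""
    candidates = [s for s in sites if s.healthy]
    if not candidates:
        raise RuntimeError("no inference capacity available")
    return min(candidates, key=lambda s: s.est_latency_ms)

sites = [
    InferenceSite("factory-floor-npu", healthy=False, est_latency_ms=2),
    InferenceSite("metro-edge-site", healthy=True, est_latency_ms=8),
    InferenceSite("regional-cloud", healthy=True, est_latency_ms=45),
]
print(pick_site(sites).name)  # prints "metro-edge-site": the system degrades at the margin
```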
Running that inference centrally also carries a cost that compounds with scale: Every time data moves out of a hyperscale cloud provider's network, organizations pay egress fees.
For AI workloads that require continuous data transfer between a central facility and distributed operational environments, those charges accumulate in ways that are easy to underestimate at the planning stage. Processing data locally at the edge — close to where it is generated — reduces the volume crossing the network in the first place.
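A deliberately simple estimate shows how quickly those charges add up. The per-gigabyte rate, site count, and data volumes in the sketch below are assumptions chosen for illustration, not any provider's actual pricing.

```python
# Rough annual egress-cost comparison: streaming all raw data to a central
# cloud vs. filtering it at the edge and sending only the results.
# The rate, site count, and volumes are illustrative assumptions, not real pricing.

EGRESS_RATE_PER_GB = 0.09   # hypothetical per-GB egress charge
SITES = 50                  # hypothetical number of operational sites
GB_PER_SITE_PER_DAY = 40    # hypothetical raw data volume per site
EDGE_REDUCTION = 0.95       # assume local processing keeps 95% of data on-site

def annual_egress_cost(gb_per_site_per_day: float) -> float:
    return gb_per_site_per_day * SITES * 365 * EGRESS_RATE_PER_GB

centralized = annual_egress_cost(GB_PER_SITE_PER_DAY)
edge_first = annual_egress_cost(GB_PER_SITE_PER_DAY * (1 - EDGE_REDUCTION))

print(f"Centralized: ~${centralized:,.0f} per year in egress fees")
print(f"Edge-first:  ~${edge_first:,.0f} per year in egress fees")
```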
A hardware shift is also changing the feasibility calculation at the device level. Neural processing units (NPUs) designed specifically for AI inference tasks are now embedded in smartphones, laptops, and industrial edge devices.
The compute required to run capable inference workloads has been falling steadily, and hardware that would have required a server rack a few years ago now fits in a handheld device.
As inference-capable hardware becomes cheaper and more physically compact, the assumption that every workload needs to route back to a centralized facility becomes harder to sustain.
Data sovereignty
As data sovereignty regulation tightens across the EU, Southeast Asia, Latin America, and beyond, centralizing inference in a small number of facilities creates legal exposure.
For organizations operating across multiple jurisdictions, edge infrastructure resolves this by design: data is processed locally, within the relevant jurisdiction, without requiring complex legal and technical workarounds after the fact.
Finally, power availability, not price, is becoming the binding constraint on data center capacity. In Northern Virginia, the world's densest cloud hub, utilities have projected connection timelines of up to seven years for large projects due to grid congestion.
Ireland's data centers now consume more than 20% of national electricity. These problems are the predictable result of concentrating enormous compute into a small number of locations, but the megawatt problem is more tractable when it does not need solving in one place.
Edge deployments, by distributing workloads across many smaller sites, spread the energy demand in a way that aligns better with available grid capacity.
None of this means hyperscale infrastructure is going away. Training workloads, large-scale data processing, and many enterprise applications will continue to run efficiently in centralized cloud environments.
The case for edge is not a case against cloud, but rather for matching infrastructure architecture to what workloads actually need.
The engineers who built P2P networks understood that distributing intelligence across the network made it stronger, not weaker.
As inference pushes AI out of the data center and into the places where businesses actually operate, that lesson is becoming increasingly relevant again.
This article was produced as part of TechRadar Pro Perspectives, our channel to feature the best and brightest minds in the technology industry today.
The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing, find out more here: https://www.techradar.com/pro/perspectives-how-to-submit
Founder of investment fund Epochal Corporation.