Making the case for GPU-free AI inference: 4 key considerations

GPUs are the engine behind many advanced computations, having become the de facto solution for AI model training. Yet a fundamental misconception looms large: the belief that GPUs, with their parallel processing power, are indispensable for all AI tasks. This widespread presumption leads many to discount CPUs, which not only compete with but often surpass GPUs, especially for AI inference, the operations that will comprise most of the market as AI applications move into production. CPU-based inference is often the best choice, surpassing GPUs in four critical areas: price, power, performance, and pervasive availability.

With as much as 85% of AI tasks focused not on model training but on inference, most AI applications don’t require the specialized computational horsepower of a GPU. Instead, they require the flexibility and efficiency of CPUs, which excel in multipurpose workload environments and deliver equivalent performance for the low-latency tasks crucial to responsive user interactions and real-time decision-making.

In this context, adopting CPUs over GPUs can be strategically advantageous for businesses seeking to optimize their operations for four key reasons:

1. Cost efficiency: Choose CPUs for cost savings in both acquisition and ongoing operations.

2. Energy conservation: Utilize CPUs for their lower power usage, benefiting both budgets and environmental sustainability.

3. Right-size performance: Deploy CPUs for their effectiveness in real-time inference tasks.

4. Pervasive availability: Choose CPUs to implement the diverse, tiered application stacks required for most AI-enabled services while sidestepping the supply limitations and specialized infrastructure inherent to GPUs.

Price advantages of CPUs in AI applications

CPUs often present a more economical option compared to GPUs, offering a balanced ratio of cost to performance, especially in AI inference tasks, where the specialization of GPUs is not required. Exploring the cost advantages of CPUs over GPUs highlights their value in several key areas:

  • Cost considerations: CPUs generally entail significantly lower upfront capital expenditure or rental fees compared to GPUs, which can be astronomically expensive, sometimes costing ten times more than an average CPU. This economic disparity is crucial for businesses looking to minimize investment costs for AI-enabled services.
  • Operational efficiency: CPUs also tend to be more energy-efficient than GPUs, contributing to lower operational costs. This efficiency not only helps in reducing energy bills but also enhances the overall sustainability of AI operations.
  • Flexibility and utility: The ability to repurpose CPUs for a variety of tasks adds to their cost-effectiveness. Unlike GPUs, which are specialized and thus limited in their application outside of high-intensity computations, CPUs are used across the entire application infrastructure found in any digital service, including those that run AI in production. This adaptability reduces the need for additional hardware investments, further minimizing overall technology expenditures and enhancing return on investment.

Power efficiency: The operational and environmental advantages of CPUs in AI

The lower power consumption of CPUs versus GPUs highlights significant operational and environmental advantages, especially in AI inference tasks. While GPUs are essential for training due to their high-precision parallel calculations, CPUs are ideal for inference tasks, which typically require less numerical precision and computational power but close integration with the surrounding application tiers to function.

This efficiency not only aligns with environmental sustainability goals but also reduces operational costs. In data centers, where power and space are at a premium, the lower power requirements of CPUs offer a compelling advantage over GPUs, which can draw up to 700 watts each, approaching the continuous power draw of a typical American household. This difference in power consumption is crucial as the industry seeks to manage increasing energy demands without expanding its carbon footprint. Consequently, CPUs emerge as a more sustainable choice for certain AI applications, providing an optimal balance of performance and energy efficiency.
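To put the wattage gap in concrete terms, here is a back-of-envelope sketch in Python. Only the 700-watt GPU figure comes from above; the 200-watt CPU draw and the $0.12-per-kWh electricity price are illustrative assumptions, not measured values:

```python
# Back-of-envelope yearly energy cost for a single device running 24/7.
# Only the 700 W GPU figure comes from the article; the CPU wattage and
# electricity price are assumptions for illustration.

HOURS_PER_YEAR = 24 * 365   # 8,760 hours of continuous operation
PRICE_PER_KWH = 0.12        # assumed electricity rate, USD per kWh

def yearly_energy_cost(watts: float) -> float:
    """Return the yearly energy cost in USD of a device drawing `watts`."""
    kwh_per_year = watts / 1000 * HOURS_PER_YEAR
    return kwh_per_year * PRICE_PER_KWH

gpu = yearly_energy_cost(700)   # high-end GPU at peak draw
cpu = yearly_energy_cost(200)   # assumed server CPU draw

print(f"GPU: ${gpu:,.0f}/yr  CPU: ${cpu:,.0f}/yr  gap: ${gpu - cpu:,.0f}/yr")
# GPU: $736/yr  CPU: $210/yr  gap: $526/yr
```

Multiplied across thousands of servers in a data center, a per-device gap of this size becomes a material line item, before even counting the additional cooling that higher-wattage parts require.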

Right-sizing AI inference performance with CPU technology

Unlike GPUs, which are built for massive parallel processing over large batch sizes, CPUs excel at small-batch workloads, such as real-time AI inference that must deliver consistently low latency. Here’s how CPUs contribute to performance in specific AI use cases:

  • Natural Language Processing: CPUs facilitate real-time interpretation and response generation, crucial for applications that require instantaneous communication, including many modern optimized GenAI models such as Llama 3.
  • Real-Time Object Recognition: CPUs enable swift image analysis, essential for systems that need immediate object recognition capabilities, such as video surveillance or industrial automation.
  • Speech Recognition: CPUs process voice-activated customer interactions quickly, enhancing speech recognition use cases such as AI-powered restaurant drive-throughs or digital kiosks to reduce wait times and improve service efficiency.

In each scenario, the role of CPUs is integral to maximizing the responsiveness and reliability of the AI-enabled system in a real-world use case. The sketch below shows what this can look like in practice for the first of these use cases.
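As an illustration, here is a minimal sketch of CPU-only inference with a Llama-family model using the open-source llama-cpp-python bindings. The model file name, context size, and thread count are assumptions for illustration, not a configuration endorsed by the author:

```python
# CPU-only LLM inference sketch using llama-cpp-python.
# The GGUF file path, context size, and thread count are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3-8b-instruct.Q4_K_M.gguf",  # hypothetical quantized model
    n_ctx=2048,      # context window size
    n_threads=8,     # run inference on 8 CPU threads
)

result = llm(
    "Q: What are your opening hours? A:",
    max_tokens=64,   # short completion keeps latency low
    stop=["\n"],     # stop at the end of the answer line
)
print(result["choices"][0]["text"].strip())
```

Because a quantized model like this runs entirely on general-purpose cores, the same machine can host the web tier, the business logic, and the model itself.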

CPU ubiquity enhances access to production-ready AI inference

Any AI-enabled service requires an entire stack of general-purpose applications that form the framework for feeding, processing, conditioning, and moving the data used by the AI model. These applications run everywhere on general-purpose CPUs. Because most inference tasks run well on CPUs, they are easily integrated into existing compute installations. In cloud or on-premises infrastructure, the ability to process AI workloads alongside other computing tasks makes the AI-enabled service that much more elastic and scalable, without the need for specialized GPU systems.
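As a concrete sketch of that integration, the snippet below runs an ONNX model on CPU with ONNX Runtime, capping its thread count so inference shares cores with the rest of the service. The model file and its input tensor name are assumptions for illustration:

```python
# CPU inference sketch with ONNX Runtime, co-located with other work.
# "model.onnx" and the "input" tensor name are illustrative assumptions.
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # leave the remaining cores free for the
                               # other tiers of the application stack

session = ort.InferenceSession(
    "model.onnx",
    sess_options=opts,
    providers=["CPUExecutionProvider"],  # explicitly CPU-only execution
)

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)  # one-image batch
outputs = session.run(None, {"input": batch})
print(outputs[0].shape)
```

No specialized hardware or drivers are involved: the dependency is a pip-installable library, and the compute is whatever CPU capacity the service already has.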

In addition, the tech industry recently experienced significant GPU shortages due to soaring demand and limited production capacities. These shortages have led to extended wait times and inflated prices for businesses, hindering AI growth and innovation. The Wall Street Journal reports that the AI industry spent $50 billion last year on GPUs to train advanced models, yet generated only $3 billion in revenue. With AI inference accounting for as much as 85% of AI workloads, the disparity between spending and revenue could soon become unsustainable if businesses continue to rely on GPUs for these tasks.

Conversely, CPUs are ubiquitous and can be either purchased for on-premise use from server suppliers or accessed via public cloud from various service providers. Offering a balanced approach to performance and cost, CPUs present a more practical alternative for efficient data processing in AI inference tasks, making them a suitable choice for businesses looking to sustain operations without the financial burden of high-end GPUs.

This article was produced as part of TechRadar Pro's Expert Insights channel, where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadar Pro or Future plc. If you are interested in contributing, find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro

Jeff Wittich is the Chief Product Officer at Ampere.