'The CPU is the system’s executive layer': Intel joins SambaNova as both face mounting inference competition from Nvidia and Groq


  • GPUs handle prefill operations by converting prompts into key-value caches
  • SambaNova RDUs generate tokens at high throughput and low latency
  • Intel Xeon 6 processors manage workload distribution and execute compiled code

Intel and SambaNova Systems have introduced a joint hardware blueprint combining GPUs, SambaNova RDUs, and Intel Xeon 6 processors for large-scale inference workloads.

The system assigns GPUs to prefill operations, RDUs to decoding, and Xeon CPUs to execution and orchestration tasks across agent-driven environments.
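
Neither company has published code for the blueprint, but a minimal Python sketch can illustrate how such a disaggregated pipeline hands work between the three layers. Every function below is a hypothetical stub, not an Intel or SambaNova API:

```python
"""Sketch of a disaggregated inference pipeline (hypothetical stubs).

Stage-to-device mapping follows the announcement: GPU for prefill,
SambaNova RDU for decode, Intel Xeon CPU for orchestration.
"""

def prefill(prompt: str) -> list[float]:
    # GPU stage: a single pass over the whole prompt builds the
    # key-value cache (here just a stand-in list, one entry per token).
    return [0.0] * len(prompt.split())

def decode(kv_cache: list[float], max_tokens: int) -> list[str]:
    # RDU stage: autoregressive token generation against the cache.
    return [f"tok{i}" for i in range(max_tokens)]

def orchestrate(prompt: str) -> list[str]:
    # CPU stage: schedules the pipeline; in a real agent system it would
    # also run tool calls, compilation, and output validation.
    kv = prefill(prompt)
    return decode(kv, max_tokens=8)

if __name__ == "__main__":
    print(orchestrate("Summarize the Intel and SambaNova announcement"))
```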

“Agentic AI is moving into production — and the winning pattern we’re seeing is GPUs to start the job, Intel Xeon 6 to run it, and SambaNova RDUs to finish it fast,” said Rodrigo Liang, CEO and co-founder of SambaNova Systems.


CPU is the execution and control layer

This design is scheduled to be available in the second half of 2026 for enterprises, cloud providers, and sovereign deployments.

The architecture places Intel Xeon 6 processors at the center of system control, where they manage workload distribution, execute code, and coordinate tool interactions.

That responsibility includes handling compilation, validating outputs, and maintaining communication between simultaneous processes.
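
In code terms, that control role looks like a concurrent dispatch loop on the host. A minimal sketch, assuming invented task types and handlers (nothing here reflects a published Intel or SambaNova interface):

```python
import asyncio

# Hypothetical CPU-side control loop; the task names and handler
# bodies are invented for illustration.

async def compile_code(payload: str) -> str:
    return f"binary({payload})"            # stand-in for an LLVM build

async def validate_output(payload: str) -> str:
    assert payload, "empty agent output"   # stand-in for output checks
    return payload

async def relay_message(payload: str) -> str:
    return f"sent:{payload}"               # stand-in for inter-agent messaging

HANDLERS = {
    "compile": compile_code,
    "validate": validate_output,
    "message": relay_message,
}

async def control_loop(tasks: list[tuple[str, str]]) -> list[str]:
    # Fan out concurrent agent requests and gather the results,
    # mirroring the CPU's role as coordinator for many agents at once.
    return await asyncio.gather(
        *(HANDLERS[kind](payload) for kind, payload in tasks)
    )

if __name__ == "__main__":
    work = [("compile", "agent_a.c"), ("validate", "result"), ("message", "ping")]
    print(asyncio.run(control_loop(work)))
```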

“When thousands of simultaneous coding agents are generating tool calls, retrieval requests, code builds, and encrypted inter-agent messages, the CPU is not a background component — it is the system’s executive and action layer,” said Harry Ault, CRO of SambaNova.

The statement positions the CPU as the layer primarily responsible for system behavior, not merely a supporting component.

According to SambaNova, Xeon 6 delivers more than 50% faster LLVM compilation times compared with Arm-based server CPUs.

It also delivers up to 70% faster vector database performance compared with other x86-based systems.

These figures relate to execution speed within coding and retrieval workflows. In this configuration, GPUs process the prefill stage, converting prompts into key-value caches.

SambaNova RDUs operate as the decoding layer, generating tokens at high throughput and low latency.
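
The prefill/decode split works because attention needs the prompt's key and value tensors computed only once; each generated token then attends to the cached tensors with a single new query. A toy single-head example in Python with NumPy (dimensions and weights are arbitrary illustrations, not anything from the announcement):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                    # toy head dimension
prompt = rng.standard_normal((5, d))     # embeddings for 5 prompt tokens
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

# Prefill (the GPU's job in this blueprint): one dense pass over the
# whole prompt builds the key-value cache.
K_cache, V_cache = prompt @ Wk, prompt @ Wv

def decode_step(x: np.ndarray) -> np.ndarray:
    # Decode (the RDU's job): one query vector attends to the cache,
    # and the new token's K/V entries are appended for the next step.
    global K_cache, V_cache
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    scores = K_cache @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V_cache             # attention output for this step

out = decode_step(rng.standard_normal(d))
print(out.shape)                         # (8,)
```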

Xeon 6 processors function as both host CPUs and execution engines, managing system-level operations and running compiled workloads.

“Production inference is moving toward heterogeneous hardware — no single chip type is optimal for every stage of an agentic workflow,” said Banghua Zhu, co-founder and CTO at RadixArk.

He added that combining RDUs with Xeon CPUs allows systems to maintain compatibility with existing software environments.

The system is designed to run inside existing air-cooled data centers without requiring new builds.

According to the companies, this allows scaling of inference workloads without additional strain on water and energy resources.

As Nvidia and Groq continue to push inference throughput and latency forward, this announcement adds a fresh layer of competition to the market.

It offers an alternative approach that distributes workloads across multiple hardware layers rather than relying on a single processing model.



Efosa Udinmwen
Freelance Journalist

Efosa has been writing about technology for over 7 years, initially driven by curiosity but now fueled by a strong passion for the field. He holds both a Master's and a PhD in sciences, which provided him with a solid foundation in analytical thinking.
