AMD shifts to modular GPU strategy with MI355X, ending MI300A-style APU designs

(Image credit: AMD)

MI355X leads AMD's new MI350 Series with 288GB memory and full liquid-cooled performance
AMD drops APU integration, focusing on rack-scale GPU flexibility
FP6 and FP4 data types highlight MI355X’s inference-optimized design choices

AMD has unveiled its new MI350X and MI355X GPUs for AI workloads at its 2025 Advancing AI event, offering two options built on its latest CDNA 4 architecture.

While both share a common platform, the MI355X stands apart as the higher-performance, liquid-cooled variant designed for demanding, large-scale deployments.

MI355X deployments scale up to 128 GPUs per rack and target both training and inference workloads. Each GPU features 288GB of HBM3E memory and 8TB/s of memory bandwidth.
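To put the per-rack figure in context, the quoted numbers can be multiplied out. This is a rough sketch that assumes a fully populated 128-GPU rack and decimal units; the constant and function names are illustrative, and the summed bandwidth is the total of each GPU's local memory bandwidth, not a rate any single workload would see.

```python
# Figures quoted in the article for a single MI355X.
HBM_PER_GPU_GB = 288     # HBM3E capacity per GPU
BW_PER_GPU_TBPS = 8      # memory bandwidth per GPU, in TB/s
GPUS_PER_RACK = 128      # maximum GPUs per rack

def rack_totals(gpus=GPUS_PER_RACK):
    """Return (total HBM in TB, summed memory bandwidth in TB/s) for a rack."""
    total_hbm_tb = gpus * HBM_PER_GPU_GB / 1000  # decimal TB (assumption)
    total_bw_tbps = gpus * BW_PER_GPU_TBPS       # sum of per-GPU local bandwidth
    return total_hbm_tb, total_bw_tbps

hbm_tb, bw_tbps = rack_totals()
print(f"{hbm_tb:.1f} TB of HBM3E, {bw_tbps} TB/s summed bandwidth")
```

On these assumptions, a full rack holds roughly 36.9TB of HBM3E.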

GPU-only design

AMD claims the MI355X delivers up to 4 times the AI compute and 35 times the inference performance of its previous generation, thanks to architectural improvements and a move to TSMC’s N3P process.

Inside, the chip includes eight compute dies with 256 active compute units and a total of 185 billion transistors, a 21% increase in transistor count over the prior model. The compute dies connect through redesigned I/O tiles, whose number drops from four to two, a change AMD says doubles internal bandwidth while lowering power consumption.
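A quick back-of-the-envelope check on those figures, assuming the active compute units are spread evenly across the eight dies (the article does not say so explicitly) and that the 21% increase refers to transistor count. The names below are illustrative.

```python
COMPUTE_DIES = 8
ACTIVE_CUS = 256
TRANSISTORS_BILLION = 185
INCREASE_OVER_PRIOR = 0.21   # 21% more transistors than the prior model

def cus_per_die(cus=ACTIVE_CUS, dies=COMPUTE_DIES):
    """Active compute units per die, assuming an even split."""
    return cus // dies

def implied_prior_transistors(current=TRANSISTORS_BILLION,
                              increase=INCREASE_OVER_PRIOR):
    """Prior-generation transistor count (billions) implied by the increase."""
    return current / (1 + increase)

print(cus_per_die(), "active CUs per die")
print(f"~{implied_prior_transistors():.0f} billion transistors in the prior model")
```

That works out to 32 active compute units per die and an implied prior-generation count of about 153 billion transistors.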

The MI355X is a GPU-only design, dropping the CPU-GPU APU approach used in the MI300A. AMD says this decision better supports modular deployment and rack-scale flexibility.

It connects to the host via a PCIe 5.0 x16 interface and communicates with peer GPUs using seven Infinity Fabric links, reaching over 1TB/s in GPU-to-GPU bandwidth.

Each HBM stack pairs with 32MB of Infinity Cache, and the architecture supports newer, lower-precision formats like FP4 and FP6.

The MI355X runs FP6 operations at FP4 rates, a feature AMD highlights as beneficial for inference-heavy workloads. It also offers 1.6 times the HBM3E memory capacity of Nvidia’s GB200 and B200, although memory bandwidth remains similar. AMD claims a 1.2x to 1.3x inference performance lead over Nvidia’s top products.
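The appeal of lower-precision formats for inference is easiest to see as a capacity calculation. The sketch below estimates an upper bound on how many model weights fit in 288GB at each format. It counts weight storage only, so it ignores the KV cache, activations, and any scaling metadata a format may carry; the function and table names are illustrative.

```python
HBM_BYTES = 288e9  # 288GB per the article, treated as decimal bytes

# Bits used to store one weight in each format.
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "FP6": 6, "FP4": 4}

def max_params_billion(fmt, hbm_bytes=HBM_BYTES):
    """Upper bound on weights (in billions) that fit in HBM at a given format."""
    bytes_per_weight = BITS_PER_WEIGHT[fmt] / 8
    return hbm_bytes / bytes_per_weight / 1e9

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{max_params_billion(fmt):.0f}B weights")
```

Under these simplifications, a single GPU's memory holds about 144 billion weights at FP16, 288 billion at FP8, 384 billion at FP6, and 576 billion at FP4.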

The GPU draws up to 1,400W in its liquid-cooled form, delivering higher performance density per rack. AMD says this improves TCO by allowing users to scale compute without expanding physical footprint.
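For a sense of what that density means for power delivery, here is a rough estimate of GPU power alone in a fully populated rack. It assumes every GPU draws the full 1,400W at once and leaves out CPUs, networking, and cooling overhead, so it is not a facility-level figure. The names are illustrative.

```python
GPU_POWER_W = 1400    # peak draw per liquid-cooled MI355X, per the article
GPUS_PER_RACK = 128   # maximum GPUs per rack, per the article

def rack_gpu_power_kw(gpus=GPUS_PER_RACK, watts=GPU_POWER_W):
    """Peak GPU-only power for a rack, in kilowatts."""
    return gpus * watts / 1000

print(f"{rack_gpu_power_kw():.1f} kW of GPU power per rack")
```

At peak, that is about 179.2kW for the GPUs in a single rack.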

The chip fits into standard OAM modules and is compatible with UBB platform servers, speeding up deployment.

“The world of AI isn’t slowing down - and neither are we,” said Vamsi Boppana, SVP, AI Group. “At AMD, we’re not just keeping pace, we’re setting the bar. Our customers are demanding real, deployable solutions that scale, and that’s exactly what we’re delivering with the AMD Instinct MI350 Series. With cutting-edge performance, massive memory bandwidth, and flexible, open infrastructure, we’re empowering innovators across industries to go faster, scale smarter, and build what’s next.”

AMD plans to launch its Instinct MI400 series in 2026.
