2025-09-11

Nvidia announces Rubin CPX GPU with 128GB memory built for enterprise AI workloads

Vera Rubin NVL144 CPX rack delivers 8 exaFLOPs compute and 100TB fast memory

Shipments planned for late 2026 with Rubin Ultra and Feynman already on roadmap

Nvidia has announced a brand new GPU built on the Rubin architecture and designed for long-context AI workloads.

Rubin CPX, as it’s known, includes 128GB of GDDR7 memory, making it the company’s first GPU at that capacity.

There were rumors of a 128GB RTX gaming card, but this is 100% not that. This GPU is a compute engine aimed at inference in areas such as software development, research, and high-definition video generation. It will not be running Metal Gear Solid Delta: Snake Eater any time soon.

Vera Rubin NVL144 CPX rack

The GPU delivers up to 30 petaFLOPs of NVFP4 compute and integrates hardware attention acceleration that Nvidia says is three times faster than the GB300 NVL72.

It also incorporates four NVENC and four NVDEC units to accelerate video workflows.

As part of Nvidia’s broader push toward disaggregated inference, Rubin CPX is designed to handle the compute-heavy context phase, while other Rubin GPUs and Vera CPUs address generation tasks.

By dedicating Rubin CPX to context processing, Nvidia aims to improve throughput while lowering the cost of high-value inference deployments.

Nvidia’s Dynamo software will orchestrate this behind the scenes, handling low-latency cache transfers and routing across components.
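To make the context/generation split described above more concrete, here is a toy Python sketch of disaggregated inference. It is not Nvidia's Dynamo API: the `ContextWorker`, `GenerationWorker`, `Router`, and `ContextCache` names are hypothetical, and the "model" is a placeholder that simply echoes prompt tokens. It only illustrates the shape of the idea, with one pool handling the prompt in a single pass and handing a cache to a second pool that produces output token by token.

```python
from dataclasses import dataclass


@dataclass
class ContextCache:
    """Hypothetical stand-in for the cache produced by the context phase."""
    request_id: int
    tokens: list[str]


class ContextWorker:
    """Toy context-phase worker: processes the whole prompt in one pass."""

    def prefill(self, request_id: int, prompt: str) -> ContextCache:
        return ContextCache(request_id, prompt.split())


class GenerationWorker:
    """Toy generation-phase worker: emits one token per step from the cache."""

    def generate(self, cache: ContextCache, max_new_tokens: int) -> list[str]:
        if not cache.tokens:
            return []
        output = []
        for step in range(max_new_tokens):
            # Placeholder "model": cycle through the prompt tokens.
            output.append(cache.tokens[step % len(cache.tokens)])
        return output


class Router:
    """Toy orchestrator: sends each request through context, then generation."""

    def __init__(self, context_workers, generation_workers):
        self.context_workers = context_workers
        self.generation_workers = generation_workers

    def handle(self, request_id: int, prompt: str, max_new_tokens: int = 3):
        # Round-robin placement by request id (an assumption for illustration).
        ctx = self.context_workers[request_id % len(self.context_workers)]
        gen = self.generation_workers[request_id % len(self.generation_workers)]
        cache = ctx.prefill(request_id, prompt)     # compute-heavy context phase
        return gen.generate(cache, max_new_tokens)  # cache handed to second pool
```

In a real deployment the cache handoff between the two pools is the expensive part, which is why the article highlights low-latency cache transfers and routing.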

The company’s largest deployment model is the Vera Rubin NVL144 CPX rack. Each unit integrates 144 Rubin CPX GPUs, 144 Rubin GPUs, and 36 Vera CPUs.

Together they deliver 8 exaFLOPs of NVFP4 compute, 100TB of high-speed memory, and 1.7PB/s of memory bandwidth.
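For a rough sense of how much of that total the Rubin CPX GPUs account for, the short Python calculation below multiplies out the per-GPU figures quoted in this article. It assumes the "up to" 30 petaFLOPs peak applies to every CPX GPU and that rack totals are simple sums, neither of which Nvidia's figures here confirm. The remainder would presumably come from the other components in the rack.

```python
# Figures quoted in the article.
CPX_GPUS_PER_RACK = 144
CPX_PFLOPS_EACH = 30        # "up to" NVFP4 petaFLOPs per Rubin CPX
CPX_MEMORY_GB_EACH = 128    # GDDR7 per Rubin CPX
RACK_EXAFLOPS = 8
RACK_MEMORY_TB = 100

# Assumption: peak figures add linearly across all CPX GPUs.
cpx_exaflops = CPX_GPUS_PER_RACK * CPX_PFLOPS_EACH / 1000      # PF -> EF
cpx_memory_tb = CPX_GPUS_PER_RACK * CPX_MEMORY_GB_EACH / 1000  # GB -> TB (decimal)

print(f"Rubin CPX compute: {cpx_exaflops:.2f} EF of {RACK_EXAFLOPS} EF")
print(f"Rubin CPX memory:  {cpx_memory_tb:.1f} TB of {RACK_MEMORY_TB} TB")
```

On these assumptions, the 144 Rubin CPX GPUs contribute about 4.32 exaFLOPs and roughly 18.4TB of GDDR7, a little over half of the rack's quoted compute and under a fifth of its quoted memory.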

Quantum-X800 InfiniBand or Spectrum-X Ethernet with ConnectX-9 SuperNICs provide the connectivity.

Shipments of Rubin CPX and the NVL144 CPX racks are currently penciled in for late 2026, following the recent tape-out at TSMC.

Nvidia’s roadmap includes Rubin Ultra, now expected in 2027, and Feynman, slated for 2028.

Those designs will extend the Rubin architecture with higher-density modules, HBM4E memory, and faster networking.

