Microsoft-backed hardware startup launches its first AI processor, which runs inference without GPUs or expensive HBM, and a key Nvidia partner is collaborating with it


Microsoft-backed startup introduces GPU-free alternatives for generative AI
DIMC architecture delivers an ultra-high memory bandwidth of 150 TB/s
Corsair supports transformers, agentic AI, and interactive video generation

d-Matrix Inc., a hardware startup based in Santa Clara, California, has introduced its first AI processor, Corsair, which is aimed at enhancing AI inference.

d-Matrix is backed by Microsoft, and Corsair does without traditional GPUs and expensive high-bandwidth memory (HBM), an approach intended to deliver significant performance and cost benefits.

Corsair is currently available to early-access customers, with broader availability planned for the second quarter of 2025.

Corsair’s performance redefines AI inference

The Corsair processor is purpose-built to handle demanding AI inference tasks, particularly for generative AI models. For example, it achieves 60,000 tokens per second at 1 ms per token when running Llama3 8B in a single server.

In more resource-intensive scenarios, such as with Llama3 70B models, Corsair delivers 30,000 tokens per second at 2 ms per token in a single rack, translating into substantial savings in energy and operational costs compared to traditional GPU-based solutions.
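The two quoted figures can be related with simple arithmetic. The sketch below is only an illustration under a stated assumption: that "ms per token" is the latency seen by a single token stream and that the tokens-per-second figure is the aggregate across all streams served at once. The article does not say how d-Matrix measured either number, and the function name is hypothetical.

```python
def concurrent_streams(aggregate_tokens_per_s: float, ms_per_token: float) -> float:
    """Streams implied if each stream emits one token every `ms_per_token` ms.

    Assumption (not from the article): aggregate throughput is the sum of
    identical per-stream rates.
    """
    per_stream_tokens_per_s = 1000.0 / ms_per_token
    return aggregate_tokens_per_s / per_stream_tokens_per_s


# Figures quoted in the article
llama3_8b = concurrent_streams(60_000, 1.0)   # single server
llama3_70b = concurrent_streams(30_000, 2.0)  # single rack
print(llama3_8b, llama3_70b)
```

Under that assumption, both configurations work out to the same number of simultaneous token streams, with the larger model running at half the per-stream speed.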

The processor is built on Nighthawk and Jayhawk II tiles, using a 6nm manufacturing process. Each Nighthawk tile integrates four neural cores and a RISC-V CPU, tailored to support large-model inference with digital in-memory computation (DIMC) and versatile datatype processing, including block floating point (BFP).
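For readers unfamiliar with block floating point, the idea is that a group of values shares one exponent while each value keeps its own small integer mantissa. The following is a generic sketch of that technique, not d-Matrix's actual format: the block size, the 8-bit mantissa width, and the function names are all assumptions made for illustration.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize floats to one shared exponent plus signed integer mantissas."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp gives max_abs = m * 2**shared_exp with 0.5 <= m < 1
    _, shared_exp = math.frexp(max_abs)
    # Scale chosen so the largest value fits the signed mantissa range
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1
    mantissas = [max(-limit, min(limit, round(x / scale))) for x in block]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas, mantissa_bits=8):
    """Rebuild approximate floats from the shared exponent and mantissas."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


exp, mants = bfp_quantize([0.5, -0.25, 0.125, 0.0])
print(exp, mants, bfp_dequantize(exp, mants))
```

Because the exponent is stored once per block rather than once per value, most of the arithmetic can be done on integers, which is the general appeal of such formats for inference hardware.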

Corsair adopts chiplet packaging, integrating memory and computation to maximize efficiency. It conforms to the industry-standard PCIe Gen5 full-height, full-length card form factor and can be paired with DMX Bridge cards for scalable performance. Each card delivers 2,400 TFLOPS of peak 8-bit compute, along with 2GB of integrated performance memory and up to 256GB of off-chip memory capacity.

Micron Technology, a key partner of Nvidia, is also collaborating with d-Matrix.

Corsair was initially set to launch in late 2023, but d-Matrix reconfigured its architecture in response to the surging demand for generative AI. This pivot allowed Corsair to incorporate enhancements tailored for transformer models and emerging applications like agentic AI and interactive video generation.

“We saw transformers and generative AI coming, and founded d-Matrix to address inference challenges around the largest computing opportunity of our time,” said Sid Sheth, cofounder and CEO of d-Matrix.

“The first-of-its-kind Corsair compute platform brings blazing fast token generation for high interactivity applications with multiple users, making Gen AI commercially viable,” Sheth added.

Via eeNews

Efosa has been writing about technology for over 7 years, initially driven by curiosity but now fueled by a strong passion for the field. He holds both a Master's and a PhD in sciences, which provided him with a solid foundation in analytical thinking. Efosa developed a keen interest in technology policy, specifically exploring the intersection of privacy, security, and politics. His research delves into how technological advancements influence regulatory frameworks and societal norms, particularly concerning data protection and cybersecurity. Upon joining TechRadar Pro, in addition to privacy and technology policy, he is also focused on B2B security products. Efosa can be contacted at this email: udinmwenefosa@gmail.com
