The associative processing unit aims to displace Nvidia's GPUs as the go-to AI powerhouse by putting compute in the memory itself

GSI Gemini-I APU (Image credit: TechPowerUp)

  • GSI Gemini-I APU reduces constant data shuffling between the processor and memory systems
  • Completes retrieval tasks up to 80% faster than comparable CPUs
  • GSI Gemini-II APU is expected to deliver ten times higher throughput

GSI Technology is promoting a new approach to artificial intelligence processing that places computation directly within memory.

A new study by Cornell University draws attention to this design, known as the associative processing unit (APU).

It aims to overcome long-standing performance and efficiency limits, suggesting it could challenge the dominance of the best GPUs currently used in AI tools and data centers.

A new contender in AI hardware

Published by the ACM and presented at the recent MICRO ’25 conference, the Cornell research evaluated GSI’s Gemini-I APU against leading CPUs and GPUs, including Nvidia’s A6000, on retrieval-augmented generation (RAG) workloads.

The tests spanned datasets from 10GB to 200GB, representing realistic AI inference conditions.
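For context, the retrieval step in a RAG pipeline is essentially a nearest-neighbor search over a large matrix of document embeddings, which is why the workload is dominated by memory traffic rather than arithmetic. The sketch below is a minimal, illustrative brute-force version in Python; the corpus size, embedding width, and function names are assumptions chosen for illustration, not details taken from the Cornell benchmark.

```python
# Minimal sketch of a RAG retrieval step: brute-force top-k similarity search
# over a matrix of document embeddings. Sizes and names are illustrative
# assumptions, not the Cornell study's actual setup.
import numpy as np

rng = np.random.default_rng(0)

# 100,000 documents, each stored as a 768-dimensional embedding (~300MB).
# Every query streams this whole matrix from memory to the processor,
# which is the data shuffling that compute-in-memory designs try to avoid.
doc_embeddings = rng.standard_normal((100_000, 768)).astype(np.float32)
query = rng.standard_normal(768).astype(np.float32)

def top_k_similar(query: np.ndarray, docs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k documents with the highest dot-product score."""
    scores = docs @ query                      # one full pass over the corpus
    return np.argpartition(scores, -k)[-k:]    # indices of the k best matches

print(top_k_similar(query, doc_embeddings))
```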

By performing computation within static RAM, the APU reduces the constant data shuffling between the processor and memory.

This is a key source of energy loss and latency in conventional GPU architectures.

The results showed the APU could achieve GPU-class throughput while consuming far less power.

GSI reported its APU used up to 98% less energy than a standard GPU and completed retrieval tasks up to 80% faster than comparable CPUs.

Such efficiency could make it appealing for edge devices such as drones, IoT systems, and robotics, as well as for defense and aerospace use, where energy and cooling limits are strict.

Despite these findings, it remains unclear whether compute-in-memory technology can scale to the same level of maturity and support enjoyed by the best GPU platforms.

GPUs currently benefit from well-developed software ecosystems that allow seamless integration with major AI tools.

For compute-in-memory devices, optimization and programming remain emerging areas that could slow broader adoption, especially in large data center operations.

GSI Technology says it is continuing to refine its hardware, with the Gemini-II generation expected to deliver ten times higher throughput and lower latency.

Another design, named Plato, is in development to further extend compute performance for embedded edge systems.

“Cornell’s independent validation confirms what we’ve long believed: compute-in-memory has the potential to disrupt the $100 billion AI inference market,” said Lee-Lean Shu, Chairman and Chief Executive Officer of GSI Technology.

“The APU delivers GPU-class performance at a fraction of the energy cost, thanks to its highly efficient memory-centric architecture. Our recently released second-generation APU silicon, Gemini-II, can deliver roughly 10x faster throughput and even lower latency for memory-intensive AI workloads.”

Via TechPowerUp



Efosa Udinmwen
Freelance Journalist

Efosa has been writing about technology for over 7 years, initially driven by curiosity but now fueled by a strong passion for the field. He holds both a Master's and a PhD in sciences, which provided him with a solid foundation in analytical thinking.
