HBM-on-GPU set to power the next revolution in AI accelerators - but don't expect it in your video card anytime soon

Imec 3D HBM-on-GPU technology
(Image credit: Imec)

  • 3D HBM-on-GPU design reaches record compute density for demanding AI workloads
  • Peak GPU temperatures exceeded 140°C without thermal mitigation strategies
  • Halving the GPU clock rate reduced temperatures but slowed AI training by 28%

At the 2025 IEEE International Electron Devices Meeting (IEDM), Imec presented a study of a 3D HBM-on-GPU design aimed at increasing compute density for demanding AI workloads.

The thermal system-technology co-optimization approach places four high-bandwidth memory stacks directly above a GPU through microbump connections.

Each stack consists of twelve hybrid-bonded DRAM dies, and cooling is applied on top of the HBMs.

Thermal mitigation attempts and performance trade-offs

The study applies power maps derived from industry-relevant workloads to test how the configuration responds under realistic AI training conditions.

This 3D arrangement promises a leap in compute density and memory per GPU.

It also offers higher GPU memory bandwidth compared to 2.5D integration, where HBM stacks sit around the GPU on a silicon interposer.

However, the thermal simulations reveal severe challenges for the 3D HBM-on-GPU design.

Without mitigation, peak GPU temperatures reached 141.7°C, far above operational limits, while the 2.5D baseline peaked at 69.1°C under the same cooling conditions.

Imec explored technology-level strategies such as merging HBM stacks and thermal silicon optimization.

System-level strategies included double-sided cooling and GPU frequency scaling.

Reducing the GPU clock rate by 50% lowered peak temperatures to below 100°C, but this change slowed AI training workloads by 28%.
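The trade-off Imec describes can be sketched as simple back-of-the-envelope arithmetic: the 28% workload penalty from down-clocking is weighed against the compute-density advantage of stacking HBM on the GPU. Only the 28% penalty comes from Imec's presentation; the density-gain figure below is a purely illustrative assumption, not a number Imec reported.

```python
# Back-of-the-envelope sketch of the frequency-scaling trade-off.
# The 28% workload penalty is from Imec's IEDM presentation; the
# compute-density gain is an illustrative assumption.

def net_throughput_gain(density_gain: float, workload_penalty: float) -> float:
    """Effective throughput of the down-clocked 3D package relative to
    the 2.5D baseline: the density gain scaled by the throughput that
    remains after the frequency-scaling penalty."""
    return density_gain * (1.0 - workload_penalty)

PENALTY_3D = 0.28       # 28% workload penalty from halving GPU frequency (Imec)
DENSITY_GAIN_3D = 2.0   # hypothetical density advantage of 3D stacking (assumption)

gain = net_throughput_gain(DENSITY_GAIN_3D, PENALTY_3D)
print(f"Net throughput vs. 2.5D baseline: {gain:.2f}x")
```

Under these assumed numbers the 3D package would still come out ahead of the 2.5D baseline despite the down-clock, which is the shape of the argument Imec makes below.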

Despite these limitations, Imec argues that the 3D structure can deliver higher compute density and performance than the 2.5D reference design.

"Halving the GPU core frequency brought the peak temperature from 120°C to below 100°C, achieving a key target for the memory operation. Although this step comes with a 28% workload penalty..." said James Myers, System Technology Program Director at Imec.

"...the overall package outperforms the 2.5D baseline thanks to a higher throughput density offered by the 3D configuration. We are currently using this approach to study other GPU and HBM configurations..."

The organization suggests this approach could support thermally resilient hardware for AI tools in dense data centers.

Imec presents this work as part of a broader effort to link technology decisions with system behavior.

This includes the cross-technology co-optimization (XTCO) program, launched in 2025, which combines STCO and DTCO (system-technology and design-technology co-optimization) mindsets to align technology roadmaps with system scaling challenges.

Imec said that XTCO enables collaborative problem-solving for critical bottlenecks across the semiconductor ecosystem, including fabless and system companies.

However, such technologies will likely remain confined to specialized facilities with controlled power and thermal budgets.

Via TechPowerUp




Efosa Udinmwen
Freelance Journalist

Efosa has been writing about technology for over 7 years, initially driven by curiosity but now fueled by a strong passion for the field. He holds both a Master's and a PhD in sciences, which provided him with a solid foundation in analytical thinking.
