Huawei Ascend 950 vs Nvidia H200 vs AMD Instinct MI300: How do they compare?
AI compute demand is driving unprecedented innovation

- Huawei’s Ascend 950DT FP8-class formats target efficient inference without accuracy loss
- Nvidia H200 leans on a mature software ecosystem and Hopper GPU strengths
- AMD Instinct MI300’s FP64 parity suits demanding scientific computing workloads
In recent years, demand for AI training and inference compute has pushed chip makers to innovate aggressively: memory bandwidth, data formats, interconnects, and power efficiency are now as critical as raw FLOPS.
Each company targets demanding scenarios such as generative AI training and high-performance computing, where AI tools increasingly depend on fast accelerators to process massive datasets.
Each vendor approaches the challenge with a different compute platform, so we’ve broken down how the Ascend 950 series, H200, and Instinct MI300 compare.
Category | Huawei Ascend 950DT | NVIDIA H200 | AMD Instinct MI300 |
---|---|---|---|
Chip Family / Name | Ascend 950 series | H200 (GH100, Hopper) | Instinct MI300 (Aqua Vanjaram) |
Architecture | Proprietary Huawei AI accelerator | Hopper GPU architecture | CDNA 3.0 |
Process / Foundry | Not yet publicly confirmed | 5 nm (TSMC) | 5 nm (TSMC) |
Transistors | Not specified | 80 billion | 153 billion |
Die Size | Not specified | 814 mm² | 1017 mm² |
Optimization | Decode-stage inference & model training | General-purpose AI & HPC acceleration | AI/HPC compute acceleration |
Supported Formats | FP8, MXFP8, MXFP4, HiF8 | FP16, FP32, FP64 (via Tensor/CUDA cores) | FP16, FP32, FP64 |
Peak Performance | 1 PFLOPS (FP8 / MXFP8 / HiF8), 2 PFLOPS (MXFP4) | FP16: 241.3 TFLOPS, FP32: 60.3 TFLOPS, FP64: 30.2 TFLOPS | FP16: 383 TFLOPS, FP32/FP64: 47.87 TFLOPS |
Vector Processing | SIMD + SIMT hybrid, 128-byte memory access granularity | SIMT with CUDA and Tensor cores | SIMT + Matrix/Tensor cores |
Memory Type | HiZQ 2.0 proprietary HBM (for decode & training variant) | HBM3e | HBM3 |
Memory Capacity | 144 GB | 141 GB | 128 GB |
Memory Bandwidth | 4 TB/s | 4.89 TB/s | 6.55 TB/s |
Memory Bus Width | Not specified | 6144-bit | 8192-bit |
L2 Cache | Not specified | 50 MB | Not specified |
Interconnect Bandwidth | 2 TB/s | Not specified | Not specified |
Form Factors | Cards, SuperPoD servers | PCIe 5.0 x16 (server/HPC only) | PCIe 5.0 x16 (compute card) |
Base / Boost Clock | Not specified | 1365 / 1785 MHz | 1000 / 1700 MHz |
Cores / Shaders | Not specified | CUDA: 16,896, Tensor: 528 (4th Gen) | 14,080 shaders, 220 CUs, 880 Matrix cores |
Power (TDP) | Not specified | 600 W | 600 W |
Bus Interface | Not specified | PCIe 5.0 x16 | PCIe 5.0 x16 |
Outputs | None (server use) | None (server/HPC only) | None (compute card) |
Target Scenarios | Large-scale training & decode inference (LLMs, generative AI) | AI training, HPC, data centers | AI/HPC compute acceleration |
Release / Availability | Q4 2026 | Nov 18, 2024 | Jan 4, 2023 |
Architecture and design approaches
Huawei’s Ascend 950 series is a proprietary AI accelerator architecture optimized for the decode stage of inference as well as model training, rather than a traditional GPU.
Its design blends SIMD and SIMT processing styles with 128-byte memory access granularity, aiming to balance throughput and flexibility.
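That 128-byte granularity interacts directly with data format choice, since narrower formats pack more values into each memory transaction. Here is a quick back-of-envelope sketch: the byte widths are standard for these format classes, but HiF8’s exact layout isn’t public (one byte is assumed, in line with other 8-bit floating-point types), and the shared block scales that MX formats carry are ignored.

```python
# Elements fetched per 128-byte memory access, by format width.
# HiF8 is assumed to be one byte wide, matching other 8-bit FP types;
# MX block-scale overhead is ignored for simplicity.
ACCESS_BYTES = 128
format_bytes = {"FP32": 4, "FP16": 2, "FP8/HiF8": 1, "MXFP4": 0.5}

for fmt, width in format_bytes.items():
    print(f"{fmt}: {int(ACCESS_BYTES / width)} elements per access")
```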
Nvidia’s H200 is based on the Hopper GPU architecture and integrates 16,896 CUDA cores alongside 528 fourth-generation Tensor cores.
It uses a single-die GH100 GPU fabricated on a 5 nm TSMC process, maintaining compatibility with Nvidia’s software stack and extensive ecosystem.
AMD’s MI300 Instinct uses the Aqua Vanjaram GPU with the CDNA 3.0 architecture and a chiplet-based MCM design featuring 220 compute units and 880 matrix cores.
This approach provides a massive transistor budget and a strong focus on high-performance computing.
The Ascend 950 offers peak performance of one petaflop using FP8, MXFP8, or HiF8 data formats and can double to two petaflops when using MXFP4.
This highlights Huawei’s focus on emerging low-precision formats designed to improve efficiency during inference without sacrificing accuracy.
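Huawei hasn’t published how MXFP8, MXFP4, or HiF8 encode values, but the underlying trade-off of any low-bit format can be sketched with a simple stand-in. The snippet below uses plain symmetric integer quantization (a proxy, not Huawei’s actual formats) to show how representation error grows as bit width shrinks, which is exactly the effect formats like MXFP4 are engineered to contain:

```python
import numpy as np

def quantize_dequantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization to `bits` bits, then back to float."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
weights = rng.normal(0, 1, size=100_000).astype(np.float32)

for bits in (8, 4):
    err = np.abs(weights - quantize_dequantize(weights, bits)).mean()
    print(f"{bits}-bit: mean abs error {err:.4f}")
```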
Nvidia’s H200 delivers 241.3 teraflops in FP16 and 60.3 teraflops in FP32, while AMD’s MI300 provides 383 teraflops in FP16 and nearly 48 teraflops for both FP32 and FP64 workloads.
The MI300’s FP64 parity with FP32 underlines its suitability for scientific computation, where double-precision is critical, whereas Nvidia’s focus is skewed toward mixed-precision acceleration for AI.
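To put those throughput figures in perspective, here is a rough, ideal-case estimate of how long a single large matrix multiplication would take at each chip’s quoted FP16 peak. Real kernels reach only a fraction of peak, so treat these as lower bounds on runtime; the problem size is purely illustrative.

```python
# Back-of-envelope: ideal time for an (M x K) @ (K x N) matmul,
# which costs roughly 2*M*K*N floating-point operations.
def matmul_seconds(m: int, k: int, n: int, peak_tflops: float) -> float:
    flops = 2 * m * k * n
    return flops / (peak_tflops * 1e12)

# FP16 peak figures from the comparison table (teraflops).
chips = {"NVIDIA H200": 241.3, "AMD Instinct MI300": 383.0}

M = K = N = 16_384  # a single large, transformer-scale matmul
for name, tflops in chips.items():
    t = matmul_seconds(M, K, N, tflops)
    print(f"{name}: {t * 1e3:.1f} ms at {tflops} TFLOPS peak")
```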
Memory architecture strongly influences large language model training.
Huawei pairs the Ascend 950 with 144GB of HiZQ 2.0 proprietary HBM, delivering 4TB/s of bandwidth and 2TB/s interconnect speed.
Nvidia equips the H200 with 141GB of HBM3e memory and a 4.89TB/s bandwidth, slightly ahead in raw throughput.
AMD’s MI300 stands out with 128GB of HBM3 but a wider 8192-bit bus and a leading 6.55TB/s memory bandwidth.
For massive model training or memory-intensive simulation, AMD’s advantage in bandwidth can translate into faster data movement even if its total memory capacity trails Huawei’s.
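One way to read bandwidth alongside peak compute is a simple roofline calculation: dividing peak FLOPS by memory bandwidth gives the arithmetic intensity (FLOPs per byte) a kernel must exceed before compute, rather than memory, becomes the bottleneck. A minimal sketch using the table’s own figures; note the Ascend number is FP8-class while the others are FP16, so the comparison is indicative only.

```python
# Roofline "machine balance": FLOPs per byte needed to be compute-bound.
# Peak compute and bandwidth figures are taken from the table above.
chips = {
    "Huawei Ascend 950DT (FP8)": (1000.0, 4.00),   # TFLOPS, TB/s
    "NVIDIA H200 (FP16)":        (241.3, 4.89),
    "AMD Instinct MI300 (FP16)": (383.0, 6.55),
}

for name, (tflops, tbps) in chips.items():
    balance = tflops / tbps  # TFLOPS / (TB/s) reduces to FLOPs per byte
    print(f"{name}: ~{balance:.0f} FLOPs/byte to saturate compute")
```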
The H200 and MI300 share a 600W thermal design power, fitting into PCIe 5.0 x16 server configurations with no video outputs, underscoring their data center orientation.
Huawei has not disclosed official TDP figures but offers both card formats and integrated SuperPoD servers, suggesting deployment flexibility within its own AI infrastructure solutions.
Its interconnect bandwidth of 2TB/s could be an important factor for multi-chip scaling in data center environments, although details about die size and transistor count remain undisclosed.
Nvidia benefits from a mature NVLink and InfiniBand ecosystem, while AMD’s multi-chip module design aims to reduce latency between compute dies.
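Interconnect bandwidth matters because gradient synchronization is often the scaling bottleneck in multi-chip training. As a rough illustration, the sketch below computes a bandwidth-only lower bound for a ring all-reduce at the Ascend 950’s quoted 2TB/s; the device count and model size are purely illustrative, and real collectives add latency and protocol overhead on top.

```python
def ring_allreduce_seconds(payload_gb: float, devices: int, link_tbps: float) -> float:
    """Bandwidth-only lower bound: a ring all-reduce moves
    2*(n-1)/n of the payload through each device's link."""
    traffic_bytes = 2 * (devices - 1) / devices * payload_gb * 1e9
    return traffic_bytes / (link_tbps * 1e12)

# Illustrative: 70B-parameter gradients in FP16 (~140 GB) across 8 devices
# at the Ascend 950's quoted 2 TB/s interconnect bandwidth.
t = ring_allreduce_seconds(payload_gb=140, devices=8, link_tbps=2.0)
print(f"~{t * 1e3:.0f} ms per full gradient all-reduce (ideal)")
```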
Huawei clearly aims its Ascend 950 at large-scale training and decode-stage inference for generative AI, a market where Nvidia has long dominated.
Its Q4 2026 availability means Nvidia’s H200, released in late 2024, and AMD’s MI300, available since early 2023, enjoy a significant head start.
By the time Ascend 950 hardware reaches customers, both competitors may have iterated on their platforms.
However, Huawei’s emphasis on efficient low-precision formats and tight integration with its networking hardware could attract buyers seeking alternatives to U.S. suppliers.
Ultimately, these accelerators reflect three distinct design philosophies.
AMD prioritizes memory bandwidth and double-precision strength for HPC workloads, while Nvidia leverages ecosystem maturity and software support to maintain dominance in AI training.
Huawei seeks to challenge both with aggressive FP8-class performance and high-capacity proprietary memory.
Via Huawei, Nvidia, TechPowerUp
