It's been a long time coming, but AMD finally has a decent quad-core chip. Launched today in Opteron server trim, AMD's new 45nm microarchitecture is far from revolutionary. But it does make for an excellent server CPU and also bodes well for upcoming desktop derivatives.
Known internally at AMD under the Shanghai codename, the new microarchitecture doesn't amount to much more than a die shrink from 65nm production technology to a new 45nm node.
Both the overall quad-core layout and the detailed architecture of the cores have been carried over pretty much unchanged. OK, the memory controller has been massaged. It now supports DDR2 running at 800MHz. The HyperTransport links have also been upgraded to 3.0 spec.
Oh, and AMD has improved some aspects of the chip's virtualisation support. Particular attention has been paid to the time taken for so-called 'world-switching'.
That's the process of switching the processor between the host, or hypervisor, and a guest virtual machine. That might sound rather arcane, but virtualisation is an increasingly important aspect of server CPU performance. This stuff really matters.
But when you get right down to it, Shanghai is all about 45nm silicon. And the good news is that, by all indications, it's a very healthy process for AMD.
Unlike last year's troubled Barcelona launch, Shanghai is on time and hitting the ground running at decent clockspeeds. At launch, the new Opterons span a range of clockspeeds between 2.3GHz and 2.7GHz.
That might seem a little ordinary compared to the 3.4GHz clock of Intel's top Xeon server CPU. But it's worth remembering that the best quad-core server processor AMD could actually ship a little over a year ago was 1.9GHz.
Arguably even more impressive are the power ratings of the new Opterons. Even the top 2.7GHz 2384 model is rated at just 75 watts. That's a huge step forward compared to the 105 watt rating of the outgoing 2.5GHz 65nm Opteron 2360. Overall, AMD says the new 45nm process is 35 per cent more power efficient.
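To put those ratings in perspective, here is a rough back-of-the-envelope sketch in Python using only the figures quoted above. It assumes rated watts per GHz is a fair proxy for efficiency, which is our simplification and not how AMD arrives at its 35 per cent figure. The dictionary layout and function name are purely illustrative.

```python
# Rated power and clock figures as quoted in the article.
old = {"name": "Opteron 2360 (65nm)", "ghz": 2.5, "watts": 105}
new = {"name": "Opteron 2384 (45nm)", "ghz": 2.7, "watts": 75}


def watts_per_ghz(chip):
    """Rated watts per GHz: a crude proxy, not a measured efficiency."""
    return chip["watts"] / chip["ghz"]


old_ratio = watts_per_ghz(old)
new_ratio = watts_per_ghz(new)

# Percentage drop in rated watts per GHz from the old chip to the new one.
reduction_pct = (1 - new_ratio / old_ratio) * 100

print(f"{old['name']}: {old_ratio:.1f} W/GHz")
print(f"{new['name']}: {new_ratio:.1f} W/GHz")
print(f"Reduction in rated watts per GHz: {reduction_pct:.1f}%")
```

On this crude measure the new chip needs roughly a third less rated power per GHz, which is at least in the same ballpark as AMD's claim.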
As you'd expect with any die-shrink, the new 45nm process also allows AMD to pack a few more transistors into the die. For the new Opterons, that takes the form of a boost from 2MB to 6MB of L3 cache memory.
Building on strengths
But what about performance? Given the carry-over architecture, we weren't expecting fireworks. However, what Shanghai does extremely well is build on the existing strengths of AMD's Opteron family of processors.
As before, multi-socket scaling is excellent. In benchmarks that major on data bandwidth and floating point performance, a pair of 2.7GHz Opteron 2384 chips are probably the fastest dual-socket solution currently available.
Certainly they have the measure of Intel's Xeon processors running at 3.2GHz in our computational fluid dynamics benchmark. And this is just the dual-socket version of Opteron. Given AMD's traditional advantage as sockets are added, we'd expect the four-way model to have an even greater advantage.
That said, integer performance is slightly less impressive. But it's significantly closer than before thanks to the boost in clocks and cache memory. The overall picture, therefore, is extremely solid.
Factor in the big improvements in power efficiency - again, that's something that really matters for high density server installations - and AMD is now in a position to make a pretty compelling overall argument to server customers.
At least, that's true in the context of the current Intel competition. Early next year, Intel will roll out its impressive new Nehalem architecture in servers. We've already seen what a beast it is on the desktop in Core i7 trim. AMD had better make the most of the next few months.
Comparative benchmarks - AMD Opteron 2384 v Intel Xeon X5472
Benchmark                        AMD Opteron 2384   Intel Xeon X5472
Memory bandwidth                 20.62 GB/s         10.4 GB/s
Sandra Dhrystone ALU arithmetic  77.72 GIPS         104.57 GIPS
Sandra Whetstone FPU             62.99 GFLOPS       83.76 GFLOPS
Cinebench 10 multi-core          45secs             37secs
Cinebench 10 single-core         4mins 2secs        3mins 44secs
X264 encode                      74fps              86fps
Stars CFD                        30.32secs          37.34secs
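For a quick read on who wins what, the figures above can be boiled down to a single ratio per test. The Python sketch below is one way to do it. It assumes that results reported as times are lower-is-better and everything else is higher-is-better. That is our reading of the units rather than anything stated in the listing, and the helper name is hypothetical.

```python
# Figures copied from the benchmark listing; nothing here is newly measured.
# Each entry: (Opteron 2384, Xeon X5472, higher_is_better).
# Assumption: results given as times (seconds) are lower-is-better.
results = {
    "Memory bandwidth (GB/s)": (20.62, 10.4, True),
    "Sandra Dhrystone (GIPS)": (77.72, 104.57, True),
    "Sandra Whetstone (GFLOPS)": (62.99, 83.76, True),
    "Cinebench 10 multi-core (s)": (45, 37, False),
    "Cinebench 10 single-core (s)": (4 * 60 + 2, 3 * 60 + 44, False),
    "X264 encode (fps)": (74, 86, True),
    "Stars CFD (s)": (30.32, 37.34, False),
}


def opteron_advantage(amd, intel, higher_is_better):
    """Relative Opteron performance; a value above 1.0 means the Opteron wins."""
    return amd / intel if higher_is_better else intel / amd


advantages = {
    name: opteron_advantage(*values) for name, values in results.items()
}

for name, ratio in advantages.items():
    verdict = "Opteron ahead" if ratio > 1 else "Xeon ahead"
    print(f"{name:30s} {ratio:.2f}x  {verdict}")
```

Read this way, the Opteron's wins come in memory bandwidth and the CFD run, which fits the article's point about bandwidth-heavy workloads, while the Xeon stays ahead in the remaining tests.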