Nvidia's bombastic boss Jensen Huang lifted the lid last night on the firm's latest graphics chip. The new GPU, codenamed Fermi, will pack around three billion transistors and be at least twice as powerful as the best chip Nvidia currently offers.

In an unusual move at Nvidia's GPU Tech Conference, Huang revealed much of the new chip's architecture despite the fact that it's not expected to go on sale until at least the first quarter of 2010.

The key facts go as follows. Fermi will be a 40nm chip with full support for Microsoft's DirectX 11 API. So far, so like AMD's new Radeon HD 5870, a GPU you can already buy today.

Naked ambition

Where it differs from AMD's latest is in sheer scale and ambition. With an expected transistor count around the three billion mark, it's roughly 40 per cent more complex than the 5870, which weighs in at a little over two billion.

It's also likely to be much more powerful thanks to no fewer than 512 of what Nvidia used to call stream processors and now prefers to label CUDA cores. The name change reflects the work Nvidia has done optimising the new chip for general purpose computing, more on which in a moment.

Anyway, if you're wondering how Fermi with 512 cores is going to be faster than the Radeon HD 5870 and its 1,600 stream processors, remember that the two architectures are not directly comparable. After all, Nvidia's GeForce GTX 285 out-pixels the Radeon HD 4890 despite touting just 240 stream processors to the 4890's 800.

Huang also revealed that Fermi makes do with a memory bus just 384 bits wide. Superficially, that's a downgrade compared to the 512-bit bus of Nvidia's current top chip. However, the addition of GDDR5 support, with its higher per-pin data rates, means overall bandwidth will go up, not down. And it's still wider than the Radeon HD 5870's 256-bit bus.
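
To see why a narrower bus need not mean less bandwidth, here's a minimal Python sketch. Peak bandwidth is simply the bus width in bytes multiplied by the per-pin data rate. Note that the data rates below are illustrative placeholders rather than announced specifications, since Nvidia has not revealed Fermi's memory clocks, and the function name is ours.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak memory bandwidth in GB/s: bus width in bytes times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps_per_pin


# How much faster each pin must run for a 384-bit bus to match a 512-bit bus.
break_even = 512 / 384

# ASSUMPTION: hypothetical per-pin data rates, chosen only for illustration.
wide_bus = peak_bandwidth_gb_s(512, 2.5)    # older memory on a 512-bit bus
narrow_bus = peak_bandwidth_gb_s(384, 4.0)  # faster memory on a 384-bit bus

print(f"Break-even data rate increase: {break_even:.2f}x")
print(f"512-bit bus: {wide_bus:.0f} GB/s, 384-bit bus: {narrow_bus:.0f} GB/s")
```

In other words, any memory that runs each pin more than a third faster leaves the 384-bit bus ahead.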

Soul of a supercomputer, body of a GPU

Despite the inevitable graphics grinding prowess of Fermi, the story that Nvidia really wants to get across is how it fits into the masterplan for GPGPU (or general purpose computing on the GPU), variously known as CUDA or Tesla depending on whether you're talking software or hardware.

With Fermi, Nvidia has dedicated more resources than ever before to processing general purpose code. Specifically, it's claimed that double precision floating point now runs at half the speed of single precision. On Nvidia's previous GPUs, double precision manages just one eighth of the single precision rate.

Factor in the added cores and the result should be getting on for 10 times the double precision performance of Nvidia's existing technology in some scenarios.
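
For a rough sense of where that figure comes from, here's a back-of-envelope sketch in Python. It uses only the numbers quoted above: 512 cores against the 240 of the GeForce GTX 285, and double precision at one half rather than one eighth of the single precision rate. It also assumes identical clockspeeds, which Nvidia hasn't confirmed, and the function name is ours for illustration.

```python
from fractions import Fraction


def relative_dp_throughput(cores, dp_to_sp_ratio):
    """Double precision throughput in arbitrary units.

    ASSUMPTION: both chips run at the same clockspeed, so throughput
    scales with core count times the double-to-single precision ratio.
    """
    return cores * dp_to_sp_ratio


fermi = relative_dp_throughput(512, Fraction(1, 2))
gtx_285 = relative_dp_throughput(240, Fraction(1, 8))
speedup = fermi / gtx_285

print(f"Estimated double precision speedup: {float(speedup):.1f}x")
```

On paper, then, that's about 8.5 times before any change in clockspeeds or other architectural gains are taken into account.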

Going exponential

Nvidia has also beefed up Fermi's per-core L1 and shared L2 cache. The result is reduced wait times when doing certain operations and in turn much higher performance. Anywho, the upshot of all this is that Nvidia is hoping Fermi is the chip that finally takes the whole GPGPU segment – and sales of its chips – exponential.

All that said, there's still lots we don't know about Fermi, such as clockspeeds and pricing. But the most important imponderable is the chip's launch date. Nvidia desperately needs to get Fermi off the keynote stage and into our PCs.