Why your next CPU could be a GPU

Will your next CPU be a GPU?
With all the big boys wading into the GPGPU war, it may not be that long until we get a taste of what a petaflop really means

Everybody's talking about supercomputing on the desktop – and in particular, whether it will be GPUs that achieve that goal. We think that general-purpose computation on GPUs (an idea known as the GPGPU) might be the most important computing trend over the next 10 years.

As claims go, it's a biggie. But if you want proof of the industry's faith in the new concept, just take a look at the companies that want a slice of the GPGPU pie: Nvidia, AMD, Intel, Microsoft, IBM, Apple and Toshiba all want in. And it's not just speculation that's leading to such big interest: GPGPU systems are already outperforming CPU-only clusters in fields as diverse as molecular dynamics, ray tracing, medical imaging and sequence matching.

The combination of parallel CPU and GPU processing used to achieve these results is often dubbed 'heterogeneous computing'. The GPGPU concept enables the GPU to moonlight as a versatile co-processor. As Nvidia's David Luebke has suggested, computers are no longer getting faster; the move to multicore processors means that they're actually getting wider.

That's the idea that GPGPU computing cashes in on. By intelligently offloading data-intensive tasks from the CPU to other processor cores (such as those in a graphics card), developers can improve application performance through parallelism.

The GPGPU is hardly a new idea, however. According to the website www.gpgpu.org, GPU technology has been used for number crunching since 1978, when Ikonas developed a programmable raster display system for cockpit instrumentation.


Modern GPUs make ideal co-processors. Not only are they cheap, they're also blisteringly fast, thanks to the presence of multiple processor cores. Most importantly, these multiple cores are programmable. While CPUs are designed to process threads sequentially, GPUs are designed to burn through data in parallel.

The Nvidia GeForce GTX 280, for example, is built for speed. As a gaming component, it's capable of delivering smooth high-definition visuals with complex lighting effects, textures and realtime physics. Just take a look at Far Cry 2 at 1,920 x 1,200 pixels. With 1.4 billion transistors, the GeForce GTX 280 commands 240 programmable shader cores that can provide 933 gigaflops of processing power.

AMD's graphics technology is equally potent. Its 4800 Series Radeon HD cards feature 800 programmable cores and GDDR5 memory to deliver 1.2 teraflops of processing power. "Strict pipelining of GPU programs enables efficient access to data," says Shankar Krishnan at AT&T's Research Labs. "This obviates the need for the extensive cache architectures needed on traditional CPUs and allows for a much higher density of computational units."

Of course, if you're not playing Far Cry 2 or Fallout 3 then all this processing potential is just sitting about twiddling its thumbs. GPGPU computing is about what happens when other applications can put the processors in a graphics card to work.

Stream processing

This is why Nvidia and AMD are keen to harness the GPGPU potential of their graphics hardware. Nvidia's Tesla Personal Supercomputer, for example, combines a traditional quad-core workstation CPU with three or four Tesla C1060 processors.

A C1060 is effectively a GeForce GTX 280 with 4GB of GDDR3 memory and no video-out. Each C1060 is capable of 933 gigaflops of single-precision floating-point performance, and Nvidia's top-of-the-range four-GPU S1070 system packs up to 4.14 teraflops of processing power in each rack. The Tokyo Institute of Technology recently bought 170 of them to give its Tsubame supercomputer some extra kick.
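For the curious, here's a rough sketch of where headline figures like these come from. Peak throughput is usually quoted as cores x clock speed x floating-point operations per core per clock. The clock speeds (1.296GHz and 1.44GHz) and the three operations per clock used below are our assumptions rather than figures quoted above, and the function name is purely illustrative.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in gigaflops: cores x clock (GHz) x flops per cycle."""
    return cores * clock_ghz * flops_per_cycle


# Assumed values (not from the article): 1.296GHz shader clock and
# 3 single-precision operations per core per clock.
c1060 = peak_gflops(cores=240, clock_ghz=1.296, flops_per_cycle=3)

# Four GPUs at the same assumed clock, in teraflops.
s1070_base_tflops = 4 * c1060 / 1000

# Four GPUs at an assumed faster 1.44GHz clock, in teraflops.
s1070_fast_tflops = 4 * peak_gflops(240, 1.44, 3) / 1000

print(f"C1060: {c1060:.0f} gigaflops")
print(f"S1070: {s1070_base_tflops:.2f} to {s1070_fast_tflops:.2f} teraflops")
```

Under those assumptions, four 933-gigaflop processors give about 3.73 teraflops, and the 'up to' figure would correspond to the faster clock. These are theoretical peaks; real applications rarely sustain them.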

GPUs make ideal number crunchers because they're designed to work with 'streams' of data and apply preprogrammed operations to each part. GPUs are at their best working with large datasets that require the same computation on every element. Calgary-based company OpenGeoSolutions uses Nvidia's Tesla hardware to improve its seismic modelling via a technique called spectral decomposition. The process involves analysing low-level electromagnetic frequencies (caused by variances in rock mass) to build a stratigraphic view of the earth's geology.
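The stream idea is easy to sketch in ordinary code. The example below is a plain Python illustration, not GPU code: a small 'kernel' is applied to every element of a stream, and because no element depends on any other, the stream can be cut into chunks and processed in any order with the same result. That independence is what lets hundreds of cores share the work. The kernel, function names and sample data are all made up for illustration.

```python
from typing import Callable, List, Sequence


def scale_and_offset(x: float) -> float:
    """Hypothetical kernel: the same arithmetic for every element."""
    return 2.0 * x + 1.0


def run_kernel(kernel: Callable[[float], float],
               stream: Sequence[float]) -> List[float]:
    """Apply one kernel to each element; elements never interact."""
    return [kernel(x) for x in stream]


def run_in_chunks(kernel: Callable[[float], float],
                  stream: Sequence[float],
                  chunk_size: int) -> List[float]:
    """Split the stream into chunks, as separate cores might each take one."""
    chunks = [stream[i:i + chunk_size]
              for i in range(0, len(stream), chunk_size)]
    # Process chunks in reverse order to show that order does not matter.
    results = [run_kernel(kernel, chunk) for chunk in reversed(chunks)]
    # Reassemble the outputs in the original order.
    output: List[float] = []
    for chunk_result in reversed(results):
        output.extend(chunk_result)
    return output


samples = [0.0, 1.0, 2.0, 3.0, 4.0]
serial = run_kernel(scale_and_offset, samples)
chunked = run_in_chunks(scale_and_offset, samples, chunk_size=2)
print(serial == chunked)
```

On a real GPU the chunks would run at the same time on different cores; here they run one after another, which is enough to show that the answer doesn't change.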