Everybody's talking about supercomputing on the desktop, and in particular about whether it will be GPUs that achieve that goal. We think that general-purpose computation on GPUs (known as GPGPU) might be the most important computing trend of the next 10 years.

As claims go, it's a biggie. But if you want proof of the industry's faith in the new concept, just take a look at the companies that want a slice of the GPGPU pie: Nvidia, AMD, Intel, Microsoft, IBM, Apple and Toshiba all want in. And it's not just speculation that's leading to such big interest: GPGPU systems are already outperforming CPU-only clusters in fields as diverse as molecular dynamics, ray tracing, medical imaging and sequence matching.

The combination of parallel CPU and GPU processing used to achieve these results is often dubbed 'heterogeneous computing'. The GPGPU concept enables the GPU to moonlight as a versatile co-processor. As Nvidia's David Luebke has suggested, computers are no longer getting faster; the move to multicore processors means that they're actually getting wider.

That's the idea that GPGPU computing cashes in on. By intelligently offloading data-intensive tasks from the CPU to other processor cores (such as those in a graphics card), developers can improve application performance through parallelism.

GPGPU is hardly a new idea, however. According to the website www.gpgpu.org, GPU technology has been used for number crunching since 1978, when Ikonas developed a programmable raster display system for cockpit instrumentation.

From GPU to GPGPU

Modern GPUs make ideal co-processors. Not only are they cheap, they're also blisteringly fast, thanks to their many processor cores. Most importantly, these cores are programmable. While CPUs are designed to run a handful of threads as quickly as possible, GPUs are designed to burn through large amounts of data in parallel.
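A minimal Python sketch of the data-parallel pattern described above, assuming a hypothetical per-pixel function `brighten` as a stand-in for a GPU kernel. Python threads only illustrate the structure (independent elements split across workers and reassembled in order); they won't deliver GPU-class speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel: int) -> int:
    """Hypothetical per-element 'kernel': its result depends only on its own input."""
    return min(pixel + 40, 255)


def run_sequential(data: list[int]) -> list[int]:
    # CPU-style: a single loop walks the elements one after another.
    return [brighten(p) for p in data]


def run_data_parallel(data: list[int], workers: int = 4) -> list[int]:
    # GPU-style: split the data into independent chunks and hand each to a worker.
    # Because brighten() has no cross-element dependencies, chunks can be processed
    # in any order; pool.map returns their results in the original chunk order.
    size = max(1, -(-len(data) // workers))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda chunk: [brighten(p) for p in chunk], chunks)
    return [p for chunk in results for p in chunk]


pixels = list(range(0, 256, 8))
assert run_data_parallel(pixels) == run_sequential(pixels)
```

The point of the split is that no worker ever needs to wait for another; that independence is what lets a GPU assign one thread to every element.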

The Nvidia GeForce GTX 280, for example, is built for speed. As a gaming component, it's capable of delivering smooth high-definition visuals with complex lighting effects, textures and real-time physics; just take a look at Far Cry 2 running at 1,920 x 1,200 pixels. With 1.4 billion transistors, the GeForce GTX 280 commands 240 programmable shader cores that provide up to 933 gigaflops of processing power.

AMD's graphics technology is equally potent. Its Radeon HD 4800 Series cards feature 800 programmable cores and GDDR5 memory to deliver 1.2 teraflops of processing power. "Strict pipelining of GPU programs enables efficient access to data," says Shankar Krishnan at AT&T's Research Labs. "This obviates the need for the extensive cache architectures needed on traditional CPUs and allows for a much higher density of computational units."
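Peak figures like these are theoretical maximums, usually calculated as cores × clock speed × floating-point operations per core per clock. The sketch below assumes clocks and per-clock counts taken from the cards' public specifications rather than from this article: a 1.296GHz shader clock and three flops per clock (a multiply-add plus a multiply) for the GTX 280, and the Radeon HD 4870's 750MHz clock with two flops per clock (one multiply-add). Real applications rarely get close to these numbers.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_core_per_clock: int) -> float:
    """Theoretical peak = cores x clock (GHz) x floating-point ops per core per clock."""
    return cores * clock_ghz * flops_per_core_per_clock


# GeForce GTX 280: 240 shader cores at an assumed 1.296GHz shader clock,
# counting a multiply-add (2 flops) plus a multiply (1 flop) per clock.
gtx280 = peak_gflops(240, 1.296, 3)   # about 933 GFLOPS

# Radeon HD 4870 (assumed as the 4800 Series example): 800 stream processors
# at an assumed 750MHz, counting one multiply-add (2 flops) per clock.
hd4870 = peak_gflops(800, 0.750, 2)   # 1,200 GFLOPS, i.e. 1.2 TFLOPS
```

Because the formula multiplies core count by clock speed, a GPU with many modestly clocked cores can outscore a CPU with a few very fast ones on this metric.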

Of course, if you're not playing Far Cry 2 or Fallout 3, all this processing potential is just sitting about twiddling its thumbs. GPGPU computing lets other applications put the processors in a graphics card to work too.

Stream processing

This is why Nvidia and AMD are keen to harness the GPGPU potential of their graphics hardware. Nvidia's Tesla Personal Supercomputer, for example, combines a traditional quad-core workstation CPU with three or four Tesla C1060 processors.

A C1060 is effectively a GeForce GTX 280 with 4GB of GDDR3 memory and no video-out. Each C1060 is capable of 933 gigaflops of single-precision floating-point performance, while Nvidia's top-of-the-range S1070, a 1U rack-mount system that runs four of these GPUs at a higher clock speed, packs up to 4.14 teraflops. The Tokyo Institute of Technology recently bought 170 of them to give its Tsubame supercomputer some extra kick.
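Multiplying the C1060's 933 gigaflops by four gives about 3.7 teraflops, not 4.14. The sketch below shows the gap closes if the S1070's four GPUs run at a higher shader clock, assumed here to be 1.44GHz (a figure not given in the article), using the same cores × clock × flops-per-clock arithmetic.

```python
def system_tflops(gpus: int, cores_per_gpu: int, clock_ghz: float, flops_per_clock: int) -> float:
    # Aggregate peak is the per-GPU peak (in GFLOPS) times the GPU count,
    # divided by 1,000 to convert to TFLOPS.
    return gpus * cores_per_gpu * clock_ghz * flops_per_clock / 1000


# Four GTX 280-class GPUs (240 cores, 3 flops per core per clock).
at_c1060_clock = system_tflops(4, 240, 1.296, 3)  # about 3.73 TFLOPS
at_higher_clock = system_tflops(4, 240, 1.44, 3)  # about 4.15 TFLOPS (assumed 1.44GHz clock)
```

The second result is in line with the 4.14 teraflops quoted for the S1070; either way, these are peak numbers, and scaling across GPUs assumes the workload splits cleanly between them.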

GPUs make ideal number crunchers because they're designed to work with 'streams' of data, applying the same preprogrammed operation to each element. They're at their best on large datasets that need the same computation performed over and over. Calgary-based company OpenGeoSolutions uses Nvidia's Tesla hardware to improve its seismic modelling via a technique called spectral decomposition. The process involves analysing the frequency content of seismic data (which varies with changes in the rock) to build a stratigraphic view of the earth's geology.
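OpenGeoSolutions' actual method isn't described in this article. As a generic illustration of spectral decomposition, this sketch splits a seismic trace into fixed-length, non-overlapping windows and computes a naive discrete Fourier transform for each one. The window length and non-overlapping layout are assumptions; production code would use an FFT and tapered, overlapping windows.

```python
import cmath
import math


def dft_magnitudes(window: list[float]) -> list[float]:
    """Magnitude of each frequency bin from a naive O(n^2) discrete Fourier transform."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window)))
        for k in range(n // 2 + 1)  # bins 0..n/2 cover the non-negative frequencies
    ]


def spectral_decomposition(trace: list[float], window: int) -> list[list[float]]:
    # Every window gets the same operation and no window depends on another:
    # the "same computation over a large dataset" shape that suits stream processors.
    return [
        dft_magnitudes(trace[start:start + window])
        for start in range(0, len(trace) - window + 1, window)
    ]


# A synthetic trace: a sine completing 4 cycles in every 32-sample window.
trace = [math.sin(2 * math.pi * 4 * t / 32) for t in range(128)]
spectra = spectral_decomposition(trace, 32)  # 4 windows, 17 frequency bins each
```

On a GPU, each window (and each frequency bin within it) could be handed to its own thread.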

On a typical CPU-based cluster, building sub-surface images took anywhere from two hours to several days. With a Tesla system, OpenGeoSolutions reported a performance increase that was "totally unprecedented".

Scientific research

AMD, meanwhile, has made a deal with Silicon Valley startup Aprius Inc to supply FireStream 9270 cards for the Aprius CA8000 Computational Acceleration System. Like the Radeon HD 4800 Series cards, the FireStream 9270 features 800 processor cores. The CA8000 combines eight of these cards into a 4U system that's capable of 9.6 teraflops of acceleration performance. And what's all this power used for? Aprius suggests that CAD/CAM, climate modelling, medical imaging and signal processing applications will all benefit.

Stanford University's Folding@home project already uses Radeon GPUs to speed up its protein-folding simulations. The numbers it crunches could one day help researchers find cures for diseases such as cancer, Alzheimer's and Parkinson's.

That's all great, you might say, but I'm unlikely to be solving shallow water equations or prospecting for oil beneath the Alaskan ice. What sort of impact will this have on a desktop PC beyond gaming applications?

Right now, not much. If you've got an average graphics card like an Nvidia GeForce 9600 GT, your GPU (which features 64 separate stream-processing cores) can already handle real-time physics effects. Nvidia ported Ageia's PhysX code libraries to its GeForce 8 Series and newer GPUs after acquiring the company in February 2008.
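PhysX's own API isn't shown here. As a hypothetical illustration of why effects such as debris and particles map well onto dozens or hundreds of stream processors, this sketch advances a set of particles under gravity with semi-implicit Euler integration. The `Particle` type, gravity constant and time step are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    x: float
    y: float
    vx: float
    vy: float


GRAVITY = -9.81  # m/s^2, acting along y (assumed constant)


def step(p: Particle, dt: float) -> Particle:
    # Semi-implicit Euler: update velocity first, then position using the new velocity.
    # Each particle reads only its own state, so a GPU could run one thread per particle.
    vy = p.vy + GRAVITY * dt
    return Particle(p.x + p.vx * dt, p.y + vy * dt, p.vx, vy)


def simulate(particles: list[Particle], dt: float, steps: int) -> list[Particle]:
    for _ in range(steps):
        particles = [step(p, dt) for p in particles]
    return particles


# One second of motion at 60 steps per second for a particle thrown sideways.
falling = simulate([Particle(0.0, 10.0, 1.0, 0.0)], dt=1 / 60, steps=60)
```

Real physics engines also handle collisions between particles, which introduces the kind of cross-element dependencies that make GPU code harder to write.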

More recently, we've seen the potential for faster media encoding with the release of Badaboom. Ripping a DVD or converting a video file would typically monopolise a CPU-only system. Built with Nvidia's CUDA, Badaboom offloads this data-intensive workload to an Nvidia GPU so that the CPU remains free for day-to-day tasks.
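Badaboom's internals aren't described in this article, so the sketch below only illustrates the general offload pattern, with a Python worker thread standing in for the GPU: the heavy job is submitted, the main thread carries on, and the result is collected when it's needed. `transcode` is a hypothetical placeholder, not a real encoder.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def transcode(frames: int) -> int:
    """Hypothetical stand-in for an encoder running on the GPU."""
    time.sleep(0.05)  # pretend the device is busy encoding
    return frames


with ThreadPoolExecutor(max_workers=1) as accelerator:
    job = accelerator.submit(transcode, 1_000)  # hand the heavy job off; returns immediately
    interactive_work = sum(range(10_000))       # meanwhile the main thread stays responsive
    encoded = job.result()                      # block only when the result is actually needed
```

With a real GPU the heavy work runs on separate hardware, so the CPU is genuinely freed rather than merely sharing its time with another thread.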

GPGPU and you

Adobe's Photoshop CS4 has been optimised to offload certain tasks to any GPU that supports Shader Model 3.0. Photoshop's filters aren't that different from pixel shaders: each one works out a new value for every pixel in an image. A traditional CPU processes those pixels only a few at a time, so an image can take several seconds to re-render. Using the parallel architecture of a GPU, filters can be applied to many pixels at once, in real time, to provide instant results.
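To see why filters resemble pixel shaders, here is a hypothetical 3x3 box blur in which each output pixel is computed by a small function that reads only the input image. This is a generic sketch, not Adobe's implementation; a GPU would run one `shade` call per pixel in parallel, whereas this version simply loops.

```python
def box_blur(image: list[list[int]]) -> list[list[int]]:
    """3x3 box blur on a greyscale image; edge pixels average only the neighbours that exist."""
    h, w = len(image), len(image[0])

    def shade(y: int, x: int) -> int:
        # Like a pixel shader: compute one output pixel from the (read-only) input image.
        total = count = 0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    total += image[ny][nx]
                    count += 1
        return total // count

    # No output pixel depends on another output pixel, so all h*w calls to shade()
    # could run simultaneously on a GPU; here they run one after another.
    return [[shade(y, x) for x in range(w)] for y in range(h)]


blurred = box_blur([[0, 0, 0], [0, 90, 0], [0, 0, 0]])
```

Writing results to a separate output image (rather than blurring in place) is what keeps every pixel's calculation independent.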

Photoshop CS4 uses OpenGL and GPU acceleration to improve zooming, rotation and transitions at all display levels. Colour matching has also been shunted to the graphics chip. You don't lose any features by not having a compatible GPU, but, as Adobe's John Nack points out, a PC with a good graphics card "will blow away computers that don't have one."

CyberLink's PowerDirector 7 software promises "up to five times faster video previewing and rendering performance" with the help of GPU resources. However, as with Photoshop CS4, this extra power only really comes into play when applied to advanced effects.

S3 Graphics, meanwhile, has announced the release of S3FotoPro, an imaging application that uses the GPGPU potential of its Chrome 400/500 chips. According to S3 Graphics, S3FotoPro uses smart image algorithms running on the GPU to "analyse and automatically adjust macro and micro details within a picture to enhance the picture quality".

Available picture enhancements include colour clarity and correction, de-fogging, skin smoothing, gradient blending, and saturation and tonal-balance adjustments. "With support for the latest GPGPU applications and languages, S3FotoPro provides a highly useful and versatile tool for end-users and our partners," says Michael Shiuan, VP of Hardware Engineering at S3 Graphics. "Application processes that required days to complete can now be completed in seconds using a GPGPU product like ours."

The GPGPU problem

Though GPUs are extremely efficient at streaming and processing data, most PC apps are serial in nature. A GPU can't turbocharge your word processor, for example, or speed up your anti-virus package. "One of the reasons GPU designers can deliver huge peak performance numbers is that they've greatly constrained the architecture," says Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab. "What this means is that they can design more efficient processors by not dealing with the messy, irregular patterns of computation that most applications inevitably deal with. These include looping conditions, unpredictable branches and irregular memory access patterns."
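Unpredictable branches hurt GPUs because threads are executed in groups that share a single instruction stream (Nvidia calls these groups warps). The simplified cost model below, which ignores real scheduling details, shows the effect Ghuloum describes: when the lanes in a group disagree on a branch, the whole group pays for both paths. The parity condition and cycle costs are hypothetical.

```python
def simd_group_cost(lane_inputs: list[int], then_cost: int, else_cost: int) -> int:
    """Cycles for one group of lanes executing in lockstep (simplified model).

    If any lane takes a branch, the whole group steps through that branch while the
    lanes that didn't take it are masked off, so a divergent group pays for both paths.
    """
    takes_then = [x % 2 == 0 for x in lane_inputs]  # hypothetical data-dependent condition
    cost = 0
    if any(takes_then):
        cost += then_cost
    if not all(takes_then):
        cost += else_cost
    return cost


uniform = simd_group_cost([0, 2, 4, 6], then_cost=10, else_cost=30)    # every lane agrees
divergent = simd_group_cost([0, 1, 2, 3], then_cost=10, else_cost=30)  # lanes disagree
```

A CPU core running one thread would pay only for the path it actually takes, which is one reason branch-heavy code still favours the CPU.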

GPUs excel at data parallelism because they have lots of maths units and fast access to onboard memory. They also achieve high throughput on parallel tasks because the same program can run on every shader core at once, with each core working on different data.

However, CPUs still rule the roost when it comes to task parallelism, because their fast caches keep frequently used data close at hand and they handle branching efficiently. CPUs can also achieve high performance on a single thread. In other words, the two kinds of processor complement each other.

To get the most out of this CPU/GPU partnership in the future, developers will need to change the way that applications are coded. That's where programming languages such as CUDA and OpenCL come in.

We've yet to see the best of what GPGPU computation has to offer. The programmability of GPU cores could make real-time ray tracing a possibility, while the concept of GPU-accelerated storage could allow PCs to encrypt and compress files on the fly. And the GPGPU concept isn't restricted to desktop systems and workstations, either. There's scope for GPU-accelerated processing on mobile devices too.
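One reason on-the-fly compression could suit parallel hardware is that data can be split into chunks and each chunk compressed independently. This Python sketch, using the standard `zlib` module on the CPU, shows the idea; the 64KB chunk size is an assumption, and compressing chunks independently trades a little compression ratio for the ability to process them in parallel.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # assumed chunk size (64KB)


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    # Each chunk becomes its own zlib stream, so no chunk depends on another and
    # they could be handed to many processors at once. The cost is a slightly worse
    # ratio, because every stream starts without any history from earlier chunks.
    return [zlib.compress(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]


def decompress_chunks(chunks: list[bytes]) -> bytes:
    # Streams are decompressed independently and concatenated in order.
    return b"".join(zlib.decompress(c) for c in chunks)


data = b"GPGPU " * 50_000  # 300,000 bytes of sample data
chunks = compress_chunks(data)
```

Encryption modes that process blocks independently (such as counter mode) can be split up in a similar way.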

An alternative future

Another possible scenario is that current GPGPU initiatives are just a stopgap measure until CPU and GPU platforms converge. In a recent interview with website Ars Technica, Epic Games' co-founder Tim Sweeney suggested that: "In the next console generation, consoles could consist of a single non-commodity chip. It could be a general processor that has evolved from a past CPU architecture or GPU architecture, and it could potentially run everything – the graphics, the AI, the sound – in an entirely homogeneous manner. That's a very interesting prospect because it could dramatically simplify the toolset and the processes for creating software."

This sounds a little like Intel's Larrabee project, which is due for release in early 2010. So perhaps a dramatic change in processor architecture isn't as far away as some people might think. One thing's for sure, though: don't count the GPU out just yet.
