On a typical CPU-based cluster, building sub-surface images took anywhere from two hours to several days. With a Tesla system, OpenGeoSolutions reported a performance increase that was "totally unprecedented".
AMD, meanwhile, has made a deal with Silicon Valley startup Aprius Inc to supply FireStream 9270 cards for the Aprius CA8000 Computational Acceleration System. Like the Radeon HD 4800 Series cards, the FireStream 9270 features 800 processor cores. The CA8000 combines eight of these cards into a 4U system that's capable of 9.6 teraflops of acceleration performance. And what's all this power used for? Aprius suggests that CAD/CAM, climate modelling, medical imaging and signal processing applications will all benefit.
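The 9.6 teraflops figure is consistent with simple peak-throughput arithmetic. Here is a rough sketch in Python, assuming a 750MHz core clock and one multiply-add (two floating-point operations) per core per cycle. Neither assumption comes from the Aprius announcement as quoted here, and the function name is purely illustrative.

```python
def peak_teraflops(cards, cores_per_card=800,
                   clock_hz=750_000_000, flops_per_cycle=2):
    """Theoretical single-precision peak for a box of identical cards.

    clock_hz and flops_per_cycle are assumptions made for this sketch;
    real applications rarely get close to the theoretical peak.
    """
    return cards * cores_per_card * clock_hz * flops_per_cycle / 1e12


# Eight cards with 800 cores each, as in the CA8000 description.
print(peak_teraflops(cards=8))
```

Under those assumptions a single card works out at 1.2 teraflops, and eight of them at the quoted 9.6.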
Stanford University already uses Radeon GPUs to speed up its protein-folding simulations. The numbers being crunched by the Folding@home project have the potential to help cure diseases such as cancer, Alzheimer's and Parkinson's in the future.
That's all great, you might say, but I'm unlikely to be solving shallow water equations or prospecting for oil beneath the Alaskan ice. What sort of impact will this have on a desktop PC beyond gaming applications?
Right now, not much. If you've got an average graphics card like an Nvidia GeForce 9600 GT, your GPU (which features 64 separate stream-processing cores) can already handle real-time physics effects. Nvidia ported Ageia's PhysX code libraries to its 8-Series GPUs after acquiring the company back in February 2008.
More recently, we've seen the potential for faster media encoding with the release of Badaboom. Ripping a DVD or converting a video file would typically monopolise a CPU-only system. Built on Nvidia's CUDA platform, Badaboom hands this data-intensive workload to an Nvidia GPU so that the CPU can still be used for day-to-day tasks.
GPGPU and you
Adobe's Photoshop CS4 has been optimised to offload certain tasks to any Shader Model 3.0-compatible GPU. The filters in Photoshop aren't that different from pixel shaders. A traditional CPU will apply each filter sequentially, so images can take several seconds to re-render. Using the parallel architecture of a GPU, filters can be applied to an image in real time to provide instant results.
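To see why filters suit a GPU, it helps to write one as a tiny per-pixel function. The Python sketch below is only an illustration of the idea, not Adobe's code: the `brighten_kernel` and `apply_filter` names are made up, and the loop still runs sequentially on the CPU. The point is that each pixel's result depends on that pixel alone, so the calls could run in any order, or all at once.

```python
def brighten_kernel(pixel, gain=1.5):
    """Per-pixel work: the result depends only on this one pixel,
    much like a pixel shader. Values are clamped to the 0-255 range."""
    return min(255, int(pixel * gain))


def apply_filter(image, kernel):
    """Run the kernel once per pixel of a greyscale image (list of rows).

    This version visits pixels one after another, as a single CPU core
    would. Because no call depends on any other, a GPU is free to run
    many of them side by side.
    """
    return [[kernel(pixel) for pixel in row] for row in image]


image = [[0, 100, 200],
         [50, 150, 250]]
print(apply_filter(image, brighten_kernel))
```

A filter that needed the result of the previous pixel before it could start on the next one would not split up this neatly.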
Photoshop CS4 uses OpenGL and GPU acceleration to improve zooming, rotation and transitions at all display levels. Colour matching has also been shunted to the graphics chip. You don't lose any features by not having a compatible GPU, but, as Adobe's Photoshop product manager John Nack points out, a PC with a good graphics card "will blow away computers that don't have one."
CyberLink's PowerDirector 7 software promises "up to five times faster video previewing and rendering performance" with the help of GPU resources. However, as with Photoshop CS4, this extra power only really comes into play when applied to advanced effects.
S3 Graphics, meanwhile, has announced the release of S3FotoPro, an imaging application that uses the GPGPU potential of its Chrome 400/500 chips. According to S3 Graphics, S3FotoPro uses smart image algorithms running on the GPU to "analyse and automatically adjust macro and micro details within a picture to enhance the picture quality".
Available picture enhancements include colour clarity and correction, de-fogging, skin smoothing and gradient blending, along with saturation and tonal balance adjustments. "With support for the latest GPGPU applications and languages, S3FotoPro provides a highly useful and versatile tool for end-users and our partners," says Michael Shiuan, VP of Hardware Engineering at S3 Graphics. "Application processes that required days to complete can now be completed in seconds using a GPGPU product like ours."
The GPGPU problem
Though GPUs are extremely efficient at streaming and processing data, most PC apps are serial in nature. A GPU can't turbocharge your word processor, for example, or speed up your anti-virus package. "One of the reasons GPU designers can deliver huge peak performance numbers is that they've greatly constrained the architecture," says Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab. "What this means is that they can design more efficient processors by not dealing with the messy, irregular patterns of computation that most applications inevitably deal with. These include looping conditions, unpredictable branches and irregular memory access patterns."
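The cost of unpredictable branches can be shown with a toy model. The Python sketch below assumes a group of GPU lanes that must execute in lockstep: when the lanes disagree at a branch, the group works through both paths in turn while the lanes that don't need the current path sit idle. The function name, the group size of 32 and the step counts are all illustrative assumptions, and real hardware is considerably more subtle.

```python
def lockstep_cost(lane_takes_branch, cost_if, cost_else):
    """Steps a lockstep group of lanes spends on one if/else.

    lane_takes_branch holds one boolean per lane. If every lane agrees,
    only one path is executed. If they disagree, both paths are executed
    one after the other, with the uninvolved lanes masked off.
    """
    any_if = any(lane_takes_branch)
    any_else = not all(lane_takes_branch)
    return (cost_if if any_if else 0) + (cost_else if any_else else 0)


# Assumed group of 32 lanes; each path costs 10 steps in this toy model.
uniform = [True] * 32                        # every lane takes the 'if' path
divergent = [i % 2 == 0 for i in range(32)]  # lanes alternate between paths

print(lockstep_cost(uniform, 10, 10))
print(lockstep_cost(divergent, 10, 10))
```

In this simplified model the divergent group takes twice as long as the uniform one to do the same amount of useful work, which is the kind of penalty that branch-heavy desktop software would run into.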