The future of graphics cards revealed

Light and bright

With an optimised view space created, lighting can be applied. It's important to understand that this isn't the visual representation of light; it's calculating how 'bright' every surface is going to be. A scene can have a global light source, along with point sources and spotlight sources.

Every triangle surface will have material properties, such as ambient, diffuse, specular and emissive material colours. For every source and every triangle, a calculation is made to determine that surface's total luminosity. As you can imagine, the more sources there are, the greater the computational expense.
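
To give a flavour of the sums involved, here's a minimal C++ sketch of that per-surface loop, using a simple ambient-plus-diffuse model; the type names and the maths are our own illustration rather than any particular API.

#include <algorithm>
#include <vector>

// Minimal 3D vector and colour types, purely for illustration.
struct Vec3   { float x, y, z; };
struct Colour { float r, g, b; };

struct Light   { Vec3 direction; Colour colour; };        // one light source
struct Surface { Vec3 normal; Colour ambient, diffuse; }; // per-triangle material

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Sum the contribution of every light source for one surface. The cost grows
// with (number of lights) x (number of triangles), as described above.
Colour shadeSurface(const Surface& s, const std::vector<Light>& lights) {
    Colour out = s.ambient;
    for (const Light& l : lights) {
        // Lambertian diffuse term: how squarely the light hits the surface.
        float intensity = std::max(0.0f, dot(s.normal, l.direction));
        out.r += s.diffuse.r * l.colour.r * intensity;
        out.g += s.diffuse.g * l.colour.g * intensity;
        out.b += s.diffuse.b * l.colour.b * intensity;
    }
    return out;
}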

It's important to remember that, at this stage, all we know is the luminance for each triangular surface; the actual rendering comes later.

For now, let's just say each pixel can be blended with its corresponding lighting values, textures and other effects, such as bump maps and light maps. On top of this, each pixel will have filtering, fogging, shadow values and even antialiasing applied to produce the final image.
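
Continuing the same illustrative sketch, a per-pixel stage might combine a texture sample with the lighting value and a fog factor something like this (again, the names are our own, not those of a real API):

struct Colour { float r, g, b; };  // same simple colour type as in the sketch above

// Blend a texture sample with the surface's lighting value, then fade the
// result towards a fog colour. Real hardware also folds in bump maps, light
// maps, shadows, filtering and antialiasing at this stage.
Colour shadePixel(const Colour& texel, const Colour& lighting,
                  const Colour& fogColour, float fogAmount /* 0 = none, 1 = full */) {
    Colour lit = { texel.r * lighting.r,
                   texel.g * lighting.g,
                   texel.b * lighting.b };
    return { lit.r + (fogColour.r - lit.r) * fogAmount,
             lit.g + (fogColour.g - lit.g) * fogAmount,
             lit.b + (fogColour.b - lit.b) * fogAmount };
}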

If you're feeling a bit dazed and wondering what all that's useful for, it's so you have an overview of what goes into creating a single 3D frame, which is on screen for mere milliseconds. As graphics cards have developed, more of that pipeline has been moved or added to the card itself.

With the original 3D cards, only the final rasterisation and rendering stages were performed on-card, and then only by dumb, fixed-function units that could perform a single render pass. Multitexturing and multi-pass rendering improved visual quality, and when DirectX 7.0 was released in 1999, graphics cards got a little smarter thanks to Transform and Lighting (T&L).

T&L moved the lighting and vertex transformation stages on to the graphics card and was the first move away from CPU-based vertex handling. It wasn't until the introduction of DirectX 8 that things really got interesting, as the first shaders appeared.

Vertex shaders let programmers manipulate vertices directly on the card, while pixel shaders replaced the fixed multi-texture engines with programmable ones. These gave graphics cards their first smarts, albeit limited ones: there couldn't be any branches in the code, there were limits on the number of commands and variables, and the total program length was very short.
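
To picture what one of those early programs looked like, here's a rough C++ rendition of the sort of work a first-generation vertex shader performed: a short, branch-free run of multiply-adds. The structure and function names are our own illustration, not HLSL or any actual shader language.

// A 4x4 transform matrix and a vertex, roughly as a shader would see them.
struct Mat4   { float m[4][4]; };
struct Vertex { float pos[4]; };

// Conceptually what an early vertex shader did: a fixed sequence of
// multiply-adds moving each vertex from model space towards screen space.
// (The fixed-count loops here would simply be unrolled in a real shader.)
Vertex transformVertex(const Vertex& in, const Mat4& worldViewProj) {
    Vertex out;
    for (int row = 0; row < 4; ++row) {
        out.pos[row] = 0.0f;
        for (int col = 0; col < 4; ++col) {
            out.pos[row] += worldViewProj.m[row][col] * in.pos[col];
        }
    }
    return out;
}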

So while technically these cards were running programs of a sort, the two types of shader units were different in design and very limited.

The smart stuff

It wasn't until DirectX 9.0c was released in 2004, with Shader Model 3.0, that cards started to look more like a collection of smart processors than dumb fixed logic. Dynamic branching, program lengths over 512 commands and access to hundreds of registers made graphics cards sound more like mini-supercomputers.

The final evolution came with the unified shaders introduced in DirectX 10 and Shader Model 4.0. At this point there's no distinction between vertex and pixel shaders. Cards have 'unified' shaders, akin to hundreds of tiny dedicated processing units, and these are found on the GeForce 8 and Radeon HD 2000 series and later generations of cards.

This has enabled both AMD and Nvidia to start offering GP-GPU features and programming languages for current graphics cards, which allow them to process physics and other mathematically complex data alongside 3D rendering.

Larry who?

As testament to the idea that shaders are becoming processors in their own right, Intel is wading into the graphics arena and the ripples could permanently erode the market that once seemed so rock solid.

As we already know, the new GPU is codenamed Larrabee and its heart is based, in part, on the original x86 Pentium core. Intel is on record as saying it can, in theory, run OS kernel-level code. The idea is to take a bunch of optimised, in-order x86 Pentium cores, add in a Vector Processing Unit and tie the whole thing together via each core's L2 cache using a high-speed ring bus.

Alongside the multi-core design there's a dedicated texture filtering unit, plus the usual extra gubbins for the memory controller, display and system interfaces. Intel is approaching the problem in the opposite direction to AMD and Nvidia. It's almost dumbing-down an x86 core to help fit as many as possible onto a GPU die.

All parties are selling these as more than just a graphics solution. Intel is partnering with DreamWorks, which will be using Larrabee as an accelerated computing platform for ray tracing frames within its animated features. Intel has measured a 1GHz, 24-core Larrabee GPU ray tracing almost five times faster than an eight-core, 2.6GHz Xeon processor, which shows the huge acceleration potential GP-GPU solutions have in the real world.

Currently no one has any idea how well Larrabee will perform, if it performs at all. However, we managed to dig out some figures from a paper Intel published. It estimates the performance of a Larrabee processor running F.E.A.R., Gears of War and Half-Life 2: Episode 2. The most interesting section took the DirectX commands generated from a sequence of random frames from each of these games.

These commands were fed through a 'functional model' of Larrabee rendering at 1,600x1,200 with 4x AA. The test was to see how many 1GHz cores were required to maintain a constant 60fps output for each game. The answer is between 10 and 24 cores, depending on the game.

Clearly this is nowhere near the performance of top-end cards; frame rates would need to be nearer 180fps at that resolution. But assuming performance scales roughly with clock speed, a 24-core part running at 3GHz would triple that 60fps figure and hit the target, which is still within the realms of reality.

When Larry comes

By the time Larrabee launches, it could be almost 2010 and both Nvidia and AMD will have had next-gen DirectX 11 devices well out of the stable. Intel's own figures show that its core scaling works well up to and over 48 cores with apparently only a two to ten per cent drop in performance.

It's impossible at this stage to know how much a Larrabee card will cost, but we can make several massive assumptions based on existing technology.

For example, a 24-core GPU would require 6MB of L2 cache, which is roughly 300 million transistors. Let's guesstimate that the modified x86 Pentium cores are twice their original size, at six million transistors each: 24 of those is around 144 million transistors, giving roughly 450 million transistors in total for a 24-core Larrabee GPU.

Now, if you accept those transistor counts, and accept that fab costs are closer to those of a full processor than a GPU, then at roughly half the transistor count of a 3GHz Core i7 the consumer price could be up to £230. That's not including the 1GB of GDDR5, of course.

The issue is whether Intel can put out a GPU that's both affordable and a good performer when Larrabee launches. At least AMD and Nvidia will put us out of our misery soon enough, as both are expected to field hardware supporting DirectX 11 in the second half of 2009.

It will be interesting to see which of the two has the most powerful GP-GPU solution, but regardless, Intel won't get an easy ride. The quality of Intel's drivers is going to be a key issue, and dual-GPU or SLI-style dual-card support may be a necessity if it wants to compete for the performance crown.

-------------------------------------------------------------------------------------------------------

First published in PC Format Issue 226