These really are, as Paul Simon sang, the days of miracle and wonder. It seems almost impossible to believe that engineers have now managed to design and build a machine whose components measure as little as 40nm across. That's less than one-thousandth the diameter of a typical human hair.

Yet we're making such devices right now. They're called graphics processing units (GPUs), and they're starting to challenge the central processing unit for its long-held title of the most magical piece of engineering found in a computer.

What threat is the CPU facing? Consider a top-of-the-range desktop processor like the quad-core Sandy Bridge variant of Intel's Core i7. It boasts just short of a billion transistors. A leading-edge graphics processing unit like AMD's Cayman, as used in the Radeon HD 6970 graphics card, clocks up a massive 2.64 billion transistors.

Given that it boasts no fewer than 1,536 shader processors, 24 SIMD (single instruction, multiple data) engines and 32 ROPs (raster operations units), this perhaps isn't surprising. This is the story of how AMD GPUs are made - how an idea becomes silicon using some of the most advanced and intricate engineering methods in existence. Read on as we delve into the real days of miracle and wonder.
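A quick back-of-envelope check of those numbers puts them in proportion. This is a minimal sketch: the 995 million figure for the CPU is an assumption standing in for "just short of a billion", and dividing the shader processors evenly across the SIMD engines is simple arithmetic, not a statement about how each engine is organised internally.

```python
# Figures quoted in the article for AMD's Cayman GPU
CAYMAN_TRANSISTORS = 2.64e9
SHADER_PROCESSORS = 1536
SIMD_ENGINES = 24

# Assumption: "just short of a billion" taken as roughly 995 million
SANDY_BRIDGE_TRANSISTORS = 0.995e9

# Even split of shader processors across SIMD engines
shaders_per_simd = SHADER_PROCESSORS // SIMD_ENGINES
# How many times more transistors the GPU has than the quad-core CPU
ratio = CAYMAN_TRANSISTORS / SANDY_BRIDGE_TRANSISTORS

print(f"{shaders_per_simd} shader processors per SIMD engine")  # 64
print(f"Cayman has about {ratio:.1f}x the CPU's transistors")   # about 2.7x
```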

1. The high level design


Designing a graphics processing unit doesn't start with any thought of transistors or copper tracks, but with something called the product requirement specification, or PRS - a prioritised definition of all the features the new chip must have. It might not sound wildly exciting, but the PRS acts as the checklist throughout the whole design process.

Given that design is a very costly exercise, in terms of time and money, it's vitally important that the PRS provides an adequate answer to the question: 'What exactly is it that we're trying to build here?'

Typically it will take six months to complete the PRS. Thousands of engineers, including architects, hardware designers, board designers, validation engineers, software engineers and firmware/BIOS engineers will be involved, as will representatives from product management, technology management and developer relations.

The document takes the form of a database and could include over 1,000 features, each of which could be anything from an odd sentence to a 100-page specification.
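To picture how a prioritised feature database doubles as a checklist, here is a minimal sketch. The article doesn't describe the real PRS schema, so the Feature and PRS classes, their fields and the example feature names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    # Hypothetical fields: the real PRS schema is not described in the article
    name: str
    priority: int         # 1 = must have; larger numbers = lower priority
    spec_pages: int = 0   # anything from a one-line note to a long specification
    done: bool = False    # signed off by the design team?


@dataclass
class PRS:
    features: list[Feature] = field(default_factory=list)

    def add(self, feature: Feature) -> None:
        self.features.append(feature)

    def by_priority(self) -> list[Feature]:
        # sorted() is stable, so equal-priority features keep insertion order
        return sorted(self.features, key=lambda f: f.priority)

    def outstanding(self) -> list[str]:
        # The checklist view: features not yet signed off, highest priority first
        return [f.name for f in self.by_priority() if not f.done]


# Hypothetical example entries
prs = PRS()
prs.add(Feature("new video decode block", priority=1, spec_pages=100))
prs.add(Feature("power gating", priority=2, spec_pages=12, done=True))
prs.add(Feature("extra display output", priority=3))
print(prs.outstanding())  # ['new video decode block', 'extra display output']
```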

Another output from the high level design - one that most technically savvy PC users will be familiar with - is a block diagram. Although it bears no resemblance to how the elements of the GPU will be arranged on the chip, it includes each of the major functional blocks and shows how signals pass between them.

2. Floorplan and netlist

Hardware language

Teams of engineers now set to work on two distinct areas of the design. First, the floorplan must be defined. This is a physical representation that will take account of how large each block is expected to be and where it should be positioned relative to other blocks.

Here, account is taken of how many signals pass between the blocks with the aim of reducing the lengths of the pathways. Meanwhile, other engineers work on the component level design of each of the blocks.
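One common way to score a candidate floorplan against that aim is signal-weighted wirelength: for every pair of connected blocks, multiply the number of signals by the distance between the blocks, then add it all up. The sketch below assumes Manhattan distance between block centres; the block names, positions (in mm) and signal counts are hypothetical, and real floorplanning tools weigh many more factors.

```python
# Hypothetical signal counts between pairs of blocks
SIGNALS = {
    ("shader_array", "memory_controller"): 512,
    ("shader_array", "rop"): 256,
    ("rop", "memory_controller"): 256,
    ("display", "memory_controller"): 64,
}

Placement = dict[str, tuple[float, float]]  # block name -> centre (x, y) in mm


def wirelength(placement: Placement) -> float:
    """Sum of signal count x Manhattan distance between connected block centres."""
    total = 0.0
    for (a, b), n_signals in SIGNALS.items():
        (xa, ya), (xb, yb) = placement[a], placement[b]
        total += n_signals * (abs(xa - xb) + abs(ya - yb))
    return total


# Two hypothetical candidate floorplans
plan_a = {"shader_array": (5, 5), "memory_controller": (5, 10), "rop": (10, 5), "display": (0, 0)}
plan_b = {"shader_array": (5, 5), "memory_controller": (8, 8), "rop": (10, 5), "display": (0, 10)}

print(wirelength(plan_a))  # 7360.0
print(wirelength(plan_b))  # 6272.0
```

Moving the busy memory controller closer to the blocks it talks to most lowers the score, which is the intuition behind positioning heavily connected blocks next to each other.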

However, this is nothing like electronic circuit design as envisaged by the layperson. Instead of a circuit diagram, the design is created in a hardware description language like VHDL. If you're interested, this compound acronym stands for VHSIC hardware description language, where VHSIC in turn stands for very-high-speed integrated circuit.

A hardware description language looks much like a programming language, and it offers the circuit designer many of the same benefits that a software engineer enjoys. Most importantly, circuits can be defined hierarchically: once a logic OR gate has been defined from individual transistors, for example, it can be reused in the definition of a more complicated block such as a one-bit adder.

In the same way, increasingly sophisticated building blocks are built up by reusing what's already been created. Often the designer won't even have to define the building blocks, because they'll be available from third-party libraries.
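The hierarchical idea can be illustrated in any language. The sketch below is written in Python rather than VHDL, so it shows the principle of building each level only from the level beneath it, not what AMD's design code looks like. The function names are hypothetical.

```python
# Level 0: primitive gates (on silicon, each is itself built from transistors)
def and_gate(a: int, b: int) -> int:
    return a & b

def or_gate(a: int, b: int) -> int:
    return a | b

def xor_gate(a: int, b: int) -> int:
    return a ^ b

# Level 1: a half adder, built only from the gates above
def half_adder(a: int, b: int) -> tuple[int, int]:
    return xor_gate(a, b), and_gate(a, b)  # (sum, carry)

# Level 2: a one-bit full adder, built from two half adders and an OR gate
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, or_gate(c1, c2)

# Level 3: an n-bit ripple-carry adder, built by chaining full adders
def ripple_adder(a_bits: list[int], b_bits: list[int]) -> tuple[list[int], int]:
    """Add two equal-length, least-significant-bit-first bit lists."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry

# 5 + 3 as 4-bit numbers, least significant bit first
print(ripple_adder([1, 0, 1, 0], [1, 1, 0, 0]))  # ([0, 0, 0, 1], 0), i.e. 8
```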

When the VHDL code is complete, it goes through a process called synthesis, which is the hardware equivalent of compiling a program. Compiling a program checks the code for errors and, once it's error-free, generates a file of individual processor instructions. Synthesis, by contrast, produces what's called a netlist, which defines the connections between each and every component, including those 2.64 billion transistors.
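To make the idea of a netlist concrete, here is a toy one for a one-bit full adder, with each gate instance listing the nets (wires) it reads and the net it drives. The format is purely illustrative; real netlists use formats such as structural Verilog or EDIF and carry far more detail. The instance and net names are hypothetical.

```python
from collections import defaultdict

# Each entry: (instance name, gate type, input nets, output net)
NETLIST = [
    ("u1", "XOR", ("a", "b"), "n1"),
    ("u2", "AND", ("a", "b"), "n2"),
    ("u3", "XOR", ("n1", "cin"), "sum"),
    ("u4", "AND", ("n1", "cin"), "n3"),
    ("u5", "OR", ("n2", "n3"), "cout"),
]

GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def connections(netlist):
    """Which gate instances each net feeds: the connectivity a netlist records."""
    loads = defaultdict(list)
    for inst, _, ins, _ in netlist:
        for net in ins:
            loads[net].append(inst)
    return dict(loads)


def simulate(netlist, inputs):
    """Evaluate a combinational netlist, firing each gate once its inputs are known."""
    nets = dict(inputs)
    pending = list(netlist)
    while pending:
        progressed = False
        for gate in list(pending):
            _, kind, ins, out = gate
            if all(n in nets for n in ins):
                nets[out] = GATES[kind](*(nets[n] for n in ins))
                pending.remove(gate)
                progressed = True
        if not progressed:
            raise ValueError("undriven net or combinational loop")
    return nets


print(connections(NETLIST)["n1"])  # ['u3', 'u4']
result = simulate(NETLIST, {"a": 1, "b": 1, "cin": 0})
print(result["sum"], result["cout"])  # 0 1
```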

3. Circuit verification and emulation

The netlist could go directly into the mask making process, but this would be asking for trouble because designs as complicated as a GPU are never 100 per cent correct on the first attempt. What's more, given that a set of masks could cost $1 million, testing the design on real silicon would be prohibitively expensive.

Instead the design is verified and emulated - a hugely processor-intensive operation that requires supercomputing resources. Verification involves testing individual blocks, with perhaps thousands of tests per block. Each time anything fails, the design team backtracks to correct the errors and then reruns the full set of simulation tests to make sure the fix hasn't broken something that previously worked correctly.
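A minimal sketch of block-level verification, assuming the block under test is a small adder: compare its output with a trusted reference model (here, Python's own + operator) for every possible input, and rerun the whole suite after each fix. Exhaustive testing like this is only feasible for tiny blocks; the deliberately buggy adder and all function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def buggy_full_adder(a, b, cin):
    # Deliberate bug for illustration: the carry-out ignores the carry-in
    return a ^ b ^ cin, a & b


def adder4(fa, x, y):
    """4-bit ripple-carry adder built from the given full-adder model."""
    carry, result = 0, 0
    for i in range(4):
        s, carry = fa((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << 4)  # carry-out becomes bit 4


def verify(fa):
    """Run every 4-bit input pair against the reference model; return the failures."""
    return [(x, y) for x, y in product(range(16), repeat=2)
            if adder4(fa, x, y) != x + y]


print(len(verify(full_adder)))        # 0: every test passes
print(len(verify(buggy_full_adder)))  # non-zero: the bug is caught
```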

Once all the individual blocks are operational, the team moves to emulation. This means exercising the GPU as a whole, but given the amount of processing time needed to simulate a multi-billion transistor chip, these tests might initially be nothing more complicated than drawing a single pixel.

In addition to functional testing, emulation also ensures that the chip meets its requirements in terms of processing speed.
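A related technique widely used to check speed is static timing analysis, which doesn't simulate signals at all: it finds the slowest path through the logic and compares it with the clock period. The article doesn't say whether AMD's flow uses it, so treat this as a swapped-in illustration. The gate names and delays (in picoseconds) are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical gate delays in picoseconds
DELAY_PS = {"in": 0, "g1": 300, "g2": 250, "g3": 450, "g4": 200, "out": 0}
# Each node's fan-in, i.e. the nodes that drive its inputs
FANIN = {
    "g1": {"in"},
    "g2": {"in"},
    "g3": {"g1", "g2"},
    "g4": {"g2"},
    "out": {"g3", "g4"},
}


def arrival_times(delays, fanin):
    """Latest arrival time at each node: slowest input plus the node's own delay."""
    arrival = {}
    # static_order() yields every node after all of its predecessors
    for node in TopologicalSorter(fanin).static_order():
        latest_input = max((arrival[p] for p in fanin.get(node, ())), default=0)
        arrival[node] = latest_input + delays[node]
    return arrival


def meets_timing(delays, fanin, clock_mhz):
    period_ps = 1e6 / clock_mhz  # e.g. 1,000 MHz -> 1,000 ps per clock cycle
    critical = arrival_times(delays, fanin)["out"]
    return critical, critical <= period_ps


print(meets_timing(DELAY_PS, FANIN, 1000))  # (750, True)
print(meets_timing(DELAY_PS, FANIN, 1500))  # (750, False)
```

The critical path here runs in, g1, g3, out; speeding up the chip means shortening that path, not the faster ones beside it.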

4. Making the masks

With the simulation out of the way the designers know that the circuit connections are correct, but so far, with the exception of the top-level floorplan, no thought has been given to where the components go on the chip. This is carried out using a special CAD package, driven by the floorplan.

This largely automated process places each component and routes the copper tracks that will ultimately connect them all together. It culminates in a major milestone known as 'tape out', which marks the transition from design to fabrication.
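The article doesn't say which routing algorithms the CAD package uses. As an illustration of the basic idea, the classic textbook approach is Lee's maze-routing algorithm: a breadth-first search over a grid that finds the shortest track between two points while avoiding cells that are already occupied. Production routers are far more elaborate (many metal layers, design rules, huge numbers of nets); the grid and the lee_route function below are hypothetical.

```python
from collections import deque


def lee_route(grid, start, goal):
    """Shortest obstacle-avoiding path on a grid (Lee's algorithm, i.e. BFS).

    grid: list of strings where '#' marks a blocked cell.
    Returns the (row, col) cells from start to goal, or None if unroutable.
    """
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}  # also serves as the set of visited cells
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through predecessors to recover the track
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


GRID = [
    "....",
    ".##.",  # two cells blocked by existing components
    "....",
]
path = lee_route(GRID, (1, 0), (1, 3))
print(len(path) - 1)  # 5 steps: the track detours around the obstruction
```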

Since AMD is a fabless semiconductor company, this is also the point at which it hands the baton to TSMC, its chosen foundry for GPUs. Before any chips can be manufactured though, the foundry needs to create a set of photographic masks that will be used in the photolithography - one for each of the many layers by which the circuit is built up on the chip.

Using the data supplied at tape out (which can be thought of as images of the patterns on each layer), the masks are created as a patterned layer of opaque metallic chromium on the surface of quartz glass.