For years, scientists have enjoyed the benefits of supercomputers that contain thousands of processors each working in parallel to achieve phenomenal levels of performance. The advent of multi-core processors means that these advantages are on offer to ordinary computer users, but unlike earlier computing design improvements, the gains available from working in parallel are not guaranteed.
So if multiple cores are the way to go, it's not surprising that manufacturers are queuing up to release chips with ever more cores. Most PC users now consider dual-core processors entry-level, and triple- and quad-core chips are being touted for users with power-hungry applications who want a little more out of their computer. Some companies are already selling products that use chips with more cores still, although these are less mainstream.
Sun Microsystems use an eight-core UltraSPARC T2 processor in their latest servers, and some companies have chips with hundreds of cores that are aimed at specialised applications such as mobile phones. The PC market isn't getting left behind, though: both Intel and AMD are expected to have eight-core chips in the near future, and Intel has even more ambitious plans.
Approaches to parallelism
Multi-core might be the buzz-phrase of the moment, but this certainly doesn't represent the industry's first step into parallel computation – far from it. The multi-core approach is just the latest method in a long line of techniques aiming to do more than one thing at once.
For many years, processors performed one operation per clock cycle. So a processor with a clockspeed of 1MHz was able to perform one million operations per second. However, as processors became more sophisticated, instructions took more than a single clock cycle to execute and this relationship broke down.
Those instructions might each have done the work of several simpler ones, and clockspeeds certainly increased dramatically, but for a time these gains had to be balanced against the fact that each instruction took many clock cycles to execute.
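The arithmetic behind that trade-off can be sketched in a few lines. This is only an illustration: the function name and the 8MHz, four-cycles-per-instruction chip are hypothetical figures chosen for the example, not measurements of any real processor.

```python
def instructions_per_second(clock_hz: float, cycles_per_instruction: float) -> float:
    """Average instruction throughput of a simple, non-pipelined processor."""
    return clock_hz / cycles_per_instruction


# The 1MHz, one-operation-per-cycle processor described in the text.
one_cycle = instructions_per_second(1_000_000, 1)

# Hypothetical later chip: eight times the clockspeed, but four cycles per instruction.
four_cycle = instructions_per_second(8_000_000, 4)

print(one_cycle, four_cycle)
```

Under these assumed figures, the second chip has eight times the clockspeed but completes only twice as many instructions per second, which is why clockspeed alone stopped being a reliable guide to performance.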
The move back to single-cycle instructions was the result of a technique called 'pipelining', which represented the computing industry's first faltering steps into parallelism. Executing an instruction is a multi-step process. At its simplest, it might involve decoding the instruction, loading from memory the data for it to work on, performing the necessary action on that data and finally writing the results back to memory.
Each of these steps could take a clock cycle. Pipelining involves separating these steps and performing them in parallel. It isn't possible to perform all those steps in parallel for a single instruction, but as soon as one instruction has been decoded, the decoder is free to start decoding the next instruction at the same time as the data is being loaded for the first instruction.
Pipelining doesn't quite give us single clock cycle instructions – mainly because the pipeline has to be flushed each time the program branches – but it comes close.
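To see why pipelining comes close to one instruction per cycle without quite getting there, here is a minimal cycle-counting sketch. It rests on simplifying assumptions that are not from the article: every stage takes exactly one cycle, and each pipeline flush costs one cycle per stage other than the last. The function names and the instruction and branch counts are hypothetical.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction runs all of its stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int, n_flushes: int = 0) -> int:
    """Idealised pipeline: one instruction completes per cycle once it is full.

    Assumption: every flush throws away the work in the earlier stages,
    costing (n_stages - 1) extra cycles while the pipeline refills.
    """
    if n_instructions == 0:
        return 0
    fill = n_stages - 1                      # cycles before the first result emerges
    flush_penalty = n_flushes * (n_stages - 1)
    return fill + n_instructions + flush_penalty


# Four stages, as in the decode / load / execute / write-back example.
print(unpipelined_cycles(1000, 4))       # 4000 cycles
print(pipelined_cycles(1000, 4))         # 1003 cycles
print(pipelined_cycles(1000, 4, 100))    # 1303 cycles with 100 flushes
```

With no branches, the 1,000 instructions take 1,003 cycles, or just over one cycle each. With 100 flushes the average rises to about 1.3 cycles per instruction, still far better than the four cycles each of the unpipelined case.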
The next development in parallelism was the provision of multiple execution units in processors. A chip might have two arithmetic/logic units (which perform integer arithmetic and logical operations) and two floating point units (which perform arithmetic on floating point numbers). When used alongside pipelining, the multiple execution units permitted – for the first time – more than one instruction to be executed in a clock cycle.
However, the improvements aren't as dramatic as you might expect. For a start, it often isn't possible to work on two consecutive instructions in parallel because the second might require the result of the first. Secondly, executing multiple instructions in parallel might involve guessing whether or not a branch will be taken, and a wrong guess means the speculative work has to be thrown away.
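The effect of dependencies between consecutive instructions can be illustrated with a toy model of a processor that issues up to two instructions per cycle, strictly in program order. This is a sketch under stated assumptions, not a description of any real chip: every instruction takes one cycle, a result becomes usable in the following cycle, and branches and other hazards are ignored. The instruction format and register names are hypothetical.

```python
from typing import List, Tuple

# (destination register, source registers) -- a hypothetical instruction format
Instruction = Tuple[str, Tuple[str, ...]]


def cycles_in_order(program: List[Instruction], width: int = 2) -> int:
    """Count cycles for an in-order machine issuing up to `width` instructions per cycle.

    An instruction cannot issue in the same cycle as an earlier instruction
    whose result it reads; it has to wait for the next cycle.
    """
    cycles = 0
    i = 0
    while i < len(program):
        written_this_cycle = set()
        issued = 0
        while i < len(program) and issued < width:
            dest, sources = program[i]
            if any(src in written_this_cycle for src in sources):
                break  # needs a result that is only being produced this cycle
            written_this_cycle.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


# Four instructions that only read registers nobody in the program writes.
independent = [("a", ("x", "y")), ("b", ("x", "z")), ("c", ("y", "z")), ("d", ("x", "w"))]

# Four instructions where each one reads the result of the one before it.
chained = [("a", ("x", "y")), ("b", ("a", "z")), ("c", ("b", "z")), ("d", ("c", "w"))]

print(cycles_in_order(independent))  # 2 cycles: two instructions issue per cycle
print(cycles_in_order(chained))      # 4 cycles: the second execution unit sits idle
```

In this model the independent sequence finishes in two cycles, while the chained one takes four, no faster than a processor with a single execution unit.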