Why computers suck at maths

The discrepancy between the computed answer and the correct answer is often minute, and you might be inclined to dismiss this sort of error as insignificant. However, such errors can add up, and the consequences can be serious.

On 25 February 1991, three days before the end of the first Gulf War, an Iraqi Scud missile hit a US Army barracks in Dhahran, Saudi Arabia. 28 American soldiers were killed and more than 100 others were injured.

At the time, sensitive targets were supposed to be protected by the Patriot surface-to-air defence system, and one battery of Patriot missiles was assigned to the Dhahran facility – so it's pertinent to ask what exactly went wrong.

The answer is the system's tracking software, and the problem is not unrelated to our online calculator error. In order to avoid potentially costly false alarms, the Patriot's sophisticated radar must first detect an object that has the characteristics of a Scud missile and then detect it a second time in a position calculated by the system on the assumption that the first fix was genuinely a Scud. Only when this second fix provides a confirmation is a missile launched to intercept it.

The calculation of where to look for confirmation of an incoming missile requires knowledge of the system time, which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 has no exact finite representation in binary, just as one third has none in decimal. So when it's shoehorned into a 24-bit register – as used in the Patriot system – the stored value is out by a tiny amount, roughly 0.000000095 seconds per tick. But all these tiny amounts add up.
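To see why 0.1 can't be stored exactly, it helps to write out its binary digits. The following Python sketch is purely illustrative (the function name is our own and has nothing to do with the Patriot software); it generates the digits by repeated doubling and uses exact fractions so that no rounding creeps into the demonstration itself.

```python
from fractions import Fraction


def binary_fraction_bits(value, count):
    """Return the first `count` binary digits after the point for 0 <= value < 1."""
    bits = []
    for _ in range(count):
        value *= 2              # shift the next binary digit in front of the point
        bit = int(value >= 1)   # that digit is 1 if the value has reached 1
        bits.append(str(bit))
        value -= bit            # keep only the fractional part
    return "".join(bits)


bits = binary_fraction_bits(Fraction(1, 10), 24)
print("0." + bits)  # 0.000110011001100110011001
```

The pattern 1001 repeats for ever, so any register with a fixed number of bits has to cut it off somewhere.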

At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m.
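The arithmetic can be reproduced in a few lines of Python. This is a sketch under one stated assumption, taken from published analyses of the incident rather than from this article: that the 24-bit register kept 23 binary digits after the point and simply chopped off the rest. The names used are hypothetical.

```python
from fractions import Fraction

FRACTION_BITS = 23       # assumption: 23 bits after the binary point, rest chopped
TICK = Fraction(1, 10)   # the intended tick length in seconds


def chop(value, bits):
    """Truncate a positive fraction to `bits` binary digits after the point."""
    scale = 2 ** bits
    return Fraction(value.numerator * scale // value.denominator, scale)


stored_tick = chop(TICK, FRACTION_BITS)
error_per_tick = TICK - stored_tick      # exact error made on every tick

ticks = 100 * 60 * 60 * 10               # 100 hours of 0.1-second ticks
drift = error_per_tick * ticks           # accumulated clock error in seconds

print(f"ticks:          {ticks:,}")                         # 3,600,000
print(f"error per tick: {float(error_per_tick):.3e} s")     # 9.537e-08 s
print(f"total drift:    {float(drift):.4f} s")              # 0.3433 s
```

An error of less than a ten-millionth of a second per tick looks harmless, but after 3,600,000 ticks it has grown to about a third of a second.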

The radar looked in the wrong place to receive a confirmation and saw no target. Accordingly no missile was launched to intercept the incoming Scud – and 28 people paid with their lives.

The processor that couldn't divide

Launched in March 1993, the Pentium was Intel's fifth generation of x86 processor. Unlike previous generations, in which at least some family members could only carry out arithmetic on whole numbers, all Pentiums had a floating point unit (FPU).

An FPU is a piece of built-in hardware for performing floating point arithmetic. This gave the Pentium a massive speed advantage, since computers whose processors lacked an FPU had to carry out this sort of calculation using software routines that involved lots of integer operations. Unfortunately, it was a poisoned chalice for Intel.

In June 1994, shortly after taking delivery of a Pentium-based PC, Thomas Nicely – then Professor of Mathematics at Lynchburg College, Virginia – noticed that a program he had written was giving inconsistent results.

By running the same program on several machines, Professor Nicely tracked down the problem to his new PC's Pentium processor and, in particular, to its FDIV (floating point division) instruction. Although it affected just a tiny proportion of floating point divisions, at its worst the error was really quite significant.

Dividing 4,195,835 by 3,145,727 gave an answer of 1.3337 – an error in the fourth decimal place, since the correct answer is 1.3338.

The Pentium's FPU used something called the SRT algorithm to carry out floating point divisions. Although there are simpler and more obvious ways of dividing one floating point number by another, the SRT algorithm gave a significant speed advantage over previous algorithms.
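The faulty division quoted above is easy to check on any modern machine. The Python sketch below performs the division, then applies the well-known sanity test of multiplying the quotient back by the divisor. The flawed quotient can't be reproduced on a working processor, so it is entered as a constant; the digits are the widely quoted ones and should be treated as an assumption rather than something the code verifies.

```python
x, y = 4195835, 3145727

quotient = x / y                 # correct on any processor without the flaw
residual = x - quotient * y      # should be exactly zero for this pair

# Assumption: the widely quoted result from an affected Pentium.
FLAWED_QUOTIENT = 1.333739068902037589

relative_error = abs(quotient - FLAWED_QUOTIENT) / quotient

print(f"correct quotient: {quotient:.4f}")          # 1.3338
print(f"flawed quotient:  {FLAWED_QUOTIENT:.4f}")   # 1.3337
print(f"residual:         {residual}")              # 0.0
print(f"relative error:   {relative_error:.1e}")    # 6.1e-05
```

A relative error of around six parts in 100,000 is enormous by floating point standards, where double-precision results are normally good to about 16 significant digits.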

If you're not a mathematician you'll find a description of how SRT works totally impenetrable. However, let's just say that instead of working everything out using 'pure maths', it involved the use of a look-up table. The table contained 1,000 or so values, but due to a production error five of these values were missing.

Intel's CEO Andy Grove reckoned that the average user would only see the problem once every 27,000 years, but IBM's estimate was once every 24 days – and as a result IBM stopped shipping Pentium-based PCs. Intel eventually agreed to swap defective Pentiums for good ones.

Most people didn't take up the offer, but Intel's delay in making it led technically minded users to make the company the butt of their jokes. The following is typical. Q: How many Pentium designers does it take to change a light bulb? A: 1.99904274017, but that's close enough for non-technical people.