Running tasks in space

Because of its hardware limitations, the AGC pioneered the use of the kind of priority-driven kernel we see in desktop operating systems today. Its programs were always one of two types: short tasks designed to take no more than 4ms, and jobs that were designed to run for longer.
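To make the idea of a priority-driven kernel concrete, here is a minimal sketch in Python. It is not the AGC's actual executive: the class name, the job names and the convention that a lower number means a more urgent job are all assumptions made for illustration.

```python
import heapq
from itertools import count


class PriorityScheduler:
    """Toy priority-driven scheduler (hypothetical; lowest number runs first)."""

    def __init__(self):
        self._queue = []
        self._order = count()  # tie-breaker keeps equal priorities in FIFO order

    def submit(self, priority, name, work):
        # The heap always keeps the most urgent job at the front.
        heapq.heappush(self._queue, (priority, next(self._order), name, work))

    def run(self):
        """Run every queued job, most urgent first, and return the order used."""
        finished = []
        while self._queue:
            _, _, name, work = heapq.heappop(self._queue)
            work()
            finished.append(name)
        return finished


# Hypothetical jobs, submitted out of order on purpose.
scheduler = PriorityScheduler()
scheduler.submit(3, "update_display", lambda: None)
scheduler.submit(1, "steer_engine", lambda: None)
scheduler.submit(2, "read_radar", lambda: None)
run_order = scheduler.run()
```

However the jobs arrive, the most urgent one always runs first, which is the property that matters when processor time is scarce.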

Most importantly, the system watched out for failure. If a program on your desktop crashes, you can reboot the system hardware to restore normal operations. In space, that's not an option.

The AGC's operating system required a special monitor program designed to restart individual tasks whenever they took too long to respond (indicating that they'd failed to complete). Such a situation actually occurred during the descent of Apollo 11's Eagle Lunar Module.

With the world holding its breath, Eagle's rendezvous radar started swamping the program assigned to monitor it. Handling this data caused the program to fail with the now famous '1202 alarm' – a warning that the system was overloaded. Luckily for Neil Armstrong and Buzz Aldrin, Mission Control trusted that the reset program would do its job flawlessly.
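The monitor-and-restart idea can be sketched in a few lines of Python. This is an illustration only, not the AGC's real restart logic: the `Watchdog` class, the task names and the use of simulated clock ticks in place of real time are all assumptions.

```python
class Watchdog:
    """Hypothetical monitor: restarts any task silent for more than `timeout` ticks."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # task name -> tick of its last check-in
        self.restarts = []   # log of (tick, task name) restarts

    def check_in(self, name, tick):
        # A healthy task reports in regularly.
        self.last_seen[name] = tick

    def poll(self, tick):
        # Any task that has been silent too long is presumed to have failed.
        for name, seen in self.last_seen.items():
            if tick - seen > self.timeout:
                self.restarts.append((tick, name))
                self.last_seen[name] = tick  # a restarted task gets a fresh deadline


dog = Watchdog(timeout=2)
dog.check_in("guidance", tick=0)
dog.check_in("radar", tick=0)
for tick in range(1, 6):
    dog.check_in("guidance", tick)  # guidance reports in on every tick
    dog.poll(tick)                  # radar never reports in again
```

In this run the silent "radar" task is restarted once, while "guidance" carries on untouched. The point of the design is that one stalled task is dealt with individually and the rest of the system keeps running.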

The need for software without bugs led to the rapid development of advanced software verification techniques. NASA also acquired considerable experience in managing large, real-time software projects that would lead directly to the development of the fly-by-wire systems used first in fighter aircraft and now in commercial airliners.

But even before the world watched Armstrong and Aldrin happily bounce about on the lunar surface for the first time, NASA was already drawing up plans for an advanced reusable spacecraft that would be entirely fly-by-wire.

Flying the Space Shuttle

Partly because of its association with NASA, by the late 1960s IBM had a wealth of experience of flight computers. So rather than commission another bespoke system for the Space Shuttle, NASA chose to simply buy the IBM AP-101 avionics computer, as used in the B-52 nuclear bomber and the F-15 fighter.

Due to the complexity of the Shuttle, it uses five of these systems, re-christened the General Purpose Computer (GPC). All of the Shuttle's subsystems (radar, life support and so on) are interconnected by more than 300 miles of wiring, and are designed to share no fewer than 24 data buses running throughout the ship. The GPCs share eight of the buses.

On early flights, the crew also carried a Hewlett-Packard HP-41C calculator programmed to determine ground-station availability, and to tell them when to fire the re-entry retro rockets should it be necessary to attempt an emergency manual de-orbit.

For the safety of the crew and the expensive payloads it carries, four of the GPCs replicate each other's functions. If one obtains a result that the others don't, it's presumed to be wrong.

The fifth computer is programmed by a different team and acts as a backup if a second opinion is required. In addition, a sixth GPC is on board during each flight and can be swapped with a malfunctioning unit if necessary.
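The voting arrangement can be illustrated with a short Python sketch. The Shuttle's real redundancy management is far more involved. The `vote` function and the rule of deferring to the backup when there is no clear majority are assumptions made for this example.

```python
from collections import Counter


def vote(results):
    """Return (majority value, indices of the computers that disagreed).

    Hypothetical rule: raise ValueError when no value is held by more than
    half of the computers, so the caller can defer to the backup.
    """
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(results):
        raise ValueError("no majority; defer to the backup computer")
    dissenters = [i for i, r in enumerate(results) if r != value]
    return value, dissenters


# Four computers report a result; the third one disagrees.
agreed, suspect = vote([42, 42, 17, 42])
```

Here the odd one out is flagged as suspect and the majority answer is used, matching the principle that a result the others don't share is presumed to be wrong.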

When the shuttle first flew in 1981, the GPCs each contained just 104kB of core memory. It wasn't until 1984 that NASA approved an upgrade to a faster processor and 128kB (expandable to 256kB) of silicon RAM. Due to technical problems, the upgraded computers didn't fly until 1991.

Trouble at the station

The International Space Station (ISS) is now seen as man's most successful permanent step off the planet: the pinnacle of space flight so far. But its off-the-shelf computers aren't exactly cutting-edge.

Other than the embedded systems that control the station itself, the crew use ordinary IBM Thinkpad laptop computers for general computing duties and experiments. Now made by the Chinese manufacturer Lenovo, they're chosen for their reputation for build quality.

But things haven't always gone according to plan for the computers controlling the ISS. Once the sophistication of computers gets to a critical point, it seems inevitable that failures will occur. Onboard navigation computers regulate the station's position and angle using an array of gyroscopes and thrusters. They also control oxygen generation.

In 2007, these computers crashed due to a power surge while a visiting Shuttle crew deployed two new solar panels. Ground-based flight controllers had to use the Shuttle's thrusters to keep the space station at the right angle to the sun while Russian engineers raced to fix the problem.

MIR: Retired in 2001, Mir was the longest running space station and had people continuously on board for 10 years

However, after rebooting the navigation systems, an alarm sounded and flight controllers asked shuttle commander Rick Sturckow to re-enable Atlantis' autopilot to again keep the ISS in position. Eventually, the problem was traced to a blown circuit, and the station's navigation capability was restored.

Not all problems are so technical, however. One downside of off-the-shelf systems is the greater chance of human error creeping in, as when one crew member accidentally introduced the W32.Gammima.AG keylogger virus to the station on an infected laptop. This spread to several other laptops.

NASA described the virus as a "nuisance". Luckily, the laptops in question were used only for non-critical tasks, including composing email and storing the results of nutritional experiments.