How MP3 compression works

There things stood until the age of the personal computer and the internet. A three-minute track on a CD (which is the length of a typical pop song) occupies 31,752,000 bytes, or just over 30MB. Downloading a CD track using a 9,600 baud modem would take hours, and would still take well over an hour on a 56K dialup modem (the fastest retail modem before broadband became mainstream).
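The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough estimate only: it assumes standard CD audio (44,100 samples per second, two bytes per sample, two channels), treats each modem's nominal speed as bits per second, and ignores protocol overhead. The function names are illustrative.

```python
CD_SAMPLE_RATE = 44_100   # samples per second, per channel
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo


def cd_track_bytes(seconds: float) -> int:
    """Size in bytes of uncompressed CD audio of the given length."""
    return int(seconds * CD_SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)


def download_seconds(size_bytes: int, bits_per_second: float) -> float:
    """Idealised transfer time, assuming the full line rate is available."""
    return size_bytes * 8 / bits_per_second


track = cd_track_bytes(180)  # a three-minute track
print(track)                 # 31752000

for name, rate in [("9,600 bit/s modem", 9_600), ("56K modem", 56_000)]:
    minutes = download_seconds(track, rate) / 60
    print(f"{name}: about {minutes:.0f} minutes")
# 9,600 bit/s modem: about 441 minutes
# 56K modem: about 76 minutes
```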

On a typical early broadband connection (1.5Mbit/s download, for example), that track would take under three minutes to download, meaning you could just about stream it while listening to it. The solution would seem to be to compress the digital audio data.

As it happens, compressing a typical CD track with something like the Deflate algorithm in Zip doesn't actually give many space savings. The reason is the data stream exhibits randomness: the two-byte accuracy of the sampling means that even similar pieces of music encode slightly differently, negating the benefits of dictionary compression algorithms such as Deflate. Random data doesn't compress, so CD tracks don't compress terribly well at all.
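You can get a feel for this with Python's zlib module, which implements Deflate. The sketch below is not a measurement of real music: it assumes a synthetic signal (a 440Hz tone plus random noise in the low-order bits, standing in for the fine detail of a real recording) and compares it with some highly repetitive text. The helper names are made up for the illustration.

```python
import math
import random
import struct
import zlib


def deflate_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


def synthetic_pcm(seconds: float = 1.0, rate: int = 44_100, seed: int = 0) -> bytes:
    """One channel of 16-bit little-endian samples: a tone plus noise."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    count = int(seconds * rate)
    samples = []
    for i in range(count):
        tone = 10_000 * math.sin(2 * math.pi * 440 * i / rate)
        noise = rng.randint(-200, 200)  # jitters the low-order byte
        samples.append(int(tone) + noise)
    return struct.pack(f"<{count}h", *samples)


text = b"the quick brown fox jumps over the lazy dog " * 2000
audio = synthetic_pcm()

text_ratio = deflate_ratio(text)
audio_ratio = deflate_ratio(audio)
print(f"repetitive text: {text_ratio:.1%} of original size")
print(f"noisy 16-bit audio: {audio_ratio:.1%} of original size")
```

The text shrinks to a tiny fraction of its size because the same byte sequences recur exactly; the audio barely shrinks, because the noisy low-order bytes almost never repeat.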

Trimming the edges

The next solution is to use a lossy compression scheme. Such a scheme essentially throws away unimportant data in order to make the result more compressible. On decompressing the data, the algorithm doesn't produce exactly the same output as the original input, but we don't notice the difference.

This kind of algorithm is therefore only of any real use for things such as images, video, and audio. For images and photos, the archetypal lossy compression algorithm is the JPG file format. The vast majority of lower-end digital cameras produce JPG images as a matter of course.

The reasons are primarily to do with smaller file sizes: more photos can be stored on the camera's internal flash storage, and transferring photos to a computer takes less time. Of course, the vast majority of digital photos are only ever viewed on a computer screen (often as thumbnails rather than at full size) and are rarely edited heavily, so JPGs are more than sufficient.

High-end DSLRs and professional cameras use a RAW format, which, although it may be compressed, isn't lossy-compressed. We don't usually notice that JPG is a lossy compression format because the algorithm only discards information that the human eye would have difficulty perceiving when viewed alongside other parts of the photo.

With audio compression, we take advantage of the imperfect nature of the human ear to help us identify (and discard) unimportant parts of the music: there are frequencies we can't hear, there are frequencies we distinguish better than others, and when two sounds play at the same time, we hear the louder sound rather than the softer one.
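The last of those effects, a loud sound hiding a quiet one, can be illustrated with a toy example. To be clear, this is not how an MP3 encoder models hearing: it simply assumes that any frequency component more than 40dB below the loudest one can be dropped, a crude stand-in for a real psychoacoustic model. The function names and the 40dB figure are choices made for the sketch.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (fine for a short toy signal)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def drop_quiet_bins(spectrum, floor_db=-40.0):
    """Zero every bin whose magnitude is below floor_db relative to the peak."""
    peak = max(abs(c) for c in spectrum)
    threshold = peak * 10 ** (floor_db / 20)
    return [c if abs(c) >= threshold else 0j for c in spectrum]


N = 64
loud = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
quiet = [0.001 * math.sin(2 * math.pi * 5 * t / N) for t in range(N)]  # 60dB quieter
signal = [a + b for a, b in zip(loud, quiet)]

kept = drop_quiet_bins(dft(signal))
kept_bins = [k for k, c in enumerate(kept) if c != 0]
print(kept_bins)  # [4, 60]
```

Only the loud tone survives (bin 4 and its mirror image, bin 60); the quiet neighbour is discarded, leaving less information to store.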

Why use MP3?

The MP3 algorithm uses these details to remove those sounds we can't hear (or have difficulty in perceiving among the rest of the audio) to simplify the data stream to make it more compressible. The idea is to tweak things so that the removed data does not hurt the quality of the audio for the eventual listener.

Nevertheless, to make things plain, MP3 cannot produce CD quality audio since it eliminates information from the data stream; instead we call the result near-CD quality or even FM quality. However, the compression ratio we obtain is truly remarkable: three-minute MP3 tracks are typically between 3MB and 5MB in size – about an order of magnitude smaller than the original CD track.

The MP3 algorithm has a single tuning knob that enables us to determine how much information is thrown away. Some people will be fine with increasing the lossy part of the compression algorithm because they only listen to MP3s in a noisy environment – in a car or a busy office, for example. Within a noisy environment, you won't hear the most subtle sounds, so it makes sense to optimise for file size rather than audio quality.

If you're listening to music in a quieter environment, such as at home in your living room, you may be more aware of the loss of quality and not so bothered by file size. The lossy algorithm tuning knob is known as the bit rate.

Bit rates are measured in kilobits per second (kbps); the MP3 bit rates in common use range from 96kbps to 320kbps. At the low end of the scale, 96kbps or 128kbps is equivalent to FM radio. At the high end of the scale – say 256kbps to 320kbps – the sound quality is comparable to that of a CD.

The speed of sound

Remember that a CD delivers data at a rate of 176KB/s, or 1,400kbps. This means that a song saved at the 96kbps bit rate is roughly 1/14 of the size of a CD track. At the 256kbps bit rate, files are about a fifth of the size.

So, for example, if your car's CD player can play MP3 CDs (that is, data CDs containing MP3s), you'll be able to put five times as many 256kbps bit rate MP3 tracks on the CD as you could on a standard audio CD.
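These ratios are easy to check. The sketch below assumes a constant bit rate, takes 1kbps to mean 1,000 bits per second, and ignores the small overhead of frame headers and metadata; the function names are illustrative.

```python
CD_BITS_PER_SECOND = 44_100 * 16 * 2  # 1,411,200 bit/s for stereo CD audio


def mp3_bytes(seconds: int, kbps: int) -> int:
    """Approximate size of a constant-bit-rate MP3 of the given length."""
    return seconds * kbps * 1000 // 8


def cd_ratio(kbps: int) -> float:
    """How many times larger the CD data stream is than the MP3 stream."""
    return CD_BITS_PER_SECOND / (kbps * 1000)


for kbps in (96, 128, 256, 320):
    size_mb = mp3_bytes(180, kbps) / 1_000_000
    print(f"{kbps}kbps: {size_mb:.2f}MB, CD is {cd_ratio(kbps):.1f}x larger")
# 96kbps: 2.16MB, CD is 14.7x larger
# 128kbps: 2.88MB, CD is 11.0x larger
# 256kbps: 5.76MB, CD is 5.5x larger
# 320kbps: 7.20MB, CD is 4.4x larger
```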

The burning question then is: which bit rate do you go for when you want the best sound you can get for the smallest file size? The only subjective variable here is quality: what I might deem as acceptable quality, you might cringe at, or vice versa.

Various experiments have been conducted and it's been discovered that, in general, people can't tell the difference between an audio track encoded as a 256kbps MP3 and one from a CD. The only significant statistic is that if you know a particular track very well from CD, you're more likely to spot an MP3-encoded version of it than if you're listening to a track you've never heard before.

The MP3 file format was designed to contain more than just the lossy-compressed audio data. The file consists of a set of MP3 frames, each comprising a header and corresponding data.
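To make the idea of a frame header concrete, here is a minimal sketch that decodes the four-byte header at the start of a frame. It assumes the commonly documented bit layout and handles only MPEG-1 Layer III (the usual case for MP3 files ripped from CD); anything else is rejected. The function and dictionary key names are invented for the example.

```python
# Bit rates (kbps) for MPEG-1 Layer III, indexed by the header's 4-bit field.
# Index 0 means "free format" and 15 is invalid; neither is handled here.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44_100, 48_000, 32_000, None]  # MPEG-1 sample rates in Hz


def parse_frame_header(header: bytes) -> dict:
    """Decode the first four bytes of an MPEG-1 Layer III frame."""
    if len(header) < 4:
        raise ValueError("need at least four bytes")
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:                 # 11 sync bits, all set
        raise ValueError("no frame sync found")
    version = (word >> 19) & 0b11           # 0b11 means MPEG-1
    layer = (word >> 17) & 0b11             # 0b01 means Layer III
    if version != 0b11 or layer != 0b01:
        raise ValueError("only MPEG-1 Layer III is handled in this sketch")
    kbps = BITRATES_KBPS[(word >> 12) & 0xF]
    sample_rate = SAMPLE_RATES[(word >> 10) & 0b11]
    padding = (word >> 9) & 1
    if kbps is None or sample_rate is None:
        raise ValueError("unsupported bit rate or sample rate")
    # Each Layer III frame holds 1,152 samples, which gives 144 * bitrate / rate.
    frame_bytes = 144 * kbps * 1000 // sample_rate + padding
    return {"kbps": kbps, "sample_rate": sample_rate, "frame_bytes": frame_bytes}


info = parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x00]))
print(info)  # {'kbps': 128, 'sample_rate': 44100, 'frame_bytes': 417}
```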

A set of frames may be enclosed inside a tag to indicate that the frames are describing something special, such as metadata about the MP3 track (the artist's name, title, album, track number, musical genre, album art and so on).

Add your own data

Although the MP3 standard doesn't define a format for these metadata tags, two have become de facto standards through being recognised by most audio players. These are the ID3v1 and ID3v2 tags – although there's also a newer one called APEv2, which is gaining familiarity and approval.

On playback, the metadata tags are generally read by the audio player, so that relevant information can be displayed for the user. Although many CD rippers create metadata for your tracks and embed them in the MP3 files (and programs such as iTunes enable you to edit the metadata for your tracks), there are MP3 tag editors that allow you to manipulate the metadata at a finer level, or in block mode.
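ID3v1 is the simpler of the two formats: a fixed 128-byte block at the very end of the file, beginning with the characters TAG and followed by fixed-width fields. The sketch below reads one from a bytes object. It assumes Latin-1 text and ignores the comment field and the ID3v1.1 track-number extension; the function name and the sample data are invented for the example.

```python
def read_id3v1(data: bytes):
    """Return the ID3v1 fields found at the end of data, or None if absent."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(start: int, length: int) -> str:
        # Fields are padded with NUL bytes (or sometimes spaces).
        raw = tag[start:start + length].split(b"\x00", 1)[0]
        return raw.decode("latin-1").strip()

    return {
        "title": text(3, 30),
        "artist": text(33, 30),
        "album": text(63, 30),
        "year": text(93, 4),
        "genre_index": tag[127],  # index into the ID3v1 genre list
    }


def field(value: str, width: int) -> bytes:
    """Helper for building a fake tag: pad a string to a fixed width."""
    return value.encode("latin-1").ljust(width, b"\x00")


# A stand-in for a real file: dummy audio bytes followed by a hand-built tag.
fake_file = (b"\x00" * 1000 + b"TAG" + field("Example Song", 30)
             + field("Example Artist", 30) + field("Example Album", 30)
             + field("2010", 4) + field("", 30) + bytes([17]))

print(read_id3v1(fake_file)["title"])  # Example Song
```

With a real file you would pass in the result of open(path, "rb").read(), or just seek to the last 128 bytes.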

All in all, MP3s have changed the music environment for good. Although you can still buy CDs – or if you're really old-school, vinyl – most people consume their music through MP3 or AAC.

Online retailers such as Amazon and iTunes help you buy MP3s for immediate download and gratification. Online radio stations, including Pandora and Spotify, enable you to listen to lossy-compressed streamed tracks without the need for purchase.

Programs like iTunes and Windows Media Player enable you to rip your CDs as MP3s onto your hard disk for later listening. Audio players such as the iPod and Zune let you listen to your MP3-encoded music wherever you want to. In short, MP3s are here to stay.