How does JPEG work?
JPEG is a lossy compression format conceived explicitly for making photo files smaller and it exploits the imperfect characteristics of our perception. JPEG files are more correctly described as being in JFIF (JPEG File Interchange Format), which is a limited expression of the full JPEG standard.
JPEG stands for the Joint Photographic Experts Group, a committee set up in 1986 by the CCITT standards body with the remit to "establish a standard for the sequential progressive encoding of continuous tone greyscale and colour images". This it did by 1992, and the standard was ratified in 1994.
The basic workflow to create a JPEG file from a bitmap is as follows. First, the bitmap image is converted from RGB to another colour space known as YCbCr. This colour space, which is also used by TV signals, encodes colour in a different way from RGB (although it covers the same colours).
Two of the replacement components, Cb and Cr, are highly compressible. The Y component – the luma – is a value that indicates how bright the pixel is. The Cb and Cr components – the chroma – are the blue and red differences.
Positive discrimination
To make this conversion process easy, there's a set of equations to get the luma and chroma values for a single RGB pixel. Rather than compute these various multiplications at run-time, the results can be pre-computed and stored in tables for performance enhancements.
Get daily insight, inspiration and deals in your inbox
Sign up for breaking news, reviews, opinion, top tech deals, and more.
The human eye is able to discriminate the brightness of an image much more finely than its colour information (indeed, at low levels of light we actually see in black and white, since the illumination is too dim to stimulate the cone cells in the retina). This means that the luma value needs much higher fidelity than the two chroma components do. The JPEG format exploits the eye's imbalance by downsampling the chroma values.
Downsampling is simply the process of reducing the chroma values by some factor (and therefore is the first step in losing information). In the JPEG format, there are three accepted possibilities: no downsampling at all, dividing the chroma values horizontally by two, or dividing the chroma values both horizontally or vertically by two.
The next step is to split the downsampled pixels in the image into 8 x 8 blocks. Each colour component is split up separately, and each component sample goes through the same process in what follows. Note that on many occasions, the size of the image will not be a simple multiple of eight pixels in either direction. This can result in some pixel artefacts being created along the right and bottom sides of a JPEG picture.
The next step is fun, but puzzling. Each 8 x 8 block is converted into another matrix using a Discrete Cosine Transform (DCT). This transform, which is similar to a Fourier transform, analyses the frequencies of the original values along each row and column using a set of cosine waves oscillating at different frequencies and amplitudes. The reason for doing this is that the higher frequencies can be minimized or zeroed out since we do not perceive their loss as acutely as the more energetic lower frequencies.
The interesting thing about this transform is that the value with the biggest amplitude of the matrix is found at the top-left cell (known as the DC coefficient) and the values get smaller the further away from that point they get (all 63 other values are known as the AC coefficients). Generally, we'll need more bits to represent the values in this transformed matrix than can be held in a byte (which is what we've been using up to now).
The matrix reloaded
This converted matrix is then quantised. This is the main lossy part of the algorithm and the stage where we minimise the higher frequencies over the lower frequencies. One major result of this quantisation is that many higher DCT coefficients are zeroed out, making them extremely compressible in the next step.
The quantisation is accomplished by a set of 8 x 8 matrices, each one representing a different 'quality factor' for the JPEG image. Each cell is divided by the corresponding cell in the quantisation matrix and the result rounded (another lossy operation). Note that this does not involve matrix multiplication in the mathematical sense of the phrase.
Finally, the resulting quantised matrix is encoded using Huffman compression. To make the most use of the way the values in the matrix seem to radiate out from the top-left corner, the values are encoded not across each row for all rows but in a zig-zag pattern. This means that the zero cells tend to appear at the end of the zig-zag chain and therefore can be ruthlessly compressed (in fact, there's a special code that indicates that all remaining cell values are zero in the 8 x 8 block).
Huffman encoding is a lossless compression algorithm. Remember, for each 8 x 8 block of pixels in the original image (192 bytes of information), you will end up with three compressed 8 x 8 quantised matrices, with the Cr and Cb matrices being the most compressed.
After all that, how do you decompress a JPEG image to a raster bitmap to display on a screen? Well, pretty obviously, you should perform all of these steps in reverse order. First of all, you have to decode the Huffman compressed 8 x 8 block. This gives you the quantised matrix. Now you can multiply the quantised matrix by the relevant quantisation matrix to give the matrix of DCT coefficients. This is then transformed by the inverse DCT to give the original component matrix in the YCbCr colour space.
For each set of three component matrices, we can convert them to the RGB colour space using the inverse colour transformation equations. The image will now be decompressed and ready to display.
-------------------------------------------------------------------------------------------------------
First published in PC Plus, issue 278