Cloudimage - Documentation

Digital images and the JPEG compression method

JPEG stands for Joint Photographic Experts Group, the group that created the specification of a compression method designed specifically for digital images.

The .jpg/.jpeg files we know are stored in either the JFIF or the Exif file format. Both are container formats that hold image data compressed with the JPEG compression method.

In this article, we’ll be talking about the JPEG compression method for digital images.

Digital images

First, some background. Digital images can be defined as “electronic snapshots taken of a scene or scanned from documents, such as photographs, manuscripts, printed texts, and artwork”, as described by Cornell University, a leading research body in the field of digital image processing. Leaving aside the content, a digital image can be technically described as a matrix of dots (pixels), each with its own brightness.

Almost all image formats supported by web browsers use 8 bits per channel. For a single-channel image, this means that we have 2^8 = 256 possible values for each pixel. Each value is a number between 0 and 255, with 0 being black and 255 being white.

The image above is a so-called grayscale image. It has only one channel, and each pixel value is effectively the brightness of the respective pixel.

To add colours, the simplest way is to use three separate channels for the three base electronic display colours – red, green and blue. When these three channels are combined, we have a colour image.

To store such a colour image, we can treat each channel as a separate grayscale image. In this case, we have an image in the RGB colour space:

Red channel

Green channel

Blue channel
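
As a minimal sketch of the idea, the snippet below splits a tiny colour image into its three channels. The 2x2 image and the helper name split_channels are made up for illustration; each resulting channel is just a grayscale matrix of values between 0 and 255.

```python
# A hypothetical 2x2 colour image; each pixel is an (R, G, B) tuple of 8-bit values.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

def split_channels(img):
    """Return three grayscale matrices: the red, green and blue channels."""
    red = [[px[0] for px in row] for row in img]
    green = [[px[1] for px in row] for row in img]
    blue = [[px[2] for px in row] for row in img]
    return red, green, blue

red, green, blue = split_channels(image)
print(red)    # [[255, 0], [0, 255]]
print(green)  # [[0, 255], [0, 255]]
print(blue)   # [[0, 0], [255, 255]]
```

Storing the three matrices takes three times the space of a single grayscale image of the same size.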

Storing colour images in the RGB colour space can take a lot of space – we are basically storing three separate images. We can use some very clever tricks to decrease the file size.

Trick 1 - the YCbCr colour space

The JPEG compression method can store images in many different colour spaces. The most widely used one, however, is YCbCr.

In the human eye (I’m oversimplifying here), different photoreceptor cells are responsible for perceiving the intensity of the light and its colour. The useful consequence is that humans are more sensitive to brightness than to colour.

In the YCbCr colour space, the image is defined by three components – Y is luma, Cb is chroma-blue and Cr is chroma-red.

Luma channel (intensity)

Chroma-blue channel ("blueness")

Chroma-red channel ("redness")

The luma channel basically stores the image without the colour information, while the Cb and Cr channels store the colour information for the blue and red colours, respectively.
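
To make the three components more concrete, here is a rough sketch of converting one RGB pixel to YCbCr. It assumes the full-range conversion commonly used with JFIF files; the function name rgb_to_ycbcr is our own, and real encoders may round or clamp slightly differently.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr (full-range, JFIF-style)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each component to the 0..255 range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))

print(rgb_to_ycbcr(255, 255, 255))  # (255, 128, 128): white, no colour offset
print(rgb_to_ycbcr(0, 0, 0))        # (0, 128, 128): black, no colour offset
print(rgb_to_ycbcr(255, 0, 0))      # (76, 85, 255): pure red pushes Cr to its maximum
```

Note that grey pixels always end up with Cb = Cr = 128, so all of their information sits in the luma channel.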

This gives us the opportunity to compress the colour channels aggressively (losing some information there) without noticeably compromising our perception of the image.

For example, chroma subsampling is a frequently used mechanism for reducing image size in JPEG compression. Basically, we can reduce the resolution of the colour channels by 50% in one or both directions without much visible deterioration.
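
A simple way to picture subsampling in both directions is to replace every 2x2 block of a chroma channel with its average. The sketch below does exactly that; the helper name subsample_2x2 and the sample values are hypothetical, and actual encoders may use a different filter.

```python
def subsample_2x2(channel):
    """Halve a chroma channel in both directions by averaging 2x2 blocks.

    Assumes the channel has an even width and height.
    """
    height, width = len(channel), len(channel[0])
    return [
        [
            round((channel[y][x] + channel[y][x + 1]
                   + channel[y + 1][x] + channel[y + 1][x + 1]) / 4)
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]

# A hypothetical 4x4 chroma-blue channel.
cb = [
    [100, 102, 200, 202],
    [104, 106, 204, 206],
    [50, 50, 60, 60],
    [50, 50, 60, 60],
]
print(subsample_2x2(cb))  # [[103, 203], [50, 60]]
```

The 16 original values shrink to 4, so each subsampled colour channel needs only a quarter of the space, while the luma channel is kept at full resolution.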

Without chroma subsampling

With chroma subsampling

If you zoom in on the image, you can see where information is lost: the colours are slightly blurred, which is most visible on sharp edges. The intensity (brightness) information, however, remains intact, which helps to preserve the perceived sharpness.

Without chroma subsampling

With chroma subsampling

On the images above, at a high zoom level, image quality loss is noticeable. With a well-designed website, however, such zoom levels should not be possible. To zoom an image without losing quality, a technique called responsive images should be used instead. For each zoom level, you can have a full-resolution image with good quality. We will cover this vast area in a separate article on our blog.

Trick 2 – frequency perception

Mathematically, an image can also be described as a continuous signal. Let us consider two different 1-dimensional grayscale images for simplicity: images with a height of 1 pixel and a width of 8 pixels. If we plot the intensity (brightness) of these images on a chart, we can see this signal. The intensity of each pixel ranges from 0 for black to 255 for white.

A high-frequency image

A predominantly low-frequency image

If we increase the brightness of just two pixels in each image by 10, the change is considerably more noticeable in the low-frequency image than in the high-frequency one:

High-frequency image: no noticeable difference

Low-frequency image: difference easily noticeable

To see how we can take advantage of this, we need to go a bit deeper into the JPEG compression algorithm and the mathematics behind it.

Adding weighted signals

Let us leave aside our example for a moment and consider two simple signals:

If we want to create a more complex signal, we can just add the two signals to one another.

We multiply the first one by 0.5 and leave the second one as it is (multiply it by 1). We'll call these multipliers weights: the weight of the first function is 0.5 and the weight of the second one is 1.
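
The weighting and adding can be sketched in a few lines. The two cosine signals below are stand-ins chosen for illustration (they are not necessarily the ones plotted above), and weighted_sum is a hypothetical helper name.

```python
import math

N = 8  # number of samples per signal

# Two hypothetical simple signals: a slow and a faster cosine wave.
signal_a = [math.cos(math.pi * x / N) for x in range(N)]
signal_b = [math.cos(3 * math.pi * x / N) for x in range(N)]

def weighted_sum(signals, weights):
    """Multiply each signal by its weight and add the results sample by sample."""
    length = len(signals[0])
    return [
        sum(w * s[i] for s, w in zip(signals, weights))
        for i in range(length)
    ]

# Weight 0.5 for the first signal, weight 1 for the second one.
combined = weighted_sum([signal_a, signal_b], [0.5, 1.0])
print([round(v, 3) for v in combined])
```

Changing only the weights gives a differently shaped signal, even though the two underlying signals stay the same.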

Let's keep this mechanism in mind and continue with the JPEG algorithm.

Discrete Cosine Transform (DCT)

The “heart” of the JPEG compression algorithm is the so-called Discrete Cosine Transform. It’s a fairly complex mathematical operation which can thankfully be explained quite simply via an example.

We’ll use the same 8x1-pixel grayscale image for this example. Again, each pixel can be described by its intensity – 0 is black and 255 is white and we can draw a curve with the intensities:

This signal can be approximated by weighting and adding the following 8 functions:

These functions are called discrete cosine transform basis functions, and we'll use them to approximate all possible 8x1 images. We just need to find the weight for each function. Leaving out the calculations, the weights for our example correspond to the following coefficients:

Just a quick visualisation:

(animated gif of the addition of the functions)
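
For readers who want to see the calculation we left out, here is a minimal sketch of a 1-dimensional DCT for an 8x1 image. It assumes the orthonormal DCT-II variant; the pixel values and the name dct_1d are made up for this example. Real JPEG encoders work on 8x8 blocks in two dimensions and shift the pixel values by 128 first, which we skip here.

```python
import math

def dct_1d(pixels):
    """Return the DCT coefficients (weights of the basis functions) of a 1-D signal."""
    n = len(pixels)
    coeffs = []
    for k in range(n):
        # Scaling factors of the orthonormal DCT-II.
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(
            p * math.cos((2 * i + 1) * k * math.pi / (2 * n))
            for i, p in enumerate(pixels)
        )
        coeffs.append(scale * total)
    return coeffs

# A hypothetical 8x1 grayscale image.
pixels = [50, 55, 60, 70, 180, 190, 200, 210]
print([round(c, 1) for c in dct_1d(pixels)])

# A perfectly flat image only needs the first (constant) basis function:
# all other coefficients come out as practically zero.
print([round(c, 1) for c in dct_1d([100] * 8)])
```

The first coefficient describes the overall brightness, and the later ones describe progressively higher frequencies.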

The next step is the crucial one in JPEG compression.

Quantisation

The process of quantisation is the most important part of the JPEG compression method, and it is the step where we lose most information (remember, JPEG is a so-called lossy compression method).

Instead of storing the eight DCT coefficients from above as they are, we divide each of them by a certain number and round the result to the nearest integer. These divisors in essence define the quality of the compression. If you set the JPEG quality to 100%, the quantisation numbers are typically all 1s, which means that virtually no data is lost at this step. Let’s consider the following quantisation matrix (in the 1-d case, a quantisation vector):

And after division, we get:

These are the numbers we will store for our image, along with the quantisation matrix. When we need to decompress the image (to display it), we undo each step in reverse order to recover the (approximate) intensity values of each pixel of our image.
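
The whole round trip can be sketched as follows, under the same assumptions as before (1-dimensional orthonormal DCT, no 128 shift). The pixel values, the quantisation vector and all function names are chosen for illustration only; rounding after the division is where the information is actually lost.

```python
import math

def _scale(k, n):
    """Scaling factor of the orthonormal DCT for coefficient k."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def dct_1d(pixels):
    n = len(pixels)
    return [
        _scale(k, n) * sum(p * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                           for i, p in enumerate(pixels))
        for k in range(n)
    ]

def idct_1d(coeffs):
    """Inverse transform: add up the weighted basis functions again."""
    n = len(coeffs)
    return [
        sum(_scale(k, n) * c * math.cos((2 * i + 1) * k * math.pi / (2 * n))
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]

def quantise(coeffs, q_vector):
    """Divide each coefficient by its quantisation number and round."""
    return [round(c / q) for c, q in zip(coeffs, q_vector)]

def dequantise(stored, q_vector):
    """Multiply the stored integers back by the quantisation numbers."""
    return [s * q for s, q in zip(stored, q_vector)]

pixels = [52, 55, 61, 66, 70, 61, 64, 73]    # hypothetical 8x1 image
q_vector = [16, 11, 10, 16, 24, 40, 51, 61]  # hypothetical quantisation vector

# Compression: DCT, then quantisation. These integers are what gets stored.
stored = quantise(dct_1d(pixels), q_vector)

# Decompression: the same steps undone in reverse order.
restored = [
    min(255, max(0, round(v)))
    for v in idct_1d(dequantise(stored, q_vector))
]

print(stored)    # high-frequency entries tend to become small or zero
print(restored)  # close to, but usually not identical to, the original pixels
```

Larger quantisation numbers produce more zeros (which compress very well) at the cost of a less accurate reconstruction.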