Ever wonder how graphics software can reduce the file size of your image without a significant loss in quality? Welcome to the world of image compression! Expanding on a previous post in which I used principal component analysis (PCA) to generate so-called “eigenfaces”, I will be using the infamous Lenna image to demonstrate how the same technique can be used to compress images and reduce file size.

Before we dive into the demonstration, however, let’s briefly go over PCA in the context of color images. As most of you know, the smallest addressable element of a digital image on a display is called a “pixel.” PNG color images, like the ones we will be using, are typically composed of pixels in RGB space. Here is a visual representation of one RGB pixel for more clarity:

So each pixel has three dimensions associated with it - red, green, and blue.

Now let’s say we have a PNG file with dimensions 512 x 512. The representation of this square image in RGB color space is therefore 512 x 512 x 3, a 3-D array. This may be hard to imagine, so for the purposes of interpretation, let’s split this image into its individual color channels and focus only on the red channel, a 512 x 512 matrix.
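To make the 3-D array a little more concrete, here is a minimal sketch in base R. The array `img` is a made-up stand-in filled with random values, not the actual Lenna image; with a real file you would load the array from disk instead (for example with `png::readPNG()`, assuming you have the `png` package installed).

```r
# Hypothetical stand-in for a 512 x 512 RGB image (random values in [0, 1]).
# With a real file, something like png::readPNG("your_image.png") would
# give you an array of the same shape.
set.seed(1)
img <- array(runif(512 * 512 * 3), dim = c(512, 512, 3))

# Height x width x channels
print(dim(img))

# Slicing along the third dimension gives one plain matrix per channel.
red   <- img[, , 1]
green <- img[, , 2]
blue  <- img[, , 3]

print(dim(red))
```

Each of `red`, `green`, and `blue` is now an ordinary matrix, which is exactly the shape PCA expects.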

Now that we are working with a matrix, we can do some fun statistics. Grossly summarizing the algorithm, we can use PCA to find new directions (the principal components) in our red channel along which the pixel values vary the most; the more variance a direction captures, the more information it carries. We can then project our red pixel values onto just the first few of these directions, choosing a number of dimensions small enough to save space but large enough to avoid losing too much information.
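Here is a rough sketch of that projection step using `prcomp()` from base R. The helper name `compress_channel`, the choice of `k = 20`, and the synthetic `red` matrix are all my own assumptions for illustration; treat this as one possible way to do it rather than the definitive recipe.

```r
# Hypothetical helper: approximate a matrix using only its first k
# principal components.
compress_channel <- function(channel, k) {
  # Columns are treated as variables; prcomp() centers each one.
  pca <- prcomp(channel, center = TRUE, scale. = FALSE)
  scores   <- pca$x[, 1:k, drop = FALSE]         # rows projected onto k directions
  loadings <- pca$rotation[, 1:k, drop = FALSE]  # the k directions themselves
  approx <- scores %*% t(loadings)               # back to the original shape
  approx <- sweep(approx, 2, pca$center, "+")    # undo the centering
  pmin(pmax(approx, 0), 1)                       # keep values in [0, 1]
}

# Synthetic stand-in for a red channel: a smooth pattern plus a little noise.
set.seed(42)
x <- seq(0, 1, length.out = 512)
red <- 0.5 + outer(sin(2 * pi * x), cos(2 * pi * x)) / 4 +
  matrix(rnorm(512 * 512, sd = 0.01), nrow = 512)

k <- 20
red_k <- compress_channel(red, k)

# How far is the approximation from the original?
mse <- mean((red - red_k)^2)
print(mse)

# Numbers we would need to keep: k scores per row, k loadings per column,
# plus one mean per column, versus every pixel value.
stored_full <- 512 * 512
stored_k    <- k * (512 + 512) + 512
print(c(full = stored_full, compressed = stored_k))
```

The saving comes from keeping the scores, loadings, and column means instead of every pixel value. How that translates into actual PNG file size depends on the PNG encoder, so take the counts above as intuition only.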

We can then apply the same procedure to the green and blue channels and stitch the three individual 512 x 512 matrices back into a 3-D array with the original dimensions.
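The per-channel loop and the stitching step might look something like the sketch below. Again, `compress_channel` is a hypothetical helper of my own, `img` is random stand-in data rather than a real photo, and `k = 10` is an arbitrary choice.

```r
# Hypothetical helper: rebuild one channel from its first k principal components.
compress_channel <- function(channel, k) {
  pca <- prcomp(channel, center = TRUE, scale. = FALSE)
  approx <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
  approx <- sweep(approx, 2, pca$center, "+")  # add the column means back
  pmin(pmax(approx, 0), 1)                     # clamp to valid pixel range
}

# Stand-in image: random values in [0, 1], shaped like a 512 x 512 RGB image.
set.seed(1)
img <- array(runif(512 * 512 * 3), dim = c(512, 512, 3))

k <- 10
compressed <- array(0, dim = dim(img))
for (ch in 1:3) {
  # Run PCA on each color channel separately, then drop the result
  # back into the matching slice of the 3-D array.
  compressed[, , ch] <- compress_channel(img[, , ch], k)
}

print(dim(compressed))

# With the png package installed, the array could then be written out
# with something like png::writePNG(compressed, "compressed.png").
```

Note that each channel gets its own PCA here, so the three channels end up with three different sets of principal components.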

So what does this look like? How many principal components must we use to get a similar image quality to the original while still reducing the overall file size? Check out the gallery below:

Interesting! But did we reduce the file size at all?

Awesome! We can even see how the first few principal components capture most of the information (variability) and how the reduction in file size for subsequent components decreases - just as PCA says it should!
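If you want to check the "first few components capture most of the variability" claim numerically, one way is to look at the cumulative proportion of variance explained. The sketch below uses a synthetic matrix of my own invention in place of a real color channel, and the 95% threshold is an arbitrary assumption.

```r
# Synthetic stand-in for one color channel: two smooth patterns plus noise.
set.seed(42)
x <- seq(0, 1, length.out = 512)
red <- 0.5 +
  outer(sin(2 * pi * x), cos(2 * pi * x)) / 5 +
  outer(sin(6 * pi * x), cos(4 * pi * x)) / 10 +
  matrix(rnorm(512 * 512, sd = 0.02), nrow = 512)

pca <- prcomp(red, center = TRUE, scale. = FALSE)

# Each component's variance is its squared standard deviation.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of components that reaches the (arbitrary) 95% threshold.
k_needed <- which(cum_var >= 0.95)[1]
print(k_needed)

# Cumulative share of variance after 1, 2, 5, 20, and 100 components.
print(round(cum_var[c(1, 2, 5, 20, 100)], 3))
```

On a real photograph the curve will look different, but the general shape (a steep rise followed by a long, flat tail) is what makes dropping the later components so cheap.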

As always, I encourage you to try it for yourself. Here’s some R code so you can go wild: