Principal Component Analysis (PCA) is a popular dimensionality reduction technique, and the mathematics behind it is elegant. However, it was difficult to find worked examples that clearly demonstrated when PCA is helpful during data analysis. It is often used to visualize datasets by projecting features onto two- or three-dimensional space, but I also wanted to understand whether using PCA before fitting a linear model could lead to better results. I conducted experiments with toy datasets and a small real dataset to address this question. This post contains the results; it is intended to explain what PCA is and to explore when it is and is not useful for data analysis.
Section one explains the mathematics of PCA. PCA is the process of transforming a dataset, X, into a new dataset, Y. The section starts with the desirable properties of the transformed dataset, Y, and works through the mathematics that guarantees these properties. It is intended to be understood by a reader who has a basic understanding of linear algebra, but can be skipped by readers who wish only to see the application of PCA to datasets.
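To make the transformation concrete, here is a minimal sketch of PCA via eigendecomposition of the covariance matrix. This is not the post's MATLAB code; the data and variable names are illustrative, and the toy data matrix is an assumption made for the example.

```python
import numpy as np

# Illustrative sketch of PCA, not the post's MATLAB implementation.
# Rows of X are observations, columns are features (toy data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

# Center the data, then diagonalize the covariance matrix.
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]         # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the principal components: Y = Xc W.
Y = Xc @ eigvecs

# The columns of Y are uncorrelated, with variances sorted descending.
print(np.round(np.cov(Y, rowvar=False), 3))
```

The printed covariance matrix of Y is (numerically) diagonal, which is exactly the desirable property section one derives: the transformed features are uncorrelated and ordered by the variance they capture.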
Section two uses toy datasets to demonstrate what happens to the principal components and the accuracy of a simple principal component regression when the variance of the features changes.
Section three explores the trade-off between dimensionality reduction using PCA and the performance of a linear model. It compares the performance of linear regression, ridge regression and principal component regression in predicting the median household income of US counties. The objective of this section is to show how the accuracy of principal component regression changes as the number of principal components is reduced, and to assess how effective PCA is in preventing overfitting when compared to ridge regression.
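For readers unfamiliar with principal component regression, the following sketch shows the basic idea: fit ordinary least squares on the first k principal components instead of the raw features, then map the coefficients back to feature space. The data, the choice of k, and all variable names are assumptions made for illustration; the post's actual experiments use the US county dataset and MATLAB.

```python
import numpy as np

# Illustrative principal component regression (PCR) sketch.
# Synthetic data; n observations, p features, k retained components.
rng = np.random.default_rng(1)
n, p, k = 300, 10, 3
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# PCA via SVD of the centered design matrix; rows of Vt are components.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                      # scores on the first k components

# Ordinary least squares in the reduced space.
gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)

# Map the k reduced coefficients back to a p-dimensional vector.
beta_pcr = Vt[:k].T @ gamma

print(beta_pcr.shape)                  # a coefficient per original feature
```

Shrinking k discards the low-variance directions entirely, whereas ridge regression shrinks all coefficients smoothly; comparing the two is the point of section three.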
The full post (pdf), appendix (pdf), and a link to the MATLAB code are below. If you have any feedback please email: contact [at] learningmachinelearning [dot] org