An Introduction to Partial Least Squares

by Tutor Aspire February 17, 2016

One of the most common problems that you’ll encounter in machine learning is multicollinearity. This occurs when two or more predictor variables in a dataset are highly correlated.

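When this occurs, the estimated coefficients of a least squares model can change dramatically in response to small changes in the data. The model may fit the training dataset well but perform poorly on new data it has never seen, because it overfits the training set.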
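One symptom of multicollinearity, unstable least squares coefficients, can be seen in a small NumPy simulation. The data are made up for illustration: two predictors that are nearly identical copies of each other, and a response with a true slope of 1 on each. Refitting ordinary least squares on two halves of the data typically gives very different individual slopes, while their sum (the effect of moving both predictors together) stays close to 2. The helper `ols_slopes` is a hypothetical name used only here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Two predictors that are almost exact copies of each other.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
# The response depends on both, with true slopes 1 and 1.
y = x1 + x2 + rng.normal(scale=0.5, size=n)

def ols_slopes(x1, x2, y):
    """Least squares slopes for the model y ~ 1 + x1 + x2."""
    A = np.column_stack([np.ones(len(y)), x1, x2])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:]

print(f"corr(x1, x2) = {np.corrcoef(x1, x2)[0, 1]:.4f}")
# Refit on each half of the data: the individual slopes can change sharply,
# while their sum stays near the true value of 2.
for half in (slice(0, 25), slice(25, 50)):
    b = ols_slopes(x1[half], x2[half], y[half])
    print(f"slopes = {b.round(2)}, sum = {b.sum():.2f}")
```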

One way to get around the problem of multicollinearity is to use principal components regression, which calculates M linear combinations (known as “principal components”) of the original p predictor variables and then uses the method of least squares to fit a linear regression model using the principal components as predictors.
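A minimal NumPy sketch of PCR as described here, assuming the predictors are standardized first and the principal components are obtained from the singular value decomposition. The function name `pcr_fit_predict` and the simulated data are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, M):
    """Principal components regression: least squares on the first M
    principal component scores of the standardized predictors."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    Xs = (X_train - mu) / sd
    # Rows of Vt are the principal component directions, ordered by the
    # amount of predictor variance they capture (largest first).
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    V = Vt[:M].T                         # p x M matrix of loadings
    Z = Xs @ V                           # n x M matrix of component scores
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    Z_test = ((X_test - mu) / sd) @ V    # project test rows onto the same components
    return coef[0] + Z_test @ coef[1:]

# Illustrative usage on simulated data.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)
pred = pcr_fit_predict(X[:80], y[:80], X[80:], M=2)
```

With M = p the components span the same space as the original predictors, so PCR reduces to ordinary least squares; the dimension reduction only happens when M < p.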

The drawback of principal components regression (PCR) is that it does not consider the response variable when calculating the principal components.

Instead, it only considers the magnitude of the variance among the predictor variables captured by the principal components. Because of this, it’s possible that in some cases the principal components with the largest variances aren’t actually able to predict the response variable well.
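The following NumPy sketch (simulated data, assumed purely for illustration) constructs such a case: two highly correlated predictors whose shared direction carries almost all of the predictor variance, and a response that depends only on their difference. The first principal component captures most of the variance yet is nearly uncorrelated with the response, so PCR with M = 1 would predict poorly.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# Two strongly correlated predictors: almost all of their variance lies
# along the shared direction x1 + x2.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.3, size=n)
X = np.column_stack([x1, x2])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)    # standardized predictors

# A response driven only by the low-variance direction (the difference).
y = Xs[:, 0] - Xs[:, 1] + rng.normal(scale=0.01, size=n)

# Principal components of the standardized predictors via the SVD.
_, s, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = Xs @ Vt.T                 # column 0 = PC1 scores, column 1 = PC2 scores
var_share = s**2 / np.sum(s**2)    # share of predictor variance per component
corr_pc1 = np.corrcoef(scores[:, 0], y)[0, 1]
corr_pc2 = np.corrcoef(scores[:, 1], y)[0, 1]
print("variance share:", var_share.round(3))
print(f"corr(PC1, y) = {corr_pc1:.3f}, corr(PC2, y) = {corr_pc2:.3f}")
```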

A technique that is related to PCR is known as partial least squares. Similar to PCR, partial least squares calculates M linear combinations (known as “PLS components”) of the original p predictor variables and uses the method of least squares to fit a linear regression model using the PLS components as predictors.

But unlike PCR, partial least squares attempts to find linear combinations that explain the variation in both the response variable and the predictor variables.
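How PLS brings the response into the first component can be sketched in NumPy. The data are simulated for illustration: two highly correlated predictors and a response that depends only on their difference, a direction that carries little of the predictor variance. Each weight is taken as the slope from the simple linear regression of y on one standardized predictor (the recipe in step 2 below), so the weights get opposite signs and the first PLS component lines up with the response.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# Correlated predictors with a response driven by their difference.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.3, size=n)
X = np.column_stack([x1, x2])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
y = Xs[:, 0] - Xs[:, 1] + rng.normal(scale=0.01, size=n)
ys = (y - y.mean()) / y.std()

# Weight for each predictor: slope from the simple regression of y on X_j.
# With standardized data this slope equals corr(X_j, y).
phi = (Xs.T @ ys) / np.sum(Xs**2, axis=0)
z1 = Xs @ phi                      # first PLS component
corr_z1 = np.corrcoef(z1, ys)[0, 1]
print("phi:", phi.round(3))
print(f"corr(Z1, y) = {corr_z1:.3f}")
```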

Steps to Perform Partial Least Squares

In practice, the following steps are used to perform partial least squares.

1. Standardize the data such that all of the predictor variables and the response variable have a mean of 0 and a standard deviation of 1. This ensures that each variable is measured on the same scale.

2. Calculate $Z_1, \dots, Z_M$, the M linear combinations of the original p predictors:

$Z_m = \sum_{j=1}^{p} \phi_{jm} X_j$ for some constants $\phi_{1m}, \phi_{2m}, \dots, \phi_{pm}$, $m = 1, \dots, M$.
To calculate $Z_1$, set each $\phi_{j1}$ equal to the coefficient from the simple linear regression of Y onto $X_j$. Because the data are standardized, this coefficient equals the correlation between Y and $X_j$, so $Z_1$ places the most weight on the predictors that are most strongly related to the response.
To calculate $Z_2$, regress each predictor on $Z_1$ and take the residuals. Then calculate $Z_2$ using this orthogonalized data in exactly the same manner that $Z_1$ was calculated.
Repeat this process M times to obtain the M PLS components.

3. Use the method of least squares to fit a linear regression model using the PLS components Z₁, … , Z_M as predictors.

4. Lastly, use k-fold cross-validation to find the optimal number of PLS components to keep in the model. The “optimal” number of PLS components to keep is typically the number that produces the lowest test mean-squared error (MSE).
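Steps 1 to 3 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `pls_fit` and `pls_predict` and the simulated data are hypothetical, M is assumed to be at most p, and the sketch follows step 2 literally (each weight is the simple-regression slope of y on a column, and the columns are then replaced by their residuals from a regression on the new component). Widely used NIPALS-style implementations instead weight the deflated columns by their covariance with y, so their components after $Z_1$ can differ from this sketch.

```python
import numpy as np

def pls_fit(X, y, M):
    """Fit PLS with M components (M <= p) following steps 1-3."""
    # Step 1: standardize predictors and response (population standard deviation).
    x_mu, x_sd = X.mean(axis=0), X.std(axis=0)
    y_mu, y_sd = y.mean(), y.std()
    Xr = (X - x_mu) / x_sd
    ys = (y - y_mu) / y_sd

    Z = np.empty((len(ys), M))
    phis, loads = [], []
    for m in range(M):
        # Step 2: phi_jm is the slope from the simple regression of y on column j.
        phi = (Xr.T @ ys) / np.sum(Xr**2, axis=0)
        z = Xr @ phi
        # Regress each column on z and keep the residuals (orthogonalization).
        load = (Xr.T @ z) / (z @ z)
        Xr = Xr - np.outer(z, load)
        Z[:, m] = z
        phis.append(phi)
        loads.append(load)

    # Step 3: least squares of the standardized response on Z_1, ..., Z_M.
    # No intercept column is needed: ys and every Z_m have mean zero.
    theta, *_ = np.linalg.lstsq(Z, ys, rcond=None)
    return {"x_mu": x_mu, "x_sd": x_sd, "y_mu": y_mu, "y_sd": y_sd,
            "phis": phis, "loads": loads, "theta": theta}

def pls_predict(model, X_new):
    """Push new rows through the same standardization and residual steps,
    then map the fitted values back to the original scale of y."""
    Xr = (X_new - model["x_mu"]) / model["x_sd"]
    Z = np.empty((len(Xr), len(model["theta"])))
    for m, (phi, load) in enumerate(zip(model["phis"], model["loads"])):
        Z[:, m] = Xr @ phi
        Xr = Xr - np.outer(Z[:, m], load)
    return model["y_mu"] + model["y_sd"] * (Z @ model["theta"])

# Illustrative usage: simulated data with one nearly collinear column.
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 4))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=60)
y = X @ np.array([1.0, 0.5, -1.0, 1.0]) + 0.1 * rng.normal(size=60)
model = pls_fit(X[:50], y[:50], M=2)
pred = pls_predict(model, X[50:])   # predictions for the 10 held-out rows
```

Because the components are mutually orthogonal, keeping all M = p of them spans the same space as the standardized predictors and reproduces the ordinary least squares fit; step 4 is what picks a smaller M.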

Conclusion

In cases where multicollinearity is present in a dataset, partial least squares tends to perform better than ordinary least squares regression, because it replaces the correlated predictors with a small number of uncorrelated components. However, it’s a good idea to fit several different models so that we can identify the one that generalizes best to unseen data.

In practice, we fit many different types of models (PLS, PCR, Ridge, Lasso, Multiple Linear Regression, etc.) to a dataset and use k-fold cross-validation to identify the model that produces the lowest test MSE on new data.
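As a sketch of step 4 and of the model comparison described here, the NumPy code below uses k-fold cross-validation to compare ordinary least squares with PLS for every M from 1 to p on simulated, highly collinear data. Everything specific is an assumption for illustration: the helper names (`pls_fit_predict`, `ols_fit_predict`, `kfold_mse`), the simulated dataset, k = 5, and the fixed shuffling seed that gives every model the same folds. The PLS helper follows steps 1 to 3 of this article literally. Ridge, lasso, or PCR could be added as further entries in the same `scores` dictionary.

```python
import numpy as np

def pls_fit_predict(X_tr, y_tr, X_te, M):
    """PLS with M components (article steps 1-3), returning predictions for X_te."""
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    y_mu, y_sd = y_tr.mean(), y_tr.std()
    A, B = (X_tr - mu) / sd, (X_te - mu) / sd   # training and test predictors
    ys = (y_tr - y_mu) / y_sd
    Z_tr, Z_te = [], []
    for _ in range(M):
        phi = (A.T @ ys) / np.sum(A**2, axis=0)   # simple-regression slopes
        z_tr, z_te = A @ phi, B @ phi
        load = (A.T @ z_tr) / (z_tr @ z_tr)       # regress each column on z
        A, B = A - np.outer(z_tr, load), B - np.outer(z_te, load)
        Z_tr.append(z_tr)
        Z_te.append(z_te)
    theta, *_ = np.linalg.lstsq(np.column_stack(Z_tr), ys, rcond=None)
    return y_mu + y_sd * (np.column_stack(Z_te) @ theta)

def ols_fit_predict(X_tr, y_tr, X_te):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    b, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return b[0] + X_te @ b[1:]

def kfold_mse(fit_predict, X, y, k=5, seed=0):
    """Average held-out MSE over k folds (rows shuffled once with a fixed seed)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        pred = fit_predict(X[train], y[train], X[fold])
        errs.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errs))

# Ten predictors built from two latent factors plus a little noise: highly collinear.
rng = np.random.default_rng(4)
n, p = 40, 10
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
y = latent[:, 0] - latent[:, 1] + 0.3 * rng.normal(size=n)

scores = {"OLS": kfold_mse(ols_fit_predict, X, y)}
for M in range(1, p + 1):
    scores[f"PLS, M={M}"] = kfold_mse(
        lambda a, b, c, M=M: pls_fit_predict(a, b, c, M), X, y)
best = min(scores, key=scores.get)
for name, mse in scores.items():
    print(f"{name:>10}: CV MSE = {mse:.4f}")
print("lowest CV MSE:", best)
```

With M = p the PLS components span the same space as the original predictors, so that row matches OLS; the comparison of interest is at small M.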
