*70*

MulticollinearityÂ inÂ regression analysisÂ occurs when two or more explanatory variables are highly correlated to each other, such that they do not provide unique or independent information in the regression model. If the degree of correlation is high enough between variables, it can cause problems when fitting and interpreting the regression model.Â

Fortunately, itâ€™s possible to detect multicollinearity using a metric known as the**Â variance inflation factor (VIF)**, which measures the correlation and strength of correlation between the explanatory variables in a regression model.

This tutorial explains how to calculate VIF in Excel.

**Example: Calculating VIF in Excel**

For this example we will perform a multiple linear regression using the following dataset that describes the attributes of 10 basketball players. We will fit a regression model using rating as the response variable and points, assists, and rebounds as the explanatory variables. Then, weâ€™ll identify the VIF values for each explanatory variable.

**Step 1: Perform a multiple linear regression.**

Along the top ribbon, go to the Data tab and click on Data Analysis.Â If you donâ€™t see this option, then you need to firstÂ install the free Analysis ToolPak.

Once you click on Data Analysis, a new window will pop up. SelectÂ *RegressionÂ *and click OK.

Fill in the necessary arrays for the response variables and the explanatory variables, then click OK.

This produces the following output:

**Step 2: Calculate the VIF for each explanatory variable.**

Next, we can calculate the VIF for each of the three explanatory variables by performing individual regressions using one explanatory variable as the response variable and the other two as the explanatory variables.

For example, we can calculate the VIF for the variableÂ *pointsÂ *by performing a multiple linear regression usingÂ *pointsÂ *as the response variable andÂ *assistsÂ *andÂ *reboundsÂ *as the explanatory variables.

This produces the following output:

The VIF forÂ *pointsÂ *is calculated as 1 / (1 â€“ R Square) = 1 / (1 â€“ .433099) =Â **1.76**.

We can then repeat this process for the other two variablesÂ *assistsÂ *andÂ *rebounds*.

It turns out that the VIF for the three explanatory variables are as follows:

points: **1.76**

assists: **1.96**

rebounds: **1.18**

**How to Interpret VIF Values**

The value for VIF starts at 1 and has no upper limit. A general rule of thumb for interpreting VIFs is as follows:

- A value of 1 indicates there is no correlation between a given explanatory variable and any other explanatory variables in the model.
- A value between 1 and 5 indicates moderate correlation between a given explanatory variable and other explanatory variables in the model, but this is often not severe enough to require attention.
- A value greater than 5 indicates potentially severe correlation between a given explanatory variable and other explanatory variables in the model. In this case, the coefficient estimates and p-values in the regression output are likely unreliable.

Given that each of the VIF values for the explanatory variables in our regression model are close to 1, multicollinearity is not a problem in our example.