# How to Obtain Predicted Values and Residuals in Stata

Linear regression is a method we can use to understand the relationship between one or more explanatory variables and a response variable.

When we perform linear regression on a dataset, we end up with a regression equation which can be used to predict the values of a response variable, given the values for the explanatory variables.

We can then measure the difference between the actual values and the predicted values to obtain the residual for each prediction. The residuals give us an idea of how well the regression model predicts the response values.
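As a quick illustration, the residual can be written in symbols. The notation here is an assumption rather than part of the original tutorial: $y_i$ is the actual response for observation $i$, $\hat{y}_i$ is its predicted value, and $e_i$ is its residual.

$$e_i = y_i - \hat{y}_i$$

A positive residual means the model under-predicted that observation, and a negative residual means it over-predicted.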

This tutorial explains how to obtain both the predicted values and the residuals for a regression model in Stata.

### Example: How to Obtain Predicted Values and Residuals

For this example we will use the built-in Stata dataset called auto. We'll use mpg and displacement as the explanatory variables and price as the response variable.

Use the following steps to perform linear regression and subsequently obtain the predicted values and residuals for the regression model.

Step 1: Load and view the data.

First, we'll load the data using the following command:

sysuse auto

Next, we'll get a quick summary of the data using the following command:

summarize
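If only the three variables used in this example are of interest, the summary can be narrowed. The following is a minimal sketch; the `clear` option is an addition that is not in the original commands, included so that the load does not fail when another dataset is already in memory.

```stata
* Load the built-in auto dataset, replacing any data already in memory
sysuse auto, clear

* Show storage type and label for the three variables in the model
describe price mpg displacement

* Summarize only those three variables instead of the whole dataset
summarize price mpg displacement
```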

Step 2: Fit the regression model.

Next, we'll use the following command to fit the regression model:

regress price mpg displacement

The estimated regression equation is as follows:

estimated price = 6672.766 - 121.1833*(mpg) + 10.50885*(displacement)
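To see how the equation turns into a prediction, the stored coefficients can be used directly. This is a sketch only: the car with 20 mpg and a displacement of 200 is a hypothetical example chosen for illustration, not a row from the dataset.

```stata
* Refit the model so the coefficients are available in memory
sysuse auto, clear
regress price mpg displacement

* After regress, _b[name] returns the estimated coefficient for that term
display _b[_cons]
display _b[mpg]
display _b[displacement]

* Predicted price for a hypothetical car: 20 mpg, displacement of 200
display _b[_cons] + _b[mpg]*20 + _b[displacement]*200
```

Plugging the coefficients reported above into the same expression gives 6672.766 - 121.1833*(20) + 10.50885*(200), or roughly 6350.87.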

Step 3: Obtain the predicted values.

We can obtain the predicted values by using the predict command and storing them in a new variable with any name we'd like. In this case, we'll use the name pred_price:

predict pred_price

We can view the actual prices and the predicted prices side-by-side using the list command. There are 74 total predicted values, but we'll view just the first 10 by adding the in 1/10 qualifier:

list price pred_price in 1/10
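The same listing can be restricted in other ways. The sketch below is illustrative: the choice of the last five rows and of foreign cars is arbitrary, and `make` and `foreign` are other variables in the auto dataset that the tutorial itself does not use.

```stata
* Rebuild the predictions so this block runs on its own
sysuse auto, clear
regress price mpg displacement
predict pred_price

* Count the non-missing predictions (should match the 74 noted above)
count if !missing(pred_price)

* in restricts by observation number; -5/l means the last five rows
list make price pred_price in -5/l

* if restricts by a condition, here foreign cars only
list make price pred_price if foreign == 1
```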

Step 4: Obtain the residuals.

We can obtain the residual for each prediction by using the predict command again, this time with the residuals option, and storing the values in a new variable with any name we'd like. In this case, we'll use the name resid_price:

predict resid_price, residuals

We can view the actual price, the predicted price, and the residuals all side-by-side using the list command again:

list price pred_price resid_price in 1/10
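As a sanity check, the residuals can be recomputed by hand as the actual price minus the predicted price. This is a minimal sketch: the variable name `check_resid` is hypothetical, and `double` storage is requested here (unlike in the commands above) so that rounding does not obscure the comparison.

```stata
* Rebuild predictions and residuals in double precision
sysuse auto, clear
regress price mpg displacement
predict double pred_price
predict double resid_price, residuals

* A residual is the observed value minus the fitted value
generate double check_resid = price - pred_price

* Stops with an error if any hand-computed residual disagrees
assert abs(check_resid - resid_price) < 1e-6

* With an intercept in the model, the residuals average to about zero
summarize resid_price
```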

Step 5: Create a predicted values vs. residuals plot.

Lastly, we can create a scatterplot to visualize the relationship between the predicted values and the residuals:

scatter resid_price pred_price
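Stata also provides a postestimation command, rvfplot, that draws a residual-versus-fitted plot without creating the variables first. The sketch below shows both routes; the reference line at zero and the axis titles are additions made for readability, not part of the original command.

```stata
* Refit the model and rebuild the two variables
sysuse auto, clear
regress price mpg displacement
predict pred_price
predict resid_price, residuals

* Manual scatterplot with a horizontal reference line at zero
scatter resid_price pred_price, yline(0) ///
    xtitle("Predicted price") ytitle("Residual")

* Built-in residual-versus-fitted plot for the most recent regression
rvfplot, yline(0)
```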

We can see that the spread of the residuals tends to grow larger as the fitted values grow larger. This could be a sign of heteroscedasticity, which occurs when the spread of the residuals is not constant across the range of fitted values.

We could formally test for heteroscedasticity using the Breusch-Pagan test, and we could address the problem by using robust standard errors.
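One way to carry out both steps is sketched below. It assumes the same model as above and shows only the commands, not their results.

```stata
* Refit the model
sysuse auto, clear
regress price mpg displacement

* Breusch-Pagan / Cook-Weisberg test; the null hypothesis is constant variance
estat hettest

* Refit with heteroscedasticity-robust standard errors
regress price mpg displacement, vce(robust)
```

A small p-value from estat hettest is evidence against constant variance. The robust refit leaves the coefficients unchanged and alters only the standard errors, and therefore the t-statistics and confidence intervals.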