**Simple linear regression** is a method you can use to understand the relationship between an explanatory variable, x, and a response variable, y.

This tutorial explains how to perform simple linear regression in Stata.

**Example: Simple Linear Regression in Stata**

Suppose we are interested in understanding the relationship between the weight of a car and its miles per gallon. To explore this relationship, we can perform simple linear regression using weight as the explanatory variable and miles per gallon as the response variable.

Perform the following steps in Stata to conduct a simple linear regression using the dataset called *auto*, which contains data on 74 different cars.

**Step 1: Load the data.**

Load the data by typing the following into the Command box:

use http://www.stata-press.com/data/r13/auto

**Step 2: Get a summary of the data.**

Gain a quick understanding of the data you're working with by typing the following into the Command box:

summarize

We can see that there are 12 different variables in the dataset, but the only two that we care about are *mpg* and *weight*.
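If you want to check these numbers outside Stata, here is a rough Python analogue of what `summarize` reports for one variable: the number of non-missing observations, the mean, the standard deviation, the minimum, and the maximum. It is only a sketch; the helper name `summarize_column` is hypothetical, it assumes missing values are stored as `None`, and it uses the sample (n - 1) standard deviation.

```python
from statistics import fmean, stdev

def summarize_column(values):
    """Rough analogue of Stata's `summarize` output for a single variable.

    Assumes missing values are represented as None; they are skipped,
    so "Obs" counts only the non-missing values.
    """
    present = [v for v in values if v is not None]
    return {
        "Obs": len(present),
        "Mean": fmean(present),
        "Std. dev.": stdev(present),  # sample standard deviation (n - 1)
        "Min": min(present),
        "Max": max(present),
    }

# Made-up values, not taken from the auto dataset
print(summarize_column([22, None, 17, 26, 20]))
```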

**Step 3: Visualize the data.**

Before we perform simple linear regression, let's first create a scatterplot of weight vs. mpg so we can visualize the relationship between these two variables and check for any obvious outliers. Type the following into the Command box to create a scatterplot:

scatter mpg weight

This produces the following scatterplot:

We can see that cars with higher weights tend to have lower miles per gallon. To quantify this relationship, we will now perform a simple linear regression.

**Step 4: Perform simple linear regression.**

Type the following into the Command box to perform a simple linear regression using weight as an explanatory variable and mpg as a response variable.

regress mpg weight

Here is how to interpret the most interesting numbers in the output:

**R-squared:** 0.6515. This is the proportion of the variance in the response variable that can be explained by the explanatory variable. In this example, 65.15% of the variation in mpg can be explained by weight.

**Coef (weight):** -0.0060087. This tells us the average change in the response variable associated with a one-unit increase in the explanatory variable. In this example, each one-pound increase in weight is associated with a decrease of about 0.006 in mpg, on average.

**Coef (_cons):** 39.44028. This tells us the average value of the response variable when the explanatory variable is zero. In this example, the predicted mpg is 39.44028 for a car that weighs zero pounds. This value has no practical interpretation, since a car can't weigh zero pounds, but the intercept is still needed to form the regression equation.
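Both coefficients come from the least-squares formulas: the slope is the sum of the cross-deviations of x and y divided by the sum of the squared deviations of x, and the intercept is the mean of y minus the slope times the mean of x. Below is a minimal Python sketch of those formulas on made-up numbers (not the auto data); `fit_simple_ols` is a hypothetical helper name, and the sketch is not meant to mirror how Stata computes `regress` internally. For this toy data it gives an intercept of 0.5 and a slope of 0.8.

```python
from statistics import fmean

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx                  # average change in y per one-unit increase in x
    intercept = y_bar - slope * x_bar  # predicted y when x is zero
    return intercept, slope

intercept, slope = fit_simple_ols([1, 2, 3, 4], [1, 3, 2, 4])
print(intercept, slope)
```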

**P>|t| (weight):** 0.000. This is the p-value associated with the t statistic for weight. Stata displays it rounded to three decimal places, so the actual value is below 0.0005. Since this value is less than 0.05, we can conclude that there is a statistically significant relationship between weight and mpg.

**Regression Equation:** Lastly, we can form a regression equation using the two coefficient values. In this case, the equation would be:

predicted mpg = 39.44028 - 0.0060087*(weight)

We can use this equation to find the predicted mpg for a car, given its weight. For example, a car that weighs 4,000 pounds is predicted to have an mpg of about 15.405:

predicted mpg = 39.44028 - 0.0060087*(4000) = 15.405
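The same arithmetic can be scripted. Here is a minimal Python sketch that hard-codes the two coefficients from the Stata output above; the function name `predicted_mpg` is hypothetical. Within Stata itself, running `predict` after `regress` produces fitted values for every observation in the dataset.

```python
def predicted_mpg(weight_lbs):
    """Predicted mpg from the fitted equation: 39.44028 - 0.0060087 * weight."""
    intercept = 39.44028  # Coef (_cons)
    slope = -0.0060087    # Coef (weight), change in mpg per pound
    return intercept + slope * weight_lbs

print(round(predicted_mpg(4000), 3))  # 15.405
```

Predictions are most meaningful for weights within the range observed in the data; as noted for the intercept, extrapolating to a weight of zero is not.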

**Step 5: Report the results.**

Lastly, we want to report the results of our simple linear regression. Here is an example of how to do so:

A linear regression was performed to quantify the relationship between the weight of a car and its miles per gallon. A sample of 74 cars was used in the analysis.

Results showed that there was a statistically significant relationship between weight and mpg (t = -11.60, p < 0.001).

The regression equation was found to be:

predicted mpg = 39.44 - 0.006(weight)

Each additional pound was associated with a decrease, on average, of 0.006 miles per gallon.