
One of the main assumptions of linear regression is that the residuals are normally distributed.

One way to visually check this assumption is to create a histogram of the residuals and observe whether or not the distribution follows a “bell-shape” reminiscent of the normal distribution.

This tutorial provides a step-by-step example of how to create a histogram of residuals for a regression model in R.

**Step 1: Create the Data**

First, let’s create some fake data to work with:

    #make this example reproducible
    set.seed(0)

    #create data
    x1

    #view first six rows of data
    head(data)

            x1        x2          y
    1 3.262954 6.3455776 -1.1371530
    2 1.673767 1.6696701 -0.6886338
    3 3.329799 2.1520303  5.8081615
    4 3.272429 4.1397409  3.7815228
    5 2.414641 0.6088427  4.3269030
    6 0.460050 5.7301563  6.6721111
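
The assignments that build `x1`, `x2`, `y`, and `data` are not shown in the code above. Here is a minimal sketch of one way to create a data frame with the same shape. The sample size of 100 and the means and standard deviations for `x2` and `y` are assumptions made for illustration, so those two columns will not reproduce the printed values. The settings for `x1` (mean 2, standard deviation 1) appear consistent with the `x1` values printed above, but they are also an assumption.

```r
# Make this example reproducible
set.seed(0)

# Assumed sample size (not stated in the tutorial)
n <- 100

# Assumed parameters; these appear consistent with the printed x1 column
x1 <- rnorm(n, mean = 2, sd = 1)

# Hypothetical parameters; x2 and y will differ from the printed output
x2 <- rnorm(n, mean = 4, sd = 3)
y  <- rnorm(n, mean = 2, sd = 3)

# Combine the three variables into one data frame
data <- data.frame(x1, x2, y)

# View first six rows of data
head(data)
```

Any data frame with numeric columns named `x1`, `x2`, and `y` will work for the remaining steps.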

**Step 2: Fit the Regression Model**

Next, we’ll fit a multiple linear regression model to the data:

    #fit multiple linear regression model
    model
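
The right-hand side of the `model` assignment is not shown above. A likely form, assuming `y` is regressed on both `x1` and `x2` with base R's `lm()`, is sketched below. The stand-in data uses hypothetical parameters so that the block runs on its own.

```r
# Stand-in data with hypothetical parameters (see Step 1)
set.seed(0)
data <- data.frame(x1 = rnorm(100, 2, 1),
                   x2 = rnorm(100, 4, 3),
                   y  = rnorm(100, 2, 3))

# Fit multiple linear regression model: y explained by x1 and x2
model <- lm(y ~ x1 + x2, data = data)

# View the coefficient table and fit statistics
summary(model)

# View the first six residuals (same values as model$residuals)
head(resid(model))
```

The residuals are stored in the fitted object, which is why the plotting code in the next step can refer to `model$residuals`.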

**Step 3: Create a Histogram of Residuals**

Lastly, we’ll use the **ggplot2** visualization package to create a histogram of the residuals from the model:

    #load ggplot2
    library(ggplot2)

    #create histogram of residuals
    ggplot(data = data, aes(x = model$residuals)) +
      geom_histogram(fill = 'steelblue', color = 'black') +
      labs(title = 'Histogram of Residuals', x = 'Residuals', y = 'Frequency')

Note that we can also specify the number of bins to place the residuals in by using the **bins** argument.

The fewer the bins, the wider the bars will be in the histogram. For example, we could specify **20 bins**:

    #create histogram of residuals
    ggplot(data = data, aes(x = model$residuals)) +
      geom_histogram(bins = 20, fill = 'steelblue', color = 'black') +
      labs(title = 'Histogram of Residuals', x = 'Residuals', y = 'Frequency')

Or we could specify **10 bins**:

    #create histogram of residuals
    ggplot(data = data, aes(x = model$residuals)) +
      geom_histogram(bins = 10, fill = 'steelblue', color = 'black') +
      labs(title = 'Histogram of Residuals', x = 'Residuals', y = 'Frequency')
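
To see the relationship between the number of bins and the bar width in numbers, here is a small sketch that uses base R's `hist()` instead of ggplot2. The helper `equal_breaks()` is a hypothetical name introduced for this illustration, and the data uses the same stand-in parameters assumed earlier. Explicit break points are supplied because a single number passed to `breaks` is only treated as a suggestion by `hist()`.

```r
# Stand-in data and model with hypothetical parameters
set.seed(0)
data <- data.frame(x1 = rnorm(100, 2, 1),
                   x2 = rnorm(100, 4, 3),
                   y  = rnorm(100, 2, 3))
model <- lm(y ~ x1 + x2, data = data)
res <- resid(model)

# Hypothetical helper: break points for exactly `bins` equal-width bins
equal_breaks <- function(x, bins) {
  seq(min(x), max(x), length.out = bins + 1)
}

# Compute the histograms without drawing them
h20 <- hist(res, breaks = equal_breaks(res, 20), plot = FALSE)
h10 <- hist(res, breaks = equal_breaks(res, 10), plot = FALSE)

# Width of one bar in each histogram
diff(h20$breaks)[1]
diff(h10$breaks)[1]

# Draw the 10-bin version
hist(res, breaks = equal_breaks(res, 10),
     col = 'steelblue', border = 'black',
     main = 'Histogram of Residuals',
     xlab = 'Residuals', ylab = 'Frequency')
```

Halving the number of bins over the same range doubles the width of each bar, which is why the 10-bin histogram looks coarser than the 20-bin one.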

No matter how many bins we specify, we can see that the residuals are roughly normally distributed.

We could also perform a formal statistical test such as the Shapiro-Wilk, Kolmogorov-Smirnov, or Jarque-Bera test to check for normality.
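
As a rough sketch, two of these tests can be run on the residuals with functions that ship with base R: `shapiro.test()` and `ks.test()`. The data below uses the same hypothetical stand-in parameters as before. The Jarque-Bera test is left out because base R does not include it. Note also that the Kolmogorov-Smirnov p-value is only approximate here, since the mean and standard deviation are estimated from the same residuals being tested.

```r
# Stand-in data and model with hypothetical parameters
set.seed(0)
data <- data.frame(x1 = rnorm(100, 2, 1),
                   x2 = rnorm(100, 4, 3),
                   y  = rnorm(100, 2, 3))
model <- lm(y ~ x1 + x2, data = data)
res <- resid(model)

# Shapiro-Wilk test (null hypothesis: residuals are normal)
sw <- shapiro.test(res)

# Kolmogorov-Smirnov test against a normal distribution whose
# mean and sd are estimated from the residuals themselves
ks <- ks.test(res, "pnorm", mean = mean(res), sd = sd(res))

# View the p-values
sw$p.value
ks$p.value
```

In both tests, a small p-value (for example, below 0.05) is evidence against normality.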

However, keep in mind that these tests become very sensitive at large sample sizes: with enough observations they can detect even trivial departures from normality, so they often conclude that the residuals are not normal when the sample size is large.
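
The effect of sample size can be illustrated with a small simulation. This is only a sketch: the choice of a t distribution with 5 degrees of freedom (mildly heavier tails than the normal), the seed, and the two sample sizes are all assumptions made for illustration.

```r
# Make this example reproducible
set.seed(1)

# Draws from the same mildly heavy-tailed distribution (hypothetical choice)
small <- rt(30, df = 5)
large <- rt(5000, df = 5)   # 5000 is the largest n shapiro.test() accepts

# Shapiro-Wilk p-value for each sample
p_small <- shapiro.test(small)$p.value
p_large <- shapiro.test(large)$p.value

# Compare the two p-values
c(n30 = p_small, n5000 = p_large)
```

Both samples come from the same non-normal distribution, but the larger one gives the test far more power, so it will typically produce a much smaller p-value.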

For this reason, it’s often easier to assess normality by creating a histogram of the residuals.