*89*

The **Shapiro-Wilk test **is a test of normality. It is used to determine whether or not a sample comes from a normal distribution.

This type of test is useful for determining whether or not a given dataset comes from a normal distribution, which is a common assumption used in many statistical tests including regression, ANOVA, t-tests, and many others.

We can easily perform a Shapiro-Wilk test on a given dataset using the following built-in function in R:

**shapiro.test(x)**

where:

**x:**A numeric vector of data values.

This function produces a test statistic *W *along with a corresponding p-value. If the p-value is less than α =.05, there is sufficient evidence to say that the sample does not come from a population that is normally distributed.

**Note:** The sample size must be between 3 and 5,000 in order to use the shapiro.test() function.

This tutorial shows several examples of how to use this function in practice.

**Example 1: Shapiro-Wilk Test on Normal Data**

The following code shows how to perform a Shapiro-Wilk test on a dataset with sample size n=100:

#make this example reproducible set.seed(0) #create dataset of 100 random values generated from a normal distribution data #perform Shapiro-Wilk test for normality shapiro.test(data) Shapiro-Wilk normality test data: data W = 0.98957, p-value = 0.6303

The p-value of the test turns out to be **0.6303**. Since this value is not less than .05, we can assume the sample data comes from a population that is normally distributed.

This result shouldn’t be surprising since we generated the sample data using the rnorm() function, which generates random values from a normal distribution with mean = 0 and standard deviation = 1.

**Related: **A Guide to dnorm, pnorm, qnorm, and rnorm in R

We can also produce a histogram to visually verify that the sample data is normally distributed:

hist(data, col='steelblue')

We can see that the distribution is fairly bell-shaped with one peak in the center of the distribution, which is typical of data that is normally distributed.

**Example 2: Shapiro-Wilk Test on Non-Normal Data**

The following code shows how to perform a Shapiro-Wilk test on a dataset with sample size n=100 in which the values are randomly generated from a Poisson distribution:

#make this example reproducible set.seed(0) #create dataset of 100 random values generated from a Poisson distribution data #perform Shapiro-Wilk test for normality shapiro.test(data) Shapiro-Wilk normality test data: data W = 0.94397, p-value = 0.0003393

The p-value of the test turns out to be **0.0003393**. Since this value is less than .05, we have sufficient evidence to say that the sample data does *not *come from a population that is normally distributed.

This result shouldn’t be surprising since we generated the sample data using the rpois() function, which generates random values from a Poisson distribution.

**Related: **A Guide to dpois, ppois, qpois, and rpois in R

We can also produce a histogram to visually see that the sample data is not normally distributed:

hist(data, col='coral2')

We can see that the distribution is right-skewed and doesn’t have the typical “bell-shape” associated with a normal distribution. Thus, our histogram matches the results of the Shapiro-Wilk test and confirms that our sample data does not come from a normal distribution.

**What to Do with Non-Normal Data**

If a given dataset is *not* normally distributed, we can often perform one of the following transformations to make it more normal:

**1. Log Transformation: **Transform the response variable from y to **log(y)**.

**2. Square Root Transformation: **Transform the response variable from y to **√y**.

**3. Cube Root Transformation: **Transform the response variable from y to **y ^{1/3}**.

By performing these transformations, the response variable typically becomes closer to normally distributed.

Check out this tutorial to see how to perform these transformations in practice.

**Additional Resources**

How to Conduct an Anderson-Darling Test in R

How to Conduct a Kolmogorov-Smirnov Test in R

How to Perform a Shapiro-Wilk Test in Python