
In statistics, **correlation** refers to the strength and direction of the relationship between two variables. A correlation coefficient ranges from -1 to 1, with -1 indicating a perfect negative relationship, 0 indicating no relationship of the kind being measured (linear for Pearson, monotonic for the rank-based measures), and 1 indicating a perfect positive relationship.

There are three common ways to measure correlation:

**Pearson Correlation:** Used to measure the correlation between two continuous variables (e.g. height and weight).

**Spearman Correlation:** Used to measure the correlation between two ranked variables (e.g. rank of a student's math exam score vs. rank of their science exam score in a class).

**Kendall's Correlation:** Used when you wish to use Spearman Correlation but the sample size is small and there are many tied ranks.

This tutorial explains how to find all three types of correlations in Stata.

**Loading the Data**

For each of the following examples we will use a dataset called *auto*. You can load this dataset by typing the following into the Command box:

use http://www.stata-press.com/data/r13/auto

We can get a quick look at the dataset by typing the following into the Command box:

summarize

We can see that there are 12 total variables in the dataset.
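
As a quick cross-check on the variable count, the dataset's dimensions can also be read directly. This is a minimal sketch, assuming the dataset loads from the URL used above; `describe, short` and the system values `c(k)` and `c(N)` are standard Stata but are not part of the original walkthrough:

```stata
* Load the example data, replacing anything currently in memory
use http://www.stata-press.com/data/r13/auto, clear

* Print only the header: number of observations and number of variables
describe, short

* The same counts are available as system values
display "variables: " c(k) "   observations: " c(N)
```

Note that `summarize` reports 0 observations for *make* because it is a string variable, not because its values are missing.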

**How to Find Pearson Correlation in Stata**

We can find the Pearson Correlation Coefficient between the variables *weight* and *length* by using the **pwcorr** command:

pwcorr weight length

The Pearson Correlation Coefficient between these two variables is **0.9460**. To determine if this correlation coefficient is significant, we can display the p-value by adding the **sig** option:

pwcorr weight length, sig

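The p-value is reported as **0.000**, which means it rounds to zero at the displayed precision rather than being exactly zero. Since this is less than 0.05, the correlation between these two variables is statistically significant.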
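
To see the unrounded p-value, one option is to rebuild it from the results that `pwcorr` stores. The sketch below assumes the usual t-test for a correlation coefficient, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom; the scalar names (`r_wl`, `n_wl`, `t_wl`, `p_wl`) are illustrative only:

```stata
use http://www.stata-press.com/data/r13/auto, clear

* Run the correlation quietly; pwcorr leaves r(rho) and r(N) behind
quietly pwcorr weight length

* Copy the stored results before another command overwrites them
scalar r_wl = r(rho)
scalar n_wl = r(N)

* t statistic for H0: correlation = 0, and its two-sided p-value
scalar t_wl = r_wl * sqrt((n_wl - 2) / (1 - r_wl^2))
scalar p_wl = 2 * ttail(n_wl - 2, abs(t_wl))

* Show the p-value in scientific notation so it is not rounded to zero
display "r = " %6.4f r_wl "   n = " n_wl "   p = " %9.2e p_wl
```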

To find the Pearson Correlation Coefficient for multiple variables, simply type in a list of variables after the **pwcorr** command:

pwcorr weight length displacement, sig

Here is how to interpret the output:

- Pearson Correlation between weight and length = 0.9460 | p-value = 0.000
- Pearson Correlation between weight and displacement = 0.8949 | p-value = 0.000
- Pearson Correlation between displacement and length = 0.8351 | p-value = 0.000
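
When a variable list includes missing values, it helps to know which rows each coefficient is based on. The sketch below contrasts `pwcorr`, which uses every row available for each pair, with `correlate`, which keeps only rows where all listed variables are observed. Adding *rep78* to the list is an assumption made here for illustration, because it is the variable with missing values in this dataset:

```stata
use http://www.stata-press.com/data/r13/auto, clear

* Pairwise deletion: each correlation uses every row where that pair is observed.
* obs prints the per-pair sample size; star(0.05) flags correlations with p < 0.05.
pwcorr weight length displacement rep78, sig obs star(0.05)

* Casewise deletion: only rows with all four variables observed are used
correlate weight length displacement rep78
```

Comparing the two tables shows whether the choice of deletion rule changes any of the coefficients.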

**How to Find Spearman Correlation in Stata**

We can find the Spearman Correlation Coefficient between the variables *trunk* and *rep78* by using the **spearman** command:

spearman trunk rep78

Here is how to interpret the output:

- **Number of obs:** This is the number of pairwise observations used to calculate the Spearman Correlation Coefficient. Because there were some missing values for the variable *rep78*, Stata used only 69 (rather than the full 74) pairwise observations.
- **Spearman's rho:** This is the Spearman correlation coefficient. In this case, it's -0.2235, indicating there is a negative correlation between the two variables. As one increases, the other tends to decrease.
- **Prob > |t|:** This is the p-value associated with the hypothesis test. In this case, the p-value is 0.0649, which indicates there is not a statistically significant correlation between the two variables at α = 0.05.
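
Spearman's rho can also be understood as a Pearson correlation computed on ranks, with tied values sharing their average rank. The following sketch reproduces that idea by hand; the variable names `rank_trunk` and `rank_rep78` are illustrative, and the two coefficients should agree:

```stata
use http://www.stata-press.com/data/r13/auto, clear

* Rank each variable using only rows where both are observed;
* egen's rank() gives tied values their average rank by default
egen rank_trunk = rank(trunk) if !missing(trunk, rep78)
egen rank_rep78 = rank(rep78) if !missing(trunk, rep78)

* Pearson correlation of the ranks
correlate rank_trunk rank_rep78

* Spearman's rho from the built-in command, for comparison
spearman trunk rep78
```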

We can find the Spearman Correlation Coefficient for multiple variables by simply typing more variables after the **spearman** command. We can display the correlation coefficient and the corresponding p-value for each pairwise correlation by adding the **stats(rho p)** option:

spearman trunk rep78 gear_ratio, stats(rho p)

Here is how to interpret the output:

- Spearman Correlation between trunk and rep78 = -0.2235 | p-value = 0.0649
- Spearman Correlation between trunk and gear_ratio = -0.5187 | p-value = 0.0000
- Spearman Correlation between gear_ratio and rep78 = 0.4275 | p-value = 0.0002
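
One detail worth checking when more than two variables are listed is how missing values are handled. According to Stata's documentation for `spearman`, rows with a missing value in any listed variable are dropped by default, while the `pw` option uses every row available for each pair. A short sketch for comparing the two, with `obs` added to `stats()` so the sample size behind each coefficient is visible:

```stata
use http://www.stata-press.com/data/r13/auto, clear

* Default: rows with a missing value in any listed variable are dropped
spearman trunk rep78 gear_ratio, stats(rho obs p)

* pw: each pair uses every row where both of its variables are observed
spearman trunk rep78 gear_ratio, stats(rho obs p) pw

* The coefficient and p-value matrices remain available as stored results
matrix list r(Rho)
matrix list r(P)
```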

**How to Find Kendall's Correlation in Stata**

We can find Kendall's Correlation Coefficient between the variables *trunk* and *rep78* by using the **ktau** command:

ktau trunk rep78

Here is how to interpret the output:

- **Number of obs:** This is the number of pairwise observations used to calculate Kendall's Correlation Coefficient. Because there were some missing values for the variable *rep78*, Stata used only 69 (rather than the full 74) pairwise observations.
- **Kendall's tau-b:** This is Kendall's correlation coefficient between the two variables. We typically use this value instead of tau-a because tau-b makes adjustments for ties. In this case, tau-b = -0.1752, indicating a negative correlation between the two variables.
- **Prob > |z|:** This is the p-value associated with the hypothesis test. In this case, the p-value is 0.0662, which indicates there is not a statistically significant correlation between the two variables at α = 0.05.
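
To compare tau-a and tau-b side by side, or to reuse the values in later calculations, the results that `ktau` stores can be read back after the command runs. A minimal sketch, assuming the stored-result names documented for `ktau` (`r(tau_a)`, `r(tau_b)`, `r(p)`, `r(N)`):

```stata
use http://www.stata-press.com/data/r13/auto, clear

* Request both coefficients plus the p-value
ktau trunk rep78, stats(taua taub p)

* Read the stored results back
display "tau-a = " %7.4f r(tau_a)
display "tau-b = " %7.4f r(tau_b)
display "p     = " %7.4f r(p)
display "n     = " r(N)
```

Because *trunk* and *rep78* both contain many tied values, tau-b is expected to be larger in magnitude than tau-a here.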

We can find Kendall's Correlation Coefficient for multiple variables by simply typing more variables after the **ktau** command. We can display the correlation coefficient and the corresponding p-value for each pairwise correlation by adding the **stats(taub p)** option:

ktau trunk rep78 gear_ratio, stats(taub p)

Here is how to interpret the output:

- Kendall's Correlation between trunk and rep78 = -0.1752 | p-value = 0.0662
- Kendall's Correlation between trunk and gear_ratio = -0.3753 | p-value = 0.0000
- Kendall's Correlation between gear_ratio and rep78 = 0.3206 | p-value = 0.0006
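
Testing several pairs at once raises the chance of at least one false positive. As an optional extra that is not part of the walkthrough above, `ktau` accepts a `bonferroni` option that adjusts the reported significance levels for the number of comparisons, and `star()` marks the coefficients that remain significant. A hedged sketch:

```stata
use http://www.stata-press.com/data/r13/auto, clear

* Unadjusted p-values, as in the example above
ktau trunk rep78 gear_ratio, stats(taub p)

* Bonferroni-adjusted p-values; stars mark adjusted p < 0.05
ktau trunk rep78 gear_ratio, stats(taub p) bonferroni star(0.05)
```

The same `bonferroni` and `star()` options are also available for `pwcorr` and `spearman`.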