
In statistics, we're often interested in understanding the relationship between two variables. For example, we might want to understand the relationship between the number of hours a student studies and the exam score they receive.

One way to quantify this relationship is to use the Pearson correlation coefficient, which is a measure of the linear association between two variables. It has a value between -1 and 1 where:

- -1 indicates a perfectly negative linear correlation between two variables
- 0 indicates no linear correlation between two variables
- 1 indicates a perfectly positive linear correlation between two variables

The further the correlation coefficient is from zero, the stronger the linear relationship between the two variables.
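For reference, the coefficient for a sample of n paired observations is usually written as shown below. The symbols are notation chosen here for illustration (they do not appear in Stata's output): x_i and y_i are the paired values, and x̄ and ȳ are their sample means.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

The numerator is positive when the two variables tend to fall on the same side of their means and negative when they tend to fall on opposite sides. The denominator rescales the result so that it always lies between -1 and 1.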

But in some cases we want to understand the correlation between more than one pair of variables. In these cases, we can create a **correlation matrix**, which is a square table that shows the correlation coefficient for each pairwise combination of several variables.

In this tutorial we explain how to create a correlation matrix in Stata.

**How to Create a Correlation Matrix in Stata**

The command **corr** can be used to produce a correlation matrix for a particular dataset in Stata.

To illustrate this, let's load the 1980 census data into Stata by typing the following into the command box:

use http://www.stata-press.com/data/r13/census13

We can then get a quick summary of the dataset by typing the following into the command box:

summarize

This produces the following table:

We see that the dataset contains nine different variables. To create a correlation matrix for every pairwise combination of variables in the dataset, we can type the following into the command box:

corr

This produces the following correlation matrix:

The numbers shown in the table represent the Pearson correlation coefficients for each pairwise combination of variables. For example, the correlation between *pop* and *state* is **-0.0540**. This indicates that these two variables have a very weak negative linear correlation.

Notice that the correlations along the diagonal of the table are each 1.0000, since each variable is perfectly correlated with itself.

You can also create a correlation matrix for only a certain subset of variables in a dataset by specifying the variables after the **corr** command. For example, here is how to create a correlation matrix for just the variables *pop*, *medage*, and *region*:
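If you want to reuse the coefficients rather than read them off the screen, one option is to pull them from the results that **corr** stores in r(). The following is a minimal sketch under a few assumptions: it reloads the census data with the clear option so that it runs on its own, and the matrix name C is an arbitrary choice made for this example.

```stata
* Reload the example data so the block runs on its own
use http://www.stata-press.com/data/r13/census13, clear

* Compute the correlation matrix for every variable in the dataset
corr

* Copy the stored correlation matrix into a Stata matrix named C
matrix C = r(C)

* Display the whole matrix
matrix list C

* Look up a single coefficient by variable name
display C[rownumb(C, "pop"), colnumb(C, "state")]
```

Looking up the row and column by name avoids having to count positions in the table by hand.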

corr pop medage region

This produces the following correlation matrix for just these three variables:

It's also possible to put a star next to the correlation coefficients that are statistically significant at a certain significance level by using the **pwcorr** command (which reports the same coefficients as **corr** when the data contain no missing values) along with the **star()** option.

For example, the following code produces a correlation matrix for every variable in the census dataset and places a star next to the correlation coefficients that are statistically significant at α = 0.05:

pwcorr, star(.05)

Notice how several of the correlation coefficients in the table are statistically significant at α = 0.05. We could set α to be any number we'd like, but common choices are .01, .05, and .10.

In general, the lower we set the value of α, the fewer correlation coefficients will be statistically significant. For example, suppose we set α = 0.01:
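Stars only show which side of α a coefficient falls on. To see the p-values themselves, **pwcorr** also accepts the sig and obs options. The sketch below uses the three-variable subset from earlier purely to keep the table small. The second command adds the bonferroni option, which may be worth considering when a table tests many pairs at once.

```stata
* Reload the example data so the block runs on its own
use http://www.stata-press.com/data/r13/census13, clear

* sig prints the p-value under each coefficient;
* obs prints the number of observations used for each pair
pwcorr pop medage region, sig obs star(.05)

* All variables, with stars based on Bonferroni-adjusted significance levels
pwcorr, star(.05) bonferroni
```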

pwcorr, star(.01)

Notice how fewer correlation coefficients have a star next to them.
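One caveat when comparing **corr** and **pwcorr**: the two commands handle missing values differently. **corr** drops any observation that has a missing value in any of the listed variables, while **pwcorr** uses every observation available for each individual pair. The sketch below illustrates this on a modified copy of the census data. Setting *medage* to missing in the first five rows is done purely for illustration, and the scalar names are hypothetical.

```stata
* Reload the example data so the block runs on its own
use http://www.stata-press.com/data/r13/census13, clear

* For illustration only: set medage to missing in the first five rows
replace medage = . in 1/5

* corr keeps only rows that are complete on pop, region, and medage
corr pop region medage
scalar n_listwise = r(N)
scalar rho_listwise = r(rho)

* pwcorr uses every row where both variables in a given pair are present
pwcorr pop region medage
scalar n_pairwise = r(N)
scalar rho_pairwise = r(rho)

* Compare the row counts and coefficients for the pop-region pair
display n_listwise " vs " n_pairwise
display rho_listwise " vs " rho_pairwise
```

Here r(N) and r(rho) refer to the first two variables listed, which is why *pop* and *region* come first. Adding the listwise option to **pwcorr** makes it use the same rows as **corr**.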