
In statistics, we often use the Pearson correlation coefficient to measure the linear relationship between two variables. However, sometimes we're interested in understanding the relationship between two variables **while controlling for a third variable**.

For example, suppose we want to measure the association between the number of hours a student studies and the final exam score they receive, while controlling for the student's current grade in the class. In this case, we could use a **partial correlation** to measure the relationship between hours studied and final exam score.
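For reference, when there is a single control variable, the partial correlation can be written in terms of the three ordinary Pearson correlations. The symbols here are introduced only for illustration: x stands for hours studied, y for final exam score, and z for current grade.

```latex
r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
```

If z is uncorrelated with both x and y, the formula reduces to the ordinary correlation between x and y.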

This tutorial explains how to calculate partial correlation in Python.

**Example: Partial Correlation in Python**

Suppose we have the following Pandas DataFrame that displays the current grade, total hours studied, and final exam score for 10 students:

    import numpy as np
    import pandas as pd

    data = {'currentGrade': [82, 88, 75, 74, 93, 97, 83, 90, 90, 80],
            'hours': [4, 3, 6, 5, 4, 5, 8, 7, 4, 6],
            'examScore': [88, 85, 76, 70, 92, 94, 89, 85, 90, 93],
            }

    df = pd.DataFrame(data, columns=['currentGrade', 'hours', 'examScore'])
    df

       currentGrade  hours  examScore
    0            82      4         88
    1            88      3         85
    2            75      6         76
    3            74      5         70
    4            93      4         92
    5            97      5         94
    6            83      8         89
    7            90      7         85
    8            90      4         90
    9            80      6         93

To calculate the partial correlation between **hours** and **examScore** while controlling for **currentGrade**, we can use the **partial_corr()** function from the pingouin package, which uses the following syntax:

**partial_corr(data, x, y, covar)**

where:

- **data:** name of the DataFrame
- **x, y:** names of columns in the DataFrame
- **covar:** name of the covariate column in the DataFrame (i.e. the variable you're controlling for)

Here is how to use this function in this particular example:

    #install the pingouin package (run this in a terminal, not in Python)
    pip install pingouin

    #import pingouin
    import pingouin as pg

    #find partial correlation between hours and exam score while controlling for grade
    pg.partial_corr(data=df, x='hours', y='examScore', covar='currentGrade')

              n      r         CI95%     r2  adj_r2  p-val   BF10  power
    pearson  10  0.191  [-0.5, 0.73]  0.036  -0.238  0.598  0.438  0.082

We can see that the partial correlation between hours studied and final exam score is **0.191**, which is a small positive correlation. As hours studied increases, exam score tends to increase as well, assuming current grade is held constant. Note, however, that the p-value of 0.598 means this correlation is not statistically significant at the 0.05 level.
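One way to see what "holding current grade constant" means is to reproduce the number by hand. The following is a minimal sketch using only NumPy, not pingouin's own implementation: it regresses each variable on current grade and then correlates the leftover residuals. The helper name `residuals` and the array names are made up for this illustration.

```python
import numpy as np

# Same data as in the DataFrame from this tutorial
current_grade = np.array([82, 88, 75, 74, 93, 97, 83, 90, 90, 80], dtype=float)
hours = np.array([4, 3, 6, 5, 4, 5, 8, 7, 4, 6], dtype=float)
exam_score = np.array([88, 85, 76, 70, 92, 94, 89, 85, 90, 93], dtype=float)


def residuals(y, z):
    """Hypothetical helper: residuals of a least-squares line fit of y on z."""
    slope, intercept = np.polyfit(z, y, deg=1)
    return y - (slope * z + intercept)


# Remove the part of each variable that is linearly explained by current grade
hours_resid = residuals(hours, current_grade)
exam_resid = residuals(exam_score, current_grade)

# The Pearson correlation of the residuals is the partial correlation
partial_r = np.corrcoef(hours_resid, exam_resid)[0, 1]
print(round(partial_r, 3))
```

The printed value should agree with the `r` column reported by `partial_corr()` above, up to rounding.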

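To calculate the partial correlation between multiple variables at once, we can use the **.pcorr()** method, which pingouin adds to pandas DataFrames once the package is imported. Each pairwise partial correlation controls for all of the other columns in the DataFrame: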

    #calculate all pairwise partial correlations, rounded to three decimal places
    df.pcorr().round(3)

                  currentGrade  hours  examScore
    currentGrade         1.000 -0.311      0.736
    hours               -0.311  1.000      0.191
    examScore            0.736  0.191      1.000

The way to interpret the output is as follows:

- The partial correlation between current grade and hours studied is **-0.311**.
- The partial correlation between current grade and exam score is **0.736**.
- The partial correlation between hours studied and exam score is **0.191**.
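As a cross-check on the matrix above, the same values can be recovered from the inverse of the ordinary correlation matrix. This is a rough sketch with NumPy and pandas under the assumption that the correlation matrix is invertible, and the variable names (`precision`, `partial_df`) are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Same data as in the DataFrame from this tutorial
df = pd.DataFrame({
    'currentGrade': [82, 88, 75, 74, 93, 97, 83, 90, 90, 80],
    'hours': [4, 3, 6, 5, 4, 5, 8, 7, 4, 6],
    'examScore': [88, 85, 76, 70, 92, 94, 89, 85, 90, 93],
})

# Invert the ordinary Pearson correlation matrix
corr = df.corr().to_numpy()
precision = np.linalg.inv(corr)

# Partial correlation of i and j given all other columns:
# -P[i, j] / sqrt(P[i, i] * P[j, j])
diag = np.diag(precision)
partial = -precision / np.sqrt(np.outer(diag, diag))
np.fill_diagonal(partial, 1.0)  # each variable correlates perfectly with itself

partial_df = pd.DataFrame(partial, index=df.columns, columns=df.columns)
print(partial_df.round(3))
```

The off-diagonal entries should match the `df.pcorr()` output shown earlier, since each entry controls for every remaining column.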