
In statistics, a **z-score** tells us how many standard deviations away a value is from the mean. We use the following formula to calculate a z-score:

**z** = (X – μ) / σ

where:

- X is a single raw data value
- μ is the population mean
- σ is the population standard deviation
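As a quick illustration of the formula, here is a minimal sketch in plain Python. The function name `z_score` and the numbers used (a value of 85, a mean of 70, and a standard deviation of 10) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical values, used only for illustration
z = z_score(x=85, mu=70, sigma=10)
print(z)  # 1.5
```

A result of 1.5 means the value lies 1.5 standard deviations above the mean. A negative result would mean the value lies below the mean.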

This tutorial explains how to calculate z-scores for raw data values in Python.

**How to Calculate Z-Scores in Python**

We can calculate z-scores in Python using **scipy.stats.zscore**, which uses the following syntax:

**scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate')**

where:

- **a**: an array-like object containing data.
- **axis**: the axis along which to calculate the z-scores. Default is 0.
- **ddof**: degrees of freedom correction in the calculation of the standard deviation. Default is 0.
- **nan_policy**: how to handle input that contains nan. Default is 'propagate', which returns nan. 'raise' throws an error and 'omit' performs calculations ignoring nan values.
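To make the **ddof** and **nan_policy** parameters more concrete, here is a small sketch. The arrays `values` and `with_nan` are made-up data used only for illustration, and the behavior described for 'omit' is what recent SciPy versions document, so it may differ on older installations.

```python
import numpy as np
from scipy import stats

# Made-up data, used only for illustration
values = np.array([2.0, 4.0, 6.0, 8.0])

# ddof=0 (default) divides by n: the population standard deviation
z_population = stats.zscore(values)

# ddof=1 divides by n - 1: the sample standard deviation,
# which is larger, so the z-scores are slightly smaller in magnitude
z_sample = stats.zscore(values, ddof=1)

with_nan = np.array([2.0, 4.0, np.nan, 8.0])

# 'propagate' (default): a single nan makes every z-score nan
z_propagate = stats.zscore(with_nan)

# 'omit': nan is ignored when computing the mean and standard deviation
z_omit = stats.zscore(with_nan, nan_policy='omit')

print(z_population)
print(z_sample)
print(z_propagate)
print(z_omit)
```

Passing nan_policy='raise' on `with_nan` would raise an error instead of returning a result.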

The following examples illustrate how to use this function to calculate z-scores for one-dimensional numpy arrays, multi-dimensional numpy arrays, and Pandas DataFrames.

**Numpy One-Dimensional Arrays**

**Step 1: Import modules.**

    import pandas as pd
    import numpy as np
    import scipy.stats as stats

**Step 2: Create an array of values.**

    data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

**Step 3: Calculate the z-scores for each value in the array.**

    stats.zscore(data)

    [-1.394, -1.195, -1.195, -0.199, 0, 0, 0.398, 0.598, 1.195, 1.793]

Each z-score tells us how many standard deviations away an individual value is from the mean. For example:

- The first value of “6” in the array is **1.394** standard deviations *below* the mean.
- The fifth value of “13” in the array is **0** standard deviations away from the mean, i.e. it is equal to the mean.
- The last value of “22” in the array is **1.793** standard deviations *above* the mean.
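As a sanity check, we can reproduce these z-scores by applying the formula directly with NumPy. This is only a sketch for verification; the variable name `manual` is our own. It relies on the fact that `np.std` uses ddof=0 by default, which matches the default of `stats.zscore`.

```python
import numpy as np
from scipy import stats

data = np.array([6, 7, 7, 12, 13, 13, 15, 16, 19, 22])

mean = data.mean()   # 13.0
std = data.std()     # population standard deviation (ddof=0), about 5.02

# Apply z = (X - mean) / std to every value at once
manual = (data - mean) / std

print(np.round(manual, 3))
print(np.allclose(manual, stats.zscore(data)))  # True
```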

**Numpy Multi-Dimensional Arrays**

If we have a multi-dimensional array, we can use the **axis** parameter to specify that we want to calculate each z-score relative to its own array. For example, suppose we have the following multi-dimensional array:

    data = np.array([[5, 6, 7, 7, 8],
                     [8, 8, 8, 9, 9],
                     [2, 2, 4, 4, 5]])

We can use the following syntax to calculate the z-scores for each array:

    stats.zscore(data, axis=1)

    [[-1.569 -0.588  0.392  0.392  1.373]
     [-0.816 -0.816 -0.816  1.225  1.225]
     [-1.167 -1.167  0.5    0.5    1.333]]

The z-scores for each individual value are shown relative to the array they’re in. For example:

- The first value of “5” in the first array is **1.569** standard deviations *below* the mean of its array.
- The first value of “8” in the second array is **0.816** standard deviations *below* the mean of its array.
- The first value of “2” in the third array is **1.167** standard deviations *below* the mean of its array.
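To see what the **axis** parameter changes, the sketch below computes z-scores for the same array along each axis and checks the row-wise result against the formula. The variable names (`by_row`, `by_col`, `overall`, `manual_by_row`) are our own and used only for illustration.

```python
import numpy as np
from scipy import stats

data = np.array([[5, 6, 7, 7, 8],
                 [8, 8, 8, 9, 9],
                 [2, 2, 4, 4, 5]])

by_row = stats.zscore(data, axis=1)      # each value relative to its row
by_col = stats.zscore(data, axis=0)      # each value relative to its column (default)
overall = stats.zscore(data, axis=None)  # each value relative to the whole array

# Manual row-wise calculation; keepdims=True keeps the shape (3, 1)
# so the row means and standard deviations broadcast across each row
row_means = data.mean(axis=1, keepdims=True)
row_stds = data.std(axis=1, keepdims=True)
manual_by_row = (data - row_means) / row_stds

print(np.allclose(by_row, manual_by_row))  # True
```

The same value can get quite different z-scores depending on the axis, so it is worth deciding up front whether rows or columns are the groups being compared.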

**Pandas DataFrames**

Suppose we instead have a Pandas DataFrame:

    # The values are random, so your DataFrame will differ from the one shown
    data = pd.DataFrame(np.random.randint(0, 10, size=(5, 3)), columns=['A', 'B', 'C'])
    data

       A  B  C
    0  8  0  9
    1  4  0  7
    2  9  6  8
    3  1  8  1
    4  8  0  8

We can use the **apply** function to calculate the z-score of individual values by column:

    data.apply(stats.zscore)

              A         B         C
    0  0.659380 -0.802955  0.836080
    1 -0.659380 -0.802955  0.139347
    2  0.989071  0.917663  0.487713
    3 -1.648451  1.491202 -1.950852
    4  0.659380 -0.802955  0.487713

The z-scores for each individual value are shown relative to the column they’re in. For example:

- The first value of “8” in the first column is **0.659** standard deviations *above* the mean value of its column.
- The first value of “0” in the second column is **0.803** standard deviations *below* the mean value of its column.
- The first value of “9” in the third column is **0.836** standard deviations *above* the mean value of its column.
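The same column-wise z-scores can also be computed with pandas arithmetic alone. The sketch below hard-codes the values from the DataFrame shown above so the result is reproducible; the names `df`, `z_apply`, and `z_manual` are our own. One detail to watch: `DataFrame.std` uses ddof=1 by default, so we pass ddof=0 to match the default of `stats.zscore`.

```python
import pandas as pd
from scipy import stats

# Same values as the DataFrame above, hard-coded for reproducibility
df = pd.DataFrame({'A': [8, 4, 9, 1, 8],
                   'B': [0, 0, 6, 8, 0],
                   'C': [9, 7, 8, 1, 8]})

# Column-wise z-scores via scipy
z_apply = df.apply(stats.zscore)

# Column-wise z-scores via pandas arithmetic;
# ddof=0 gives the population standard deviation
z_manual = (df - df.mean()) / df.std(ddof=0)

print(z_manual.round(3))
```

Leaving out ddof=0 in the pandas version would divide by the sample standard deviation instead, which gives slightly smaller z-scores than `stats.zscore` with its defaults.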

**Additional Resources:**

How to Calculate Z-Scores in Excel

How to Calculate Z-Scores in SPSS

How to Calculate Z-Scores on a TI-84 Calculator