
The *n*th **percentile** of a dataset is the value that cuts off the lowest *n* percent of the data values when all of the values are sorted from least to greatest.

For example, the 90th percentile of a dataset is the value that separates the bottom 90% of the data values from the top 10% of the data values.
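As a quick illustration of this definition, the sketch below uses a made-up dataset (the integers 1 through 100, chosen only for this example) and checks what share of the values fall at or below the 90th percentile. The variable names are illustrative.

```python
import numpy as np

# Illustrative dataset: the integers 1 through 100
values = np.arange(1, 101)

# Value that marks the 90th percentile
p90 = np.percentile(values, 90)

# Share of data values that are at or below that cutoff
share_at_or_below = np.mean(values <= p90)

print(p90)                # a value between 90 and 91
print(share_at_or_below)  # 90 of the 100 values are at or below the cutoff
```

On small or heavily tied datasets the share will not always be exactly 90%, because the percentile is interpolated between neighboring data values.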

We can quickly calculate percentiles in Python by using the numpy.percentile() function, which uses the following syntax:

**numpy.percentile(a, q)**

where:

**a:** Array of values

**q:** Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive.

This tutorial explains how to use this function to calculate percentiles in Python.
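Before the main examples, here is a minimal sketch of how the **q** argument behaves at the edges of its range. The small array is made up for illustration, and the variable names are not part of NumPy.

```python
import numpy as np

# Small made-up array for illustration
data = np.array([3, 8, 15, 21, 42])

# q = 0 and q = 100 are the inclusive endpoints: the minimum and the maximum
lowest = np.percentile(data, 0)
highest = np.percentile(data, 100)

# A sequence for q returns one percentile per entry, in the same order
both = np.percentile(data, [0, 100])

# A value of q outside the 0 to 100 range raises a ValueError
try:
    np.percentile(data, 101)
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True

print(lowest, highest, both, out_of_range_rejected)
```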

**How to Find Percentiles of an Array**

The following code illustrates how to find various percentiles for a given array in Python:

    import numpy as np

    #make this example reproducible
    np.random.seed(0)

    #create array of 100 random integers from 0 to 499 (the upper bound, 500, is excluded)
    data = np.random.randint(0, 500, 100)

    #find the 37th percentile of the array
    np.percentile(data, 37)

    173.26

    #find the quartiles (25th, 50th, and 75th percentiles) of the array
    np.percentile(data, [25, 50, 75])

    array([116.5, 243.5, 371.5])

**How to Find Percentiles of a DataFrame Column**

The following code shows how to find the 95th percentile value for a single pandas DataFrame column:

    import numpy as np
    import pandas as pd

    #create DataFrame
    df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
                       'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
                       'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})

    #find 95th percentile of var1 column
    np.percentile(df.var1, 95)

    34.1
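To see where a value like 34.1 comes from, the sketch below reproduces the calculation by hand, assuming NumPy's default behavior of linear interpolation between the two nearest sorted values. The helper variables (`position`, `lower`, `fraction`, `manual`) are names chosen for this illustration.

```python
import numpy as np

var1 = [25, 12, 15, 14, 19, 23, 25, 29, 33, 35]
q = 95

# Sort the values from least to greatest
sorted_vals = np.sort(var1)

# Fractional 0-based position of the percentile in the sorted data
position = (q / 100) * (len(sorted_vals) - 1)   # about 8.55

# Neighboring indices and the distance past the lower one
lower = int(np.floor(position))
upper = min(lower + 1, len(sorted_vals) - 1)
fraction = position - lower

# Interpolate linearly between the two neighboring values (33 and 35 here)
manual = sorted_vals[lower] + fraction * (sorted_vals[upper] - sorted_vals[lower])

print(manual)
print(np.percentile(var1, q))
```

Both printed values should agree up to floating-point rounding, since the position falls about 55% of the way from 33 to 35.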

**How to Find Percentiles of Several DataFrame Columns**

The following code shows how to find the 95th percentile value for several columns in a pandas DataFrame:

    import numpy as np
    import pandas as pd

    #create DataFrame
    df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
                       'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
                       'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})

    #find 95th percentile of each column
    df.quantile(.95)

    var1    34.10
    var2    14.55
    var3    14.65

    #find 95th percentile of just columns var1 and var2
    df[['var1', 'var2']].quantile(.95)

    var1    34.10
    var2    14.55

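Note that this last example uses the pandas quantile() function rather than numpy.percentile() to calculate the percentiles. The quantile() function expects a fraction between 0 and 1, so the 95th percentile is written as .95 rather than 95.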
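As a rough check that the two approaches agree, the following sketch computes the same 95th percentile with both pandas and NumPy on the DataFrame used above. The variable names are illustrative, and the comparison assumes both libraries are using their default linear interpolation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': [25, 12, 15, 14, 19, 23, 25, 29, 33, 35],
                   'var2': [5, 7, 7, 9, 12, 9, 9, 4, 14, 15],
                   'var3': [11, 8, 10, 6, 6, 5, 9, 12, 13, 16]})

# Single column: pandas takes a fraction, NumPy takes a percent
via_pandas = df['var1'].quantile(0.95)
via_numpy = np.percentile(df['var1'], 95)

# All columns at once: axis=0 computes one percentile per column
cols_pandas = df.quantile(0.95)
cols_numpy = np.percentile(df.to_numpy(), 95, axis=0)

print(via_pandas, via_numpy)
print(cols_pandas.to_numpy(), cols_numpy)
```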