
The **Rand index** is a way to compare the similarity of results between two different clustering methods.

Often denoted *R*, the Rand index is calculated as:

$$R = \frac{a + b}{\binom{n}{2}}$$

where:

- **a:** The number of unordered pairs of elements that both clustering methods place in the same cluster.
- **b:** The number of unordered pairs of elements that both clustering methods place in different clusters.
- **$\binom{n}{2}$:** The number of unordered pairs in a set of *n* elements, equal to $n(n-1)/2$.
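The formula can be checked directly in code. The following is a minimal Python sketch, assuming each clustering method is represented as a list of cluster labels given in the same element order; `rand_index_pairs` is a hypothetical name, not a library function. It visits every unordered pair once, counts **a** and **b**, and divides by $\binom{n}{2}$. It assumes at least two elements, since otherwise there are no pairs to compare.

```python
from itertools import combinations
from math import comb


def rand_index_pairs(labels1, labels2):
    """Rand index computed by checking every unordered pair directly.

    labels1[i] and labels2[i] are the clusters that the two methods
    assign to element i (hypothetical helper, for illustration only).
    """
    if len(labels1) != len(labels2):
        raise ValueError("both clusterings must label the same elements")
    n = len(labels1)
    a = 0  # pairs placed in the same cluster by both methods
    b = 0  # pairs placed in different clusters by both methods
    for i, j in combinations(range(n), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        if same1 and same2:
            a += 1
        elif not same1 and not same2:
            b += 1
    # comb(n, 2) is the total number of unordered pairs
    return (a + b) / comb(n, 2)


# The two clusterings used in the worked example that follows
print(rand_index_pairs([1, 1, 1, 2, 2], [1, 1, 2, 2, 3]))  # 0.6
```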

The Rand index always takes on a value between 0 and 1, where:

- **0:** Indicates that the two clustering methods do not agree on the clustering of any pair of elements.
- **1:** Indicates that the two clustering methods perfectly agree on the clustering of every pair of elements.
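Both endpoints are easy to reproduce. This sketch (the helper name `rand_index_sketch` is hypothetical) counts the pairs on which the two labelings agree, meaning the pair is together in both or apart in both, which is the same quantity as a + b. Because only equality of labels is compared, renaming the clusters does not change the result.

```python
from itertools import combinations
from math import comb


def rand_index_sketch(labels1, labels2):
    # A pair "agrees" when both methods treat it the same way:
    # together in both clusterings, or apart in both clusterings
    agree = sum(
        (labels1[i] == labels1[j]) == (labels2[i] == labels2[j])
        for i, j in combinations(range(len(labels1)), 2)
    )
    return agree / comb(len(labels1), 2)


# Identical groupings agree on every pair, even with different label names
print(rand_index_sketch([1, 1, 2], [7, 7, 5]))  # 1.0
# One method puts everything together, the other splits everything apart,
# so no pair is treated the same way
print(rand_index_sketch([1, 1, 1], [1, 2, 3]))  # 0.0
```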

The following example illustrates how to calculate the Rand index between two clustering methods for a simple dataset.

**Example: How to Calculate the Rand Index**

Suppose we have the following dataset of five elements:

- Dataset: {A, B, C, D, E}

And suppose we use two clustering methods that place each element in the following clusters (listed in the order A, B, C, D, E):

- Method 1 Clusters: {1, 1, 1, 2, 2}
- Method 2 Clusters: {1, 1, 2, 2, 3}

To calculate the Rand index between these clustering methods, we need to first write out every possible unordered pair in the dataset of five elements:

- Unordered pairs: {A, B}, {A, C}, {A, D}, {A, E}, {B, C}, {B, D}, {B, E}, {C, D}, {C, E}, {D, E}

There are **10** unordered pairs.
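Listing the pairs by hand gets tedious as *n* grows. In Python, `itertools.combinations` produces every unordered pair exactly once, and `math.comb` gives the count $\binom{n}{2}$ directly; a quick sketch for the five-element dataset:

```python
from itertools import combinations
from math import comb

elements = ["A", "B", "C", "D", "E"]

# combinations() yields each unordered pair exactly once, in the order listed above
pairs = list(combinations(elements, 2))
print(pairs)
print(len(pairs), comb(len(elements), 2))  # 10 10
```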

Next, we need to calculate **a**, which represents the number of unordered pairs that belong to the same cluster across both clustering methods:

- {A, B}

In this case, a = **1**.
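The same filtering can be expressed in a few lines of Python. This sketch stores each method's assignments in a dictionary keyed by element (an illustrative choice, not something the calculation requires) and keeps only the pairs that share a cluster under both methods:

```python
from itertools import combinations

# Cluster assigned to each element by each method (from the example above)
method1 = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2}
method2 = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3}

# Keep the pairs that both methods place in the same cluster
same_in_both = [
    (x, y)
    for x, y in combinations(method1, 2)
    if method1[x] == method1[y] and method2[x] == method2[y]
]
print(same_in_both)  # [('A', 'B')]
print(len(same_in_both))  # 1
```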

Next, we need to calculate **b**, which represents the number of unordered pairs that belong to different clusters across both clustering methods:

- {A, D}, {A, E}, {B, D}, {B, E}, {C, E}

In this case, b = **5**.
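A matching sketch for **b** keeps the pairs that are split up by both methods, using the same illustrative dictionaries:

```python
from itertools import combinations

method1 = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2}
method2 = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3}

# Keep the pairs that both methods place in different clusters
apart_in_both = [
    (x, y)
    for x, y in combinations(method1, 2)
    if method1[x] != method1[y] and method2[x] != method2[y]
]
print(apart_in_both)  # [('A', 'D'), ('A', 'E'), ('B', 'D'), ('B', 'E'), ('C', 'E')]
print(len(apart_in_both))  # 5
```

The four pairs that appear in neither list, {A, C}, {B, C}, {C, D}, and {D, E}, are the pairs on which the two methods disagree, which is why the index falls short of 1.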

Lastly, we can calculate the Rand index as:

$$R = \frac{a + b}{\binom{n}{2}} = \frac{1 + 5}{10} = \frac{6}{10}$$

The Rand index is **0.6**.

**How to Calculate the Rand Index in R**

We can use the **rand.index()** function from the **fossil** package to calculate the Rand index between two clustering methods in R:

```r
library(fossil)

#define clusters
method1 <- c(1, 1, 1, 2, 2)
method2 <- c(1, 1, 2, 2, 3)

#calculate Rand index between clustering methods
rand.index(method1, method2)

# [1] 0.6
```

The Rand index is **0.6**. This matches the value that we calculated by hand.

**How to Calculate the Rand Index in Python**

We can define the following function in Python to calculate the Rand index between two clusterings:

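```python
import numpy as np
from scipy.special import comb

#define Rand index function
def rand_index(actual, pred):
    # pairs placed in the same cluster by `actual` (tp + fp) and by `pred` (tp + fn)
    tp_plus_fp = comb(np.bincount(actual), 2).sum()
    tp_plus_fn = comb(np.bincount(pred), 2).sum()
    # one row per element: [label from actual, label from pred]
    A = np.column_stack((actual, pred))
    # tp: pairs placed in the same cluster by both labelings
    tp = sum(comb(np.bincount(A[A[:, 0] == i, 1]), 2).sum()
             for i in set(actual))
    fp = tp_plus_fp - tp
    fn = tp_plus_fn - tp
    # tn: pairs placed in different clusters by both labelings
    tn = comb(len(A), 2) - tp - fp - fn
    return (tp + tn) / (tp + fp + fn + tn)

#calculate Rand index
print(rand_index([1, 1, 1, 2, 2], [1, 1, 2, 2, 3]))

# 0.6
```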

The Rand index turns out to be **0.6**. This matches the value calculated in the previous examples.
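The function above relies on `np.bincount`, which only accepts non-negative integers, so string labels such as cluster names would need to be encoded first. As an alternative, here is a minimal stdlib-only sketch (the name `rand_index_counts` is hypothetical) built on the same counting idea: it derives the number of pairs placed together from cluster sizes via `collections.Counter`, so labels can be any hashable values and the pairs never have to be enumerated one by one.

```python
from collections import Counter
from math import comb


def rand_index_counts(labels1, labels2):
    """Rand index from cluster sizes; labels may be any hashable values."""
    n = len(labels1)
    total = comb(n, 2)
    # A cluster of size c contributes comb(c, 2) pairs placed together
    together1 = sum(comb(c, 2) for c in Counter(labels1).values())
    together2 = sum(comb(c, 2) for c in Counter(labels2).values())
    # Elements sharing a (label1, label2) combination are together in both
    together_both = sum(comb(c, 2) for c in Counter(zip(labels1, labels2)).values())
    a = together_both
    # Apart in both = all pairs minus those together in at least one clustering
    b = total - (together1 + together2 - together_both)
    return (a + b) / total


print(rand_index_counts(list("xxxyy"), ["p", "p", "q", "q", "r"]))  # 0.6
```

Counting cluster sizes keeps the work linear in the number of elements, whereas checking every pair grows quadratically.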
