
The Jaccard similarity index measures the similarity between two sets of data. It ranges from 0 to 1, and the higher the number, the more similar the two sets are.

The Jaccard similarity index is calculated as:

**Jaccard Similarity** = (number of observations in both sets) / (number of observations in either set)

Or, written in notation form:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
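In Python, the formula maps directly onto the built-in set operators: `&` for intersection and `|` for union. The following is a minimal sketch assuming the inputs are already Python sets; `jaccard_sets` is a hypothetical helper name used only for illustration.

```python
def jaccard_sets(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| for two non-empty Python sets."""
    # & is set intersection, | is set union
    return len(a & b) / len(a | b)

# {2, 3} are shared, {1, 2, 3, 4} appear in either set -> 2 / 4
print(jaccard_sets({1, 2, 3}, {2, 3, 4}))  # 0.5
```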

This tutorial explains how to calculate Jaccard Similarity for two sets of data in Python.

**Example: Jaccard Similarity in Python**

Suppose we have the following two sets of data:

```python
a = [0, 1, 2, 5, 6, 8, 9]
b = [0, 2, 3, 4, 5, 7, 9]
```

We can define the following function to calculate the Jaccard Similarity between the two sets:

```python
# define Jaccard Similarity function
def jaccard(list1, list2):
    # convert to sets so duplicate values are counted only once
    set1, set2 = set(list1), set(list2)
    intersection = len(set1 & set2)  # values in both sets
    union = len(set1 | set2)         # values in either set
    return float(intersection) / union

# find Jaccard Similarity between the two sets
jaccard(a, b)  # returns 0.4
```

The Jaccard Similarity between the two lists is **0.4**.
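Working the formula by hand, with `a` playing the role of A and `b` the role of B, gives the same value:

$$
\begin{aligned}
A \cap B &= \{0, 2, 5, 9\}, \quad |A \cap B| = 4 \\
A \cup B &= \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}, \quad |A \cup B| = 10 \\
J(A, B) &= \frac{4}{10} = 0.4
\end{aligned}
$$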

Note that the function will return **0** if the two sets don't share any values:

```python
c = [0, 1, 2, 3, 4, 5]
d = [6, 7, 8, 9, 10]
jaccard(c, d)  # returns 0.0
```

And the function will return **1** if the two sets are identical:

```python
e = [0, 1, 2, 3, 4, 5]
f = [0, 1, 2, 3, 4, 5]
jaccard(e, f)  # returns 1.0
```
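One case the `jaccard` function above does not handle is two empty inputs: the union is then empty, so the division is 0/0 and Python raises a `ZeroDivisionError`. Because 0/0 has no natural value, the sketch below (a hypothetical `jaccard_safe` helper, not part of any library) assumes the convention that two empty sets count as identical and returns 1.0, with an `empty_value` parameter for anyone who prefers a different convention.

```python
def jaccard_safe(list1, list2, empty_value=1.0):
    """Jaccard similarity that tolerates two empty inputs.

    Assumption: when both inputs are empty the ratio is 0/0, so we return
    `empty_value` (1.0 by default, i.e. two empty sets are treated as identical).
    """
    set1, set2 = set(list1), set(list2)
    union = set1 | set2
    if not union:
        return empty_value
    return len(set1 & set2) / len(union)

print(jaccard_safe([], []))                # 1.0 under the default convention
print(jaccard_safe([], [], 0.0))           # 0.0 under the alternative convention
print(jaccard_safe([0, 1, 2], [0, 1, 2]))  # 1.0
```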

The function also works for sets that contain strings:

```python
g = ['cat', 'dog', 'hippo', 'monkey']
h = ['monkey', 'rhino', 'ostrich', 'salmon']
jaccard(g, h)  # returns 0.142857... (1 shared value out of 7 distinct values)
```

You can also use this function to find the **Jaccard distance** between two sets, which measures the *dissimilarity* between them and is calculated as 1 - Jaccard Similarity.

```python
a = [0, 1, 2, 5, 6, 8, 9]
b = [0, 2, 3, 4, 5, 7, 9]

# find Jaccard distance between sets a and b
1 - jaccard(a, b)  # returns 0.6
```
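When there are more than two sets, the same idea extends to every pair. The sketch below assumes a hypothetical dictionary of named lists and a hypothetical `jaccard_distance` helper (1 minus the Jaccard Similarity, assuming no pair of inputs is empty), and uses `itertools.combinations` to visit each unordered pair once.

```python
from itertools import combinations

def jaccard_distance(list1, list2):
    """Jaccard distance = 1 - |A ∩ B| / |A ∪ B| (assumes the union is non-empty)."""
    set1, set2 = set(list1), set(list2)
    return 1 - len(set1 & set2) / len(set1 | set2)

# Hypothetical collection of named sets to compare
sets = {
    "a": [0, 1, 2, 5, 6, 8, 9],
    "b": [0, 2, 3, 4, 5, 7, 9],
    "c": [0, 1, 2, 3, 4, 5],
}

# Print the distance for every unordered pair of names
for (name1, s1), (name2, s2) in combinations(sets.items(), 2):
    print(name1, name2, round(jaccard_distance(s1, s2), 4))
```

This prints `a b 0.6`, `a c 0.5556` and `b c 0.375`. A smaller distance means the two sets overlap more.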

**Related:** How to Calculate Jaccard Similarity in R

*Refer to this Wikipedia page to learn more details about the Jaccard Similarity Index.*