Home Â» Pandas: How to Group and Aggregate by Multiple Columns

# Pandas: How to Group and Aggregate by Multiple Columns

Often you may want to group and aggregate by multiple columns of a pandas DataFrame.

Fortunately this is easy to do using the pandas .groupby() and .agg() functions.

This tutorial explains several examples of how to use these functions in practice.

### Example 1: Group by Two Columns and Find Average

Suppose we have the following pandas DataFrame:

```import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'B', 'B', 'B', 'M', 'M', 'M'],
'position': ['G', 'G', 'F', 'G', 'F', 'F', 'C', 'C'],
'assists': [5, 7, 7, 8, 5, 7, 6, 9],
'rebounds': [11, 8, 10, 6, 6, 9, 6, 10]})

#view DataFrame
print(df)

team position  assists  rebounds
0    A        G        5        11
1    B        G        7         8
2    B        F        7        10
3    B        G        8         6
4    B        F        5         6
5    M        F        7         9
6    M        C        6         6
7    M        C        9        10
```

The following code shows how to group by columns â€˜teamâ€™ and â€˜positionâ€™ and find the mean assists:

```df.groupby(['team', 'position']).agg({'assists': ['mean']}).reset_index()

team	position  assists
mean
0	A	G	  5.0
1	B	F	  6.0
2	B	G	  7.5
3	M	C	  7.5
4	M	F	  7.0
```

The output tells us:

• The mean assists for players in position G on team A isÂ 5.0.
• The mean assists for players in position F on team B is 6.0.
• The mean assists for players in position G on team B is 7.5.

And so on.

We can also use the following code to rename the columns in the resulting DataFrame:

```#group by team and position and find mean assists
new = df.groupby(['team', 'position']).agg({'assists': ['mean']}).reset_index()

#rename columns
new.columns = ['team', 'pos', 'mean_assists']

#view DataFrame
print(new)

team	pos	mean_assists
0	A	G	5.0
1	B	F	6.0
2	B	G	7.5
3	M	C	7.5
4	M	F	7.0```

### Example 2: Group by Two Columns and Find Multiple Stats

Assume we use the same pandas DataFrame as the previous example:

```import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'B', 'B', 'B', 'M', 'M', 'M'],
'position': ['G', 'G', 'F', 'G', 'F', 'F', 'C', 'C'],
'assists': [5, 7, 7, 8, 5, 7, 6, 9],
'rebounds': [11, 8, 10, 6, 6, 9, 6, 10]})```

The following code shows how to find the median and max number of rebounds, grouped on columns â€˜teamâ€™ and â€˜positionâ€™:

```df.groupby(['team', 'position']).agg({'rebounds': ['median', 'max']}).reset_index()

team	position	rebounds
median	max
0	A	G	 11	11
1	B	F	 8	10
2	B	G	 7	8
3	M	C	 8	10
4	M	F	 9	9
```

The output tells us:

• The median rebounds assists for players in position G on team A is 11.
• The max rebounds for players in position G on team A is 11.
• The median rebounds for players in position F on team B is 8.
• The max rebounds for players in position F on team B is 10.

And so on.