Module 2: Aggregation & Grouping•35 min

Grouping Data

Progress Tracking

One Number Per Group

In the last lesson, you got a single total for the whole DataFrame. Useful, but limited. Real business questions need breakdowns: revenue by month, customers by region, headcount by department.

.groupby() is how you get there. It splits your data into groups, then applies an aggregate method to each group separately.

Your First .groupby()

Python

(
    orders.groupby("cust_id")["total_order_cost"]
    .sum()
    .reset_index(name="total_spent")
)

Table: techcorp_workforce

id	first_name	last_name	department	salary	phone_number	joining_date
1	Sarah	Mitchell	HR	95000	555-0101	2021-03-15
2	Michael	Chen	HR	88000	555-0102	2022-06-01
3	Emily	Rodriguez	HR	82500		2021-09-20
4	David	Park	HR	80000	555-0104	2023-01-10
5	Lisa	Thompson	HR	65000		2021-04-05

Tables: techcorp_workforce

Instead of one count for everyone, you get one count per department. Pandas groups all HR employees, counts them, then does the same for Admin, Account, and so on.

.reset_index() — Why You Need It

After .groupby().count(), the group column (department) becomes the DataFrame's index instead of a regular column. The index is like a row label — useful internally, but awkward for further analysis. .reset_index() moves it back to a normal column. The name= parameter renames the aggregated column: .reset_index(name="employee_count") turns the unnamed count into a descriptive column name. You'll chain .reset_index() after almost every groupby.

Every .groupby() follows the same three-step pattern:

Split: .groupby("column") — which column to group by
Aggregate: .sum(), .mean(), .count(), etc. — what calculation to apply
Reset: .reset_index() — clean up the output

Using .agg() for Multiple Aggregations

To compute several statistics per group, use .agg():

Tables: techcorp_workforce

Sorting Grouped Results

.groupby() results come back in no particular order. Sort them:

Tables: techcorp_workforce

MacBookPro User Event Count

Group by event name and count.

Table: playbook_events

user_id	occurred_at	event_type	event_name	location	device
6991	2014-06-09 18:26:54	engagement	home_page	United States	iphone 5
18851	2014-08-29 13:18:38	signup_flow	enter_info	Russia	asus chromebook
14998	2014-07-01 12:47:56	engagement	login	France	hp pavilion desktop
8186	2014-05-23 10:44:16	engagement	home_page	Italy	macbook pro
9626	2014-07-31 17:15:14	engagement	login	Russia	nexus 7

Tables: playbook_events

Artist Appearance Count

Group by artist and count appearances.

Table: spotify_worldwide_daily_song_ranking

id	position	trackname	artist	streams	url	stream_date	region
303651	52	Heart Won't Forget	Matoma	28047	https://open.spotify.com/track/2of2DM5LqTh7ohmmVXUKsH	2017-02-04	no
85559	160	Someone In The Crowd - From "La La Land" Soundtrack	Emma Stone	17134	https://open.spotify.com/track/7xE4vKvjqUTtHyJ9zi0k1q	2017-02-26	fr
1046089	175	The Greatest	Sia	10060	https://open.spotify.com/track/7xHWNBFm6ObGEQPaUxHuKO	2017-03-06	cl
350824	25	Unforgettable	French Montana	46603	https://open.spotify.com/track/3B54sVLJ402zGa6Xm4YGNe	2017-10-01	no
776822	1	Bad and Boujee (feat. Lil Uzi Vert)	Migos	1823391	https://open.spotify.com/track/4Km5HrUvYTaSUfiSGPJeQR	2017-01-27	us

Tables: spotify_worldwide_daily_song_ranking

Top Ranked Songs

Filter first, then group.

Table: spotify_worldwide_daily_song_ranking

id	position	trackname	artist	streams	url	stream_date	region
303651	52	Heart Won't Forget	Matoma	28047	https://open.spotify.com/track/2of2DM5LqTh7ohmmVXUKsH	2017-02-04	no
85559	160	Someone In The Crowd - From "La La Land" Soundtrack	Emma Stone	17134	https://open.spotify.com/track/7xE4vKvjqUTtHyJ9zi0k1q	2017-02-26	fr
1046089	175	The Greatest	Sia	10060	https://open.spotify.com/track/7xHWNBFm6ObGEQPaUxHuKO	2017-03-06	cl
350824	25	Unforgettable	French Montana	46603	https://open.spotify.com/track/3B54sVLJ402zGa6Xm4YGNe	2017-10-01	no
776822	1	Bad and Boujee (feat. Lil Uzi Vert)	Migos	1823391	https://open.spotify.com/track/4Km5HrUvYTaSUfiSGPJeQR	2017-01-27	us

Tables: spotify_worldwide_daily_song_ranking

Key Takeaways

.groupby() splits data into groups before aggregating.
Pattern: df.groupby("col")["value"].method().reset_index().
Use .agg([...]) for multiple aggregations per group.
Always chain .reset_index() for clean output.
Sort with .sort_values() after grouping.

What’s Next

.groupby() creates your groups, but what if you only want some of them? Say, only departments with more than five employees. You can’t filter before grouping for this — you need to filter after. That’s next.

Next upContinue →

Filtering Groups

20 min