Module 2: Aggregation & Grouping•20 min

Filtering Groups

Progress Tracking

The Problem

You know how to filter rows before grouping — that’s just boolean indexing before .groupby(). But what if you want to filter based on the aggregated result? For example: only departments with more than 5 employees, or only months with revenue over $100K.

You can’t do this with a regular filter before grouping, because the count doesn’t exist until after you group. You need to filter the grouped result.

Filtering After Grouping

Python

order_counts = (
    orders.groupby("cust_id")["id"]
    .count()
    .reset_index(name="order_count")
)
order_counts[order_counts["order_count"] >= 3]

Table: techcorp_workforce

id	first_name	last_name	department	salary	phone_number	joining_date
1	Sarah	Mitchell	HR	95000	555-0101	2021-03-15
2	Michael	Chen	HR	88000	555-0102	2022-06-01
3	Emily	Rodriguez	HR	82500		2021-09-20
4	David	Park	HR	80000	555-0104	2023-01-10
5	Lisa	Thompson	HR	65000		2021-04-05

Tables: techcorp_workforce

That's it. Group, aggregate into a new DataFrame, then filter it like any other DataFrame. No special syntax needed.

This Is Pandas' HAVING

In SQL, you'd use HAVING COUNT(*) > 5 to filter groups. In pandas, you just filter the grouped result with boolean indexing — the same technique you already know.

This distinction matters:

Before grouping — filter individual rows (e.g., only employees hired after 2022)
After grouping — filter groups by their aggregated values (e.g., only departments with 5+ people)

You can combine both:

Tables: techcorp_workforce

Multiple Conditions on Groups

Tables: techcorp_workforce

Departments With 5 Employees

Table: employee

id	first_name	last_name	age	sex	employee_title	department	salary	target	bonus	email	city	address	manager_id
5	Max	George	26	M	Sales	Sales	1300	200	150	Max@company.com	California	2638 Richards Avenue	1
13	Katty	Bond	56	F	Manager	Management	150000	0	300	Katty@company.com	Arizona		1
11	Richerd	Gear	57	M	Manager	Management	250000	0	300	Richerd@company.com	Alabama		1
10	Jennifer	Dion	34	F	Sales	Sales	1000	200	150	Jennifer@company.com	Alabama		13
19	George	Joe	50	M	Manager	Management	100000	0	300	George@company.com	Florida	1003 Wyatt Street	1

Tables: employee

Key Takeaways

Filter before grouping to narrow which rows enter the groups.
Filter after grouping to keep only groups meeting a threshold.
No special syntax — just boolean indexing on the grouped result.
Combine both: filter rows → group → aggregate → filter groups.

What’s Next

Aggregations silently skip NaN values. That’s usually fine, but sometimes it produces misleading numbers. The next lesson covers how NaN behaves in aggregations and how to handle it.

Next upContinue →

NaN in Aggregations

15 min