Filtering Groups
Progress Tracking
Log in to save this lesson and continue from where you left off.
The Problem
You know how to filter rows before grouping — that’s just boolean indexing before .groupby(). But what if you want to filter based on the aggregated result? For example: only departments with more than 5 employees, or only months with revenue over $100K.
You can’t do this with a regular filter before grouping, because the count doesn’t exist until after you group. You need to filter the grouped result.
Filtering After Grouping
order_counts = (
orders.groupby("cust_id")["id"]
.count()
.reset_index(name="order_count")
)
order_counts[order_counts["order_count"] >= 3]| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
Group `techcorp_workforce` by department, count employees, and keep only departments with 3 or more.
That's it. Group, aggregate into a new DataFrame, then filter it like any other DataFrame. No special syntax needed.
In SQL, you'd use HAVING COUNT(*) > 5 to filter groups. In pandas, you just filter the grouped result with boolean indexing — the same technique you already know.
This distinction matters:
- Before grouping — filter individual rows (e.g., only employees hired after 2022)
- After grouping — filter groups by their aggregated values (e.g., only departments with 5+ people)
You can combine both:
Filter to recent hires first, then find departments with more than 2 of those hires.
Multiple Conditions on Groups
Find departments with more than 2 employees and an average salary over 75,000.
Departments With 5 Employees
| id | first_name | last_name | age | sex | employee_title | department | salary | target | bonus | city | address | manager_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | 150 | Max@company.com | California | 2638 Richards Avenue | 1 |
| 13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | 300 | Katty@company.com | Arizona | 1 | |
| 11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | 300 | Richerd@company.com | Alabama | 1 | |
| 10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | 150 | Jennifer@company.com | Alabama | 13 | |
| 19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | 300 | George@company.com | Florida | 1003 Wyatt Street | 1 |
Find departments with at more than or equal 5 employees.
Key Takeaways
- Filter before grouping to narrow which rows enter the groups.
- Filter after grouping to keep only groups meeting a threshold.
- No special syntax — just boolean indexing on the grouped result.
- Combine both: filter rows → group → aggregate → filter groups.
What’s Next
Aggregations silently skip NaN values. That’s usually fine, but sometimes it produces misleading numbers. The next lesson covers how NaN behaves in aggregations and how to handle it.