Module 2: Aggregation & Grouping•30 min

Combining Aggregation Techniques

Progress Tracking

Real analytics rarely involves just one technique. A typical request is: “Show me how many people were hired per department in the last two years, but only for departments with more than 3 hires, sorted by count.” That’s filtering, grouping, aggregating, filtering groups, and sorting — all in one query. This lesson is about chaining those pieces together into a coherent pipeline.

Building Analyses Incrementally

Here’s a business question: Find departments with more than 2 employees hired after 2022, showing their headcount and average salary, sorted by average salary.

Don’t write it all at once. Build it piece by piece.

Step 1: Headcount by Department

Python

dept_counts = (
    techcorp_workforce
    .groupby("department")["id"]
    .count()
    .reset_index(name="emp_count")
)
dept_counts

Step 2: Only Hires After 2022

Table: techcorp_workforce

id	first_name	last_name	department	salary	phone_number	joining_date
1	Sarah	Mitchell	HR	95000	555-0101	2021-03-15
2	Michael	Chen	HR	88000	555-0102	2022-06-01
3	Emily	Rodriguez	HR	82500		2021-09-20
4	David	Park	HR	80000	555-0104	2023-01-10
5	Lisa	Thompson	HR	65000		2021-04-05

Tables: techcorp_workforce

Step 3: Departments with More Than 2 Hires

Tables: techcorp_workforce

Step 4: Add Salary Stats and Sort

Tables: techcorp_workforce

Build Incrementally

Always build analyses step by step. Run after each change. It’s much easier to debug one new line than a 20-line chain that doesn’t work.

Top 10 Songs 2010

Table: billboard_top_100_year_end

year	year_rank	group_name	artist	song_name	id
1956	1	Elvis Presley	Elvis Presley	Heartbreak Hotel	1
1956	2	Elvis Presley	Elvis Presley	Don't Be Cruel	2
1956	3	Nelson Riddle	Nelson Riddle	Lisbon Antigua	3
1956	4	Platters	Platters	My Prayer	4
1956	5	Gogi Grant	Gogi Grant	The Wayward Wind	5

Tables: billboard_top_100_year_end

Key Takeaways

You now have a complete toolkit for summarizing and analyzing data:

.sum(), .mean(), .min(), .max(), .count() for basic aggregation
.groupby() to segment your data
Boolean indexing on grouped results to filter groups
Understanding of how NaN behaves in aggregates
Boolean masks and .loc[] for conditional aggregation
The ability to combine all these techniques incrementally

What’s Next

So far, all your analyses have worked with a single DataFrame. But real datasets have data spread across multiple tables: customers in one, orders in another, products in a third. Module 3 introduces merging and joining — combining data from multiple DataFrames. That’s when things get really powerful.

Next upContinue →

Understanding DataFrame Relationships

Module 3: Combining DataFrames

10 min