Combining Aggregation Techniques
Progress Tracking
Log in to save this lesson and continue from where you left off.
Real analytics rarely involves just one technique. A typical request is: “Show me how many people were hired per department in the last two years, but only for departments with more than 3 hires, sorted by count.” That’s filtering, grouping, aggregating, filtering groups, and sorting — all in one query. This lesson is about chaining those pieces together into a coherent pipeline.
Building Analyses Incrementally
Here’s a business question: Find departments with more than 2 employees hired after 2022, showing their headcount and average salary, sorted by average salary.
Don’t write it all at once. Build it piece by piece.
Step 1: Headcount by Department
dept_counts = (
techcorp_workforce
.groupby("department")["id"]
.count()
.reset_index(name="emp_count")
)
dept_countsStep 2: Only Hires After 2022
Add a filter to keep only employees who joined after 2022-01-01 before grouping.
Step 3: Departments with More Than 2 Hires
Add a filter on the grouped result to keep only departments with more than 2 employees.
Step 4: Add Salary Stats and Sort
Replace the simple count with full salary stats (count, mean, min, max) and sort by average salary descending.
Always build analyses step by step. Run after each change. It’s much easier to debug one new line than a 20-line chain that doesn’t work.
Top 10 Songs 2010
| year | year_rank | group_name | artist | song_name | id |
|---|---|---|---|---|---|
| 1956 | 1 | Elvis Presley | Elvis Presley | Heartbreak Hotel | 1 |
| 1956 | 2 | Elvis Presley | Elvis Presley | Don't Be Cruel | 2 |
| 1956 | 3 | Nelson Riddle | Nelson Riddle | Lisbon Antigua | 3 |
| 1956 | 4 | Platters | Platters | My Prayer | 4 |
| 1956 | 5 | Gogi Grant | Gogi Grant | The Wayward Wind | 5 |
Find the top 10 ranked songs in 2010. Output the rank, group name, and song name, but do not show the same song twice. Sort the result based on the rank in ascending order.
Key Takeaways
You now have a complete toolkit for summarizing and analyzing data:
.sum(),.mean(),.min(),.max(),.count()for basic aggregation.groupby()to segment your data- Boolean indexing on grouped results to filter groups
- Understanding of how
NaNbehaves in aggregates - Boolean masks and
.loc[]for conditional aggregation - The ability to combine all these techniques incrementally
What’s Next
So far, all your analyses have worked with a single DataFrame. But real datasets have data spread across multiple tables: customers in one, orders in another, products in a third. Module 3 introduces merging and joining — combining data from multiple DataFrames. That’s when things get really powerful.