Module 2: Aggregation & Grouping30 min

Combining Aggregation Techniques

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Real analytics rarely involves just one technique. A typical request is: “Show me how many people were hired per department in the last two years, but only for departments with more than 3 hires, sorted by count.” That’s filtering, grouping, aggregating, filtering groups, and sorting — all in one query. This lesson is about chaining those pieces together into a coherent pipeline.

Building Analyses Incrementally

Here’s a business question: Find departments with more than 2 employees hired after 2022, showing their headcount and average salary, sorted by average salary.

Don’t write it all at once. Build it piece by piece.

Step 1: Headcount by Department

Python
dept_counts = (
    techcorp_workforce
    .groupby("department")["id"]
    .count()
    .reset_index(name="emp_count")
)
dept_counts

Step 2: Only Hires After 2022

1
Filter to Recent Hires

Add a filter to keep only employees who joined after 2022-01-01 before grouping.

Tables: techcorp_workforce

Step 3: Departments with More Than 2 Hires

2
Add Group Filter

Add a filter on the grouped result to keep only departments with more than 2 employees.

Tables: techcorp_workforce

Step 4: Add Salary Stats and Sort

3
Complete the Analysis

Replace the simple count with full salary stats (count, mean, min, max) and sort by average salary descending.

Tables: techcorp_workforce
Build Incrementally

Always build analyses step by step. Run after each change. It’s much easier to debug one new line than a 20-line chain that doesn’t work.

Top 10 Songs 2010

Table: billboard_top_100_year_end
yearyear_rankgroup_nameartistsong_nameid
19561Elvis PresleyElvis PresleyHeartbreak Hotel1
19562Elvis PresleyElvis PresleyDon't Be Cruel2
19563Nelson RiddleNelson RiddleLisbon Antigua3
19564PlattersPlattersMy Prayer4
19565Gogi GrantGogi GrantThe Wayward Wind5
4
Top 10 Songs 2010
View solution

Find the top 10 ranked songs in 2010. Output the rank, group name, and song name, but do not show the same song twice. Sort the result based on the rank in ascending order.

Tables: billboard_top_100_year_end

Key Takeaways

You now have a complete toolkit for summarizing and analyzing data:

  • .sum(), .mean(), .min(), .max(), .count() for basic aggregation
  • .groupby() to segment your data
  • Boolean indexing on grouped results to filter groups
  • Understanding of how NaN behaves in aggregates
  • Boolean masks and .loc[] for conditional aggregation
  • The ability to combine all these techniques incrementally

What’s Next

So far, all your analyses have worked with a single DataFrame. But real datasets have data spread across multiple tables: customers in one, orders in another, products in a third. Module 3 introduces merging and joining — combining data from multiple DataFrames. That’s when things get really powerful.