Module 6: Window Operations10 min

Introduction to Window Operations

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

The Frustrating Limitation

Your manager asks for a report showing each employee’s salary alongside their department’s average. You know how to get individual salaries and department averages separately — but combining them?

Window Functions Keep Every Row

You saw .transform() in Module 4 for per-group calculations. Here we’ll build on it as the foundation for all window operations — ranking, shifting, and running totals.

Python
orders["cust_avg"] = (
    orders.groupby("cust_id")["total_order_cost"]
    .transform("mean")
)
orders[["cust_id", "total_order_cost", "cust_avg"]]
Table: ms_employee_salary
idfirst_namelast_namesalarydepartment_id
1ToddWilson1100001006
1ToddWilson1061191006
2JustinSimon1289221005
2JustinSimon1300001005
3KellyRosario426891002
1
Add a Department Average Column

Add a department average salary column using .transform(). Show first name, department, salary, and the average.

Tables: ms_employee_salary

Adding Comparisons

The simplest and most common use: "How far is this employee from their department average?" This pattern appears in every performance review analysis, every sales benchmarking report, and every anomaly detection pipeline. Compute the group stat, subtract from the individual value, and you have your comparison.

2
Compare to Department Average

The starter adds the department average. Add a column showing how far each salary is from the department average.

Tables: ms_employee_salary
When to Use Which

“Total revenue by region?” → .groupby().sum(). “Show each sale with its region’s total?” → .transform(). Ask yourself: collapse rows, or keep them all?

Finding Updated Records

Table: ms_employee_salary
idfirst_namelast_namesalarydepartment_id
1ToddWilson1100001006
1ToddWilson1061191006
2JustinSimon1289221005
2JustinSimon1300001005
3KellyRosario426891002
3
Finding Updated Records
View solution

We have a table with employees and their salaries; however, some of the records are old and contain outdated salary information. Since there is no timestamp, assume salary is non-decreasing over time. You can consider the current salary for an employee is the largest salary value among their records. If multiple records share the same maximum salary, return any one of them. Output their id, first name, last name, department ID, and current salary. Order your list by employee ID in ascending order.

Tables: ms_employee_salary

Key Takeaways

  • .transform() calculates across groups without collapsing — the pandas equivalent of SQL window functions.
  • Every original row stays; the aggregated value is broadcast to each row in the group.
  • Use .groupby().agg() when you want summaries; .transform() when you want detail + context.

What’s Next

Now let’s look at ranking — .rank() with different methods gives you ROW_NUMBER, RANK, and DENSE_RANK equivalents.