Module 6: Window Operations•10 min

Introduction to Window Operations

Progress Tracking

The Frustrating Limitation

Your manager asks for a report showing each employee’s salary alongside their department’s average. You know how to get individual salaries and department averages separately — but combining them?

Window Functions Keep Every Row

You saw .transform() in Module 4 for per-group calculations. Here we’ll build on it as the foundation for all window operations — ranking, shifting, and running totals.

Python

orders["cust_avg"] = (
    orders.groupby("cust_id")["total_order_cost"]
    .transform("mean")
)
orders[["cust_id", "total_order_cost", "cust_avg"]]

Table: ms_employee_salary

id	first_name	last_name	salary	department_id
1	Todd	Wilson	110000	1006
1	Todd	Wilson	106119	1006
2	Justin	Simon	128922	1005
2	Justin	Simon	130000	1005
3	Kelly	Rosario	42689	1002

Tables: ms_employee_salary

Adding Comparisons

The simplest and most common use: "How far is this employee from their department average?" This pattern appears in every performance review analysis, every sales benchmarking report, and every anomaly detection pipeline. Compute the group stat, subtract from the individual value, and you have your comparison.

Tables: ms_employee_salary

When to Use Which

“Total revenue by region?” → .groupby().sum(). “Show each sale with its region’s total?” → .transform(). Ask yourself: collapse rows, or keep them all?

Finding Updated Records

View solution

We have a table with employees and their salaries; however, some of the records are old and contain outdated salary information. Since there is no timestamp, assume salary is non-decreasing over time. You can consider the current salary for an employee is the largest salary value among their records. If multiple records share the same maximum salary, return any one of them. Output their id, first name, last name, department ID, and current salary. Order your list by employee ID in ascending order.

Tables: ms_employee_salary

Key Takeaways

.transform() calculates across groups without collapsing — the pandas equivalent of SQL window functions.
Every original row stays; the aggregated value is broadcast to each row in the group.
Use .groupby().agg() when you want summaries; .transform() when you want detail + context.

What’s Next

Now let’s look at ranking — .rank() with different methods gives you ROW_NUMBER, RANK, and DENSE_RANK equivalents.

Next upContinue →

Ranking

25 min