Introduction to Window Operations
Progress Tracking
Log in to save this lesson and continue from where you left off.
The Frustrating Limitation
Your manager asks for a report showing each employee’s salary alongside their department’s average. You know how to get individual salaries and department averages separately — but combining them?
Window Functions Keep Every Row
You saw .transform() in Module 4 for per-group calculations. Here we’ll build on it as the foundation for all window operations — ranking, shifting, and running totals.
orders["cust_avg"] = (
orders.groupby("cust_id")["total_order_cost"]
.transform("mean")
)
orders[["cust_id", "total_order_cost", "cust_avg"]]| id | first_name | last_name | salary | department_id |
|---|---|---|---|---|
| 1 | Todd | Wilson | 110000 | 1006 |
| 1 | Todd | Wilson | 106119 | 1006 |
| 2 | Justin | Simon | 128922 | 1005 |
| 2 | Justin | Simon | 130000 | 1005 |
| 3 | Kelly | Rosario | 42689 | 1002 |
Add a department average salary column using .transform(). Show first name, department, salary, and the average.
Adding Comparisons
The simplest and most common use: "How far is this employee from their department average?" This pattern appears in every performance review analysis, every sales benchmarking report, and every anomaly detection pipeline. Compute the group stat, subtract from the individual value, and you have your comparison.
The starter adds the department average. Add a column showing how far each salary is from the department average.
“Total revenue by region?” → .groupby().sum(). “Show each sale with its region’s total?” → .transform(). Ask yourself: collapse rows, or keep them all?
Finding Updated Records
| id | first_name | last_name | salary | department_id |
|---|---|---|---|---|
| 1 | Todd | Wilson | 110000 | 1006 |
| 1 | Todd | Wilson | 106119 | 1006 |
| 2 | Justin | Simon | 128922 | 1005 |
| 2 | Justin | Simon | 130000 | 1005 |
| 3 | Kelly | Rosario | 42689 | 1002 |
We have a table with employees and their salaries; however, some of the records are old and contain outdated salary information. Since there is no timestamp, assume salary is non-decreasing over time. You can consider the current salary for an employee is the largest salary value among their records. If multiple records share the same maximum salary, return any one of them. Output their id, first name, last name, department ID, and current salary. Order your list by employee ID in ascending order.
Key Takeaways
.transform()calculates across groups without collapsing — the pandas equivalent of SQL window functions.- Every original row stays; the aggregated value is broadcast to each row in the group.
- Use
.groupby().agg()when you want summaries;.transform()when you want detail + context.
What’s Next
Now let’s look at ranking — .rank() with different methods gives you ROW_NUMBER, RANK, and DENSE_RANK equivalents.