The Transform Pattern
Progress Tracking
Log in to save this lesson and continue from where you left off.
The Problem: Per-Group Comparisons
You want employees earning above their department’s average. In SQL, you’d use a CTE to compute department averages and join back. In pandas, .transform() does this in one line.
How .transform() Works
| id | first_name | last_name | age | sex | employee_title | department | salary | target | bonus | city | address | manager_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | 150 | Max@company.com | California | 2638 Richards Avenue | 1 |
| 13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | 300 | Katty@company.com | Arizona | 1 | |
| 11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | 300 | Richerd@company.com | Alabama | 1 | |
| 10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | 150 | Jennifer@company.com | Alabama | 13 | |
| 19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | 300 | George@company.com | Florida | 1003 Wyatt Street | 1 |
# Normal groupby: one row per department
employee.groupby("department")["salary"].mean()
# Transform: one value per ORIGINAL row
employee.groupby("department")["salary"].transform("mean").groupby().mean() collapses to one row per group. .groupby().transform("mean") keeps the original row count, repeating the group’s mean for each member.
The .transform() Pattern
orders["cust_avg"] = (
orders.groupby("cust_id")["total_order_cost"]
.transform("mean")
)
orders[["cust_id", "total_order_cost", "cust_avg"]]Add a department average salary column to each employee row using .transform().
Filtering with Transform
The starter adds the department average column. Filter to employees earning above it.
.transform() is a shortcut for: groupby → aggregate → merge back. Use .transform() for single-column operations. Use the merge approach when you need multiple aggregated columns.
The Merge-Back Alternative
.transform() is perfect for single-column operations like mean, max, or count. But when you need multiple aggregated columns (mean AND count AND max), the merge-back approach is cleaner: aggregate into a summary DataFrame, then merge it back to the original. Think of .transform() as a shortcut for the simple case.
dept_stats = (
employee
.groupby("department")["salary"]
.agg(["mean", "max"])
.reset_index()
)
pd.merge(employee, dept_stats, on="department")Max Per Group
Find employees who earn the maximum salary in their department.
Average Salaries
| id | first_name | last_name | age | sex | employee_title | department | salary | target | bonus | city | address | manager_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | 150 | Max@company.com | California | 2638 Richards Avenue | 1 |
| 13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | 300 | Katty@company.com | Arizona | 1 | |
| 11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | 300 | Richerd@company.com | Alabama | 1 | |
| 10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | 150 | Jennifer@company.com | Alabama | 13 | |
| 19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | 300 | George@company.com | Florida | 1003 Wyatt Street | 1 |
Compare each employee's salary with the average salary of the corresponding department. Output the department, first name, and salary of employees along with the average salary of that department.
Highest Salary In Department
| id | first_name | last_name | age | sex | employee_title | department | salary | target | bonus | city | address | manager_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | 150 | Max@company.com | California | 2638 Richards Avenue | 1 |
| 13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | 300 | Katty@company.com | Arizona | 1 | |
| 11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | 300 | Richerd@company.com | Alabama | 1 | |
| 10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | 150 | Jennifer@company.com | Alabama | 13 | |
| 19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | 300 | George@company.com | Florida | 1003 Wyatt Street | 1 |
Find the employee with the highest salary per department. Output the department name, employee's first name along with the corresponding salary.
Key Takeaways
.transform("agg")broadcasts group results back to every row.- Use it for per-group comparisons: above average, equals max, etc.
- For multiple aggregated columns, merge back instead.
- This replaces SQL’s correlated subqueries and CTE+JOIN patterns.
What’s Next
You can now do per-group comparisons. Next: chaining multiple analysis steps together — building complex analyses from named intermediate results.