Module 4: Multi-Step Analysis25 min

The Transform Pattern

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

The Problem: Per-Group Comparisons

You want employees earning above their department’s average. In SQL, you’d use a CTE to compute department averages and join back. In pandas, .transform() does this in one line.

How .transform() Works

Table: employee
idfirst_namelast_nameagesexemployee_titledepartmentsalarytargetbonusemailcityaddressmanager_id
5MaxGeorge26MSalesSales1300200150Max@company.comCalifornia2638 Richards Avenue1
13KattyBond56FManagerManagement1500000300Katty@company.comArizona1
11RicherdGear57MManagerManagement2500000300Richerd@company.comAlabama1
10JenniferDion34FSalesSales1000200150Jennifer@company.comAlabama13
19GeorgeJoe50MManagerManagement1000000300George@company.comFlorida1003 Wyatt Street1
Python
# Normal groupby: one row per department
employee.groupby("department")["salary"].mean()

# Transform: one value per ORIGINAL row
employee.groupby("department")["salary"].transform("mean")

.groupby().mean() collapses to one row per group. .groupby().transform("mean") keeps the original row count, repeating the group’s mean for each member.

The .transform() Pattern

Python
orders["cust_avg"] = (
    orders.groupby("cust_id")["total_order_cost"]
    .transform("mean")
)
orders[["cust_id", "total_order_cost", "cust_avg"]]
1
Add a Department Average Column

Add a department average salary column to each employee row using .transform().

Tables: employee

Filtering with Transform

2
Find Above-Average Earners by Department

The starter adds the department average column. Filter to employees earning above it.

Tables: employee
`.transform()` vs Merge-Back

.transform() is a shortcut for: groupby → aggregate → merge back. Use .transform() for single-column operations. Use the merge approach when you need multiple aggregated columns.

The Merge-Back Alternative

.transform() is perfect for single-column operations like mean, max, or count. But when you need multiple aggregated columns (mean AND count AND max), the merge-back approach is cleaner: aggregate into a summary DataFrame, then merge it back to the original. Think of .transform() as a shortcut for the simple case.

Python
dept_stats = (
    employee
    .groupby("department")["salary"]
    .agg(["mean", "max"])
    .reset_index()
)
pd.merge(employee, dept_stats, on="department")

Max Per Group

3
Find Highest Earner Per Department

Find employees who earn the maximum salary in their department.

Tables: employee

Average Salaries

Table: employee
idfirst_namelast_nameagesexemployee_titledepartmentsalarytargetbonusemailcityaddressmanager_id
5MaxGeorge26MSalesSales1300200150Max@company.comCalifornia2638 Richards Avenue1
13KattyBond56FManagerManagement1500000300Katty@company.comArizona1
11RicherdGear57MManagerManagement2500000300Richerd@company.comAlabama1
10JenniferDion34FSalesSales1000200150Jennifer@company.comAlabama13
19GeorgeJoe50MManagerManagement1000000300George@company.comFlorida1003 Wyatt Street1
4
Average Salaries
View solution

Compare each employee's salary with the average salary of the corresponding department. Output the department, first name, and salary of employees along with the average salary of that department.

Tables: employee

Highest Salary In Department

Table: employee
idfirst_namelast_nameagesexemployee_titledepartmentsalarytargetbonusemailcityaddressmanager_id
5MaxGeorge26MSalesSales1300200150Max@company.comCalifornia2638 Richards Avenue1
13KattyBond56FManagerManagement1500000300Katty@company.comArizona1
11RicherdGear57MManagerManagement2500000300Richerd@company.comAlabama1
10JenniferDion34FSalesSales1000200150Jennifer@company.comAlabama13
19GeorgeJoe50MManagerManagement1000000300George@company.comFlorida1003 Wyatt Street1
5
Highest Salary In Department
View solution

Find the employee with the highest salary per department. Output the department name, employee's first name along with the corresponding salary.

Tables: employee

Key Takeaways

  • .transform("agg") broadcasts group results back to every row.
  • Use it for per-group comparisons: above average, equals max, etc.
  • For multiple aggregated columns, merge back instead.
  • This replaces SQL’s correlated subqueries and CTE+JOIN patterns.

What’s Next

You can now do per-group comparisons. Next: chaining multiple analysis steps together — building complex analyses from named intermediate results.