Module 4: Multi-Step Analysis25 min

Chaining Analysis Steps

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Each variable is a checkpoint you can inspect. When something’s wrong, print any intermediate result to find the problem.

Name Your Variables Well

customer_totals tells you what’s in it. temp1 tells you nothing. Good variable names are documentation.

Adding Context with Merge

After computing your aggregation, you often need to merge the result back with another DataFrame to add names, labels, or other context.

Table: orders
idcust_idorder_dateorder_detailstotal_order_cost
132019-03-04Coat100
232019-03-01Shoes80
332019-03-07Skirt30
472019-02-01Coat25
572019-03-10Shoes80
Table: customers
idfirst_namelast_namecityaddressphone_number
8JohnJosephSan Francisco928-386-8164
7JillMichaelAustin813-297-0692
4WilliamDanielDenver813-368-1200
5HenryJacksonMiami808-601-7513
13EmmaIsaacMiami808-690-5201
1
Enrich with Customer Names

The starter finds above-average spenders. Add a merge to show customer names alongside their totals.

Tables: orders, customers

Ratios and Percentages

Percentage calculations follow a template: compute the per-group number, the total, then divide.

Table: employee
idfirst_namelast_nameagesexemployee_titledepartmentsalarytargetbonusemailcityaddressmanager_id
5MaxGeorge26MSalesSales1300200150Max@company.comCalifornia2638 Richards Avenue1
13KattyBond56FManagerManagement1500000300Katty@company.comArizona1
11RicherdGear57MManagerManagement2500000300Richerd@company.comAlabama1
10JenniferDion34FSalesSales1000200150Jennifer@company.comAlabama13
19GeorgeJoe50MManagerManagement1000000300George@company.comFlorida1003 Wyatt Street1
2
Department Headcount Percentages

Calculate what percentage of the company each department represents by headcount.

Tables: employee

Customers with Large Orders

Table: online_store_customers
customer_idcustomer_name
1Alice Johnson
2Bob Smith
3Carol Williams
4David Brown
5Emma Davis
Table: online_store_orders
order_idcustomer_idamountstatus
1011150paid
1021200paid
103175paid
1042250paid
1053180paid
3
Customers with Large Orders
View solution

The marketing team wants to identify high-value customers for a premium loyalty program. Find all customers who have placed at least one order over $100. Return `customer ID` and `name`. Consider all orders regardless of their payment or fulfillment status.

Tables: online_store_customers, online_store_orders

Employee and Manager Salaries

Table: employee
idfirst_namelast_nameagesexemployee_titledepartmentsalarytargetbonusemailcityaddressmanager_id
5MaxGeorge26MSalesSales1300200150Max@company.comCalifornia2638 Richards Avenue1
13KattyBond56FManagerManagement1500000300Katty@company.comArizona1
11RicherdGear57MManagerManagement2500000300Richerd@company.comAlabama1
10JenniferDion34FSalesSales1000200150Jennifer@company.comAlabama13
19GeorgeJoe50MManagerManagement1000000300George@company.comFlorida1003 Wyatt Street1
4
Employee and Manager Salaries
View solution

Find employees who are earning more than their managers. Output the employee's first name along with the corresponding salary.

Tables: employee

Key Takeaways

  • Name each step as a variable — this is pandas’ equivalent of CTEs.
  • Each variable is a checkpoint you can inspect for debugging.
  • Chain: filter → group → aggregate → merge → filter again.
  • For ratios: compute numerator and denominator separately, then divide.

What’s Next

The final lesson covers common patterns you’ll use regularly: safe division, replacing slow loops, and knowing when a one-liner is better than multiple steps.