Chaining Analysis Steps
Progress Tracking
Log in to save this lesson and continue from where you left off.
Each variable is a checkpoint you can inspect. When something’s wrong, print any intermediate result to find the problem.
customer_totals tells you what’s in it. temp1 tells you nothing. Good variable names are documentation.
Adding Context with Merge
After computing your aggregation, you often need to merge the result back with another DataFrame to add names, labels, or other context.
| id | cust_id | order_date | order_details | total_order_cost |
|---|---|---|---|---|
| 1 | 3 | 2019-03-04 | Coat | 100 |
| 2 | 3 | 2019-03-01 | Shoes | 80 |
| 3 | 3 | 2019-03-07 | Skirt | 30 |
| 4 | 7 | 2019-02-01 | Coat | 25 |
| 5 | 7 | 2019-03-10 | Shoes | 80 |
| id | first_name | last_name | city | address | phone_number |
|---|---|---|---|---|---|
| 8 | John | Joseph | San Francisco | 928-386-8164 | |
| 7 | Jill | Michael | Austin | 813-297-0692 | |
| 4 | William | Daniel | Denver | 813-368-1200 | |
| 5 | Henry | Jackson | Miami | 808-601-7513 | |
| 13 | Emma | Isaac | Miami | 808-690-5201 |
The starter finds above-average spenders. Add a merge to show customer names alongside their totals.
Ratios and Percentages
Percentage calculations follow a template: compute the per-group number, the total, then divide.
| id | first_name | last_name | age | sex | employee_title | department | salary | target | bonus | city | address | manager_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | 150 | Max@company.com | California | 2638 Richards Avenue | 1 |
| 13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | 300 | Katty@company.com | Arizona | 1 | |
| 11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | 300 | Richerd@company.com | Alabama | 1 | |
| 10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | 150 | Jennifer@company.com | Alabama | 13 | |
| 19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | 300 | George@company.com | Florida | 1003 Wyatt Street | 1 |
Calculate what percentage of the company each department represents by headcount.
Customers with Large Orders
| customer_id | customer_name |
|---|---|
| 1 | Alice Johnson |
| 2 | Bob Smith |
| 3 | Carol Williams |
| 4 | David Brown |
| 5 | Emma Davis |
| order_id | customer_id | amount | status |
|---|---|---|---|
| 101 | 1 | 150 | paid |
| 102 | 1 | 200 | paid |
| 103 | 1 | 75 | paid |
| 104 | 2 | 250 | paid |
| 105 | 3 | 180 | paid |
The marketing team wants to identify high-value customers for a premium loyalty program. Find all customers who have placed at least one order over $100. Return `customer ID` and `name`. Consider all orders regardless of their payment or fulfillment status.
Employee and Manager Salaries
| id | first_name | last_name | age | sex | employee_title | department | salary | target | bonus | city | address | manager_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | 150 | Max@company.com | California | 2638 Richards Avenue | 1 |
| 13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | 300 | Katty@company.com | Arizona | 1 | |
| 11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | 300 | Richerd@company.com | Alabama | 1 | |
| 10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | 150 | Jennifer@company.com | Alabama | 13 | |
| 19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | 300 | George@company.com | Florida | 1003 Wyatt Street | 1 |
Find employees who are earning more than their managers. Output the employee's first name along with the corresponding salary.
Key Takeaways
- Name each step as a variable — this is pandas’ equivalent of CTEs.
- Each variable is a checkpoint you can inspect for debugging.
- Chain: filter → group → aggregate → merge → filter again.
- For ratios: compute numerator and denominator separately, then divide.
What’s Next
The final lesson covers common patterns you’ll use regularly: safe division, replacing slow loops, and knowing when a one-liner is better than multiple steps.