Chaining Analysis Steps
Progress Tracking
Log in to save this lesson and continue from where you left off.
Each variable is a checkpoint you can inspect. When something’s wrong, print any intermediate result to find the problem.
customer_totals tells you what’s in it. temp1 tells you nothing. Good variable names are documentation.
Adding Context with Merge
The starter finds above-average spenders. Add a merge to show customer names alongside their totals.
Ratios and Percentages
Percentage calculations follow a template: compute the per-group number, the total, then divide.
| id | first_name | last_name | age | sex | employee_title | department | salary | target | bonus | city | address | manager_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | 150 | Max@company.com | California | 2638 Richards Avenue | 1 |
| 13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | 300 | Katty@company.com | Arizona | 1 | |
| 11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | 300 | Richerd@company.com | Alabama | 1 | |
| 10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | 150 | Jennifer@company.com | Alabama | 13 | |
| 19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | 300 | George@company.com | Florida | 1003 Wyatt Street | 1 |
Calculate what percentage of the company each department represents by headcount.
Customers with Large Orders
| customer_id | customer_name |
|---|---|
| 1 | Alice Johnson |
| 2 | Bob Smith |
| 3 | Carol Williams |
| 4 | David Brown |
| 5 | Emma Davis |
| order_id | customer_id | amount | status |
|---|---|---|---|
| 101 | 1 | 150 | paid |
| 102 | 1 | 200 | paid |
| 103 | 1 | 75 | paid |
| 104 | 2 | 250 | paid |
| 105 | 3 | 180 | paid |
The marketing team wants to identify high-value customers for a premium loyalty program. Find all customers who have placed at least one order over $100. Return `customer ID` and `name`.
Employee and Manager Salaries
| id | first_name | last_name | age | sex | employee_title | department | salary | target | bonus | city | address | manager_id | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | 150 | Max@company.com | California | 2638 Richards Avenue | 1 |
| 13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | 300 | Katty@company.com | Arizona | 1 | |
| 11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | 300 | Richerd@company.com | Alabama | 1 | |
| 10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | 150 | Jennifer@company.com | Alabama | 13 | |
| 19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | 300 | George@company.com | Florida | 1003 Wyatt Street | 1 |
Find employees who are earning more than their managers. Output the employee's first name along with the corresponding salary.
Key Takeaways
- Name each step as a variable — this is pandas’ equivalent of CTEs.
- Each variable is a checkpoint you can inspect for debugging.
- Chain: filter → group → aggregate → merge → filter again.
- For ratios: compute numerator and denominator separately, then divide.
What’s Next
The final lesson covers common patterns you’ll use regularly: safe division, replacing slow loops, and knowing when a one-liner is better than multiple steps.