Module 4: Multi-Step Analysis35 min

Patterns for Common Problems

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Safe Division

Division by zero is a common source of production bugs. Your analysis works perfectly on test data, then crashes at 2 AM when one department has zero employees. Always protect division — it’s a habit that separates hobby code from production code.

Python
# Dangerous: crashes if denominator is 0
ratio = numerator / denominator

# Safe: returns NaN instead of crashing
denominator = denominator.replace(0, float("nan"))
ratio = numerator / denominator
Always Protect Division

For Series division, use .replace(0, float("nan")) on the denominator. For scalar division, check if total > 0 before dividing.

Pre-Aggregate to Avoid Slow Loops

If you find yourself writing a loop over groups, stop. .transform() or groupby-then-merge is almost always faster and cleaner.

Python
# Slow: Python loop over departments
for dept in employee['department'].unique():
    dept_avg = employee[employee['department'] == dept]['salary'].mean()
        ...

# Fast: one-line transform
employee["dept_avg"] = (
    employee.groupby("department")["salary"].transform("mean")
)

Transform Then Filter

1
Pre-Aggregate and Filter

The starter computes department averages. Add a filter for above-average earners and select the relevant columns.

Tables: employee

When a One-Liner is Better

Not everything needs multiple steps. Simple lookups are cleaner as one-liners:

Python

# One-liner: perfectly readable
employee[employee["salary"] == employee["salary"].max()]

# Also works, but more code than needed: same result, more code
max_sal = employee["salary"].max()
employee[employee["salary"] == max_sal]

Use intermediate variables when: the logic has 3+ steps, you need to reuse a result, or the one-liner is hard to read.

Ratio Calculation

2
Department Salary Percentages

Calculate each department’s total salary as a percentage of the company total. Show only departments over 20%.

Tables: employee

Highest Target Under Manager

Table: salesforce_employees
idfirst_namelast_nameagesexemployee_titledepartmentsalarytargetbonusemailcityaddressmanager_id
1AllenWang55FManagerManagement200000600300Allen@company.comCalifornia1069 Ventura Drive1
20SarrahBicky31FSenior SalesSales2000500150Sarrah@company.comFlorida1176 Tyler Avenue19
7NickyBat33FSalesSales1400400100Molly@company.comArizona3461 Preston Street13
31SteveSmith39MSalesSales1500400125steve@company.comTexas13
32DavidWarner38MSalesSales1200400150david@company.comFlorida13
3
Highest Target Under Manager
View solution

Identify the employee(s) working under manager `manager_id=13` who have achieved the highest target. Return each such employee’s first name alongside the target value. The goal is to display the maximum target among all employees under `manager_id=13` and show which employee(s) reached that top value.

Tables: salesforce_employees

Income By Title and Gender

Table: sf_employee
idfirst_namelast_nameagesexemployee_titledepartmentsalarytargetemailcityaddressmanager_id
5MaxGeorge26MSalesSales1300200Max@company.comCalifornia2638 Richards Avenue1
13KattyBond56FManagerManagement1500000Katty@company.comArizona1
11RicherdGear57MManagerManagement2500000Richerd@company.comAlabama1
10JenniferDion34FSalesSales1000200Jennifer@company.comAlabama13
19GeorgeJoe50MManagerManagement1000000George@company.comFlorida1003 Wyatt Street1
Table: sf_bonus
worker_ref_idbonus
15000
23000
34000
14500
23500
4
Income By Title and Gender
View solution

Find the average total compensation based on employee titles and gender. Total compensation is calculated by adding both the salary and bonus of each employee. However, not every employee receives a bonus so disregard employees without bonuses in your calculation. Employee can receive more than one bonus. Output the employee title, gender (i.e., sex), along with the average total compensation.

Tables: sf_employee, sf_bonus

Top Cool Votes

Table: yelp_reviews
business_namereview_iduser_idstarsreview_datereview_textfunnyusefulcool
AutohausAZC4TSIEcazRay0qIRPeMAFgjlbPUcCRiXlMtarzi9sW5w52011-06-27Autohaus is my main source for parts for an old Mercedes that has been in the family since new that I am restoring. The old beater is truly a labor of121
Citizen Public HouseZZ0paqUsSX-VJbfodTp1cQEeCWSGwMAPzwe_c1Aumd1w42013-03-18First time in PHX. Friend recommended. Friendly waitstaff. They were understaffed and extremely busy, but we never knew. The short ribs were tende000
Otto Pizza & PastrypF6W5JOPBK6kOXTB58cYrwJG1Gd2mN2Qk7UpCqAUI-BQ52013-03-14LOVE THIS PIZZA! This is now one of my favorite pizza places in phoenix area. My husband i walked into this cute family owned business and were greet000
Giant HamburgersQBddRcflAcXwE2qhsLVv7wT90ybanuLhAr0_s99GDeeg32009-03-27ok, so I tried this place out based on other reviews. I had the cheeseburger with everything, some fries, and a chocolate shake. The burger was okay, 011
Tammie Coe CakesY8UMm_Ng9oEpJbIygoGbZQMWt24-6bfv_OHLKhwMQ0Tw32008-08-25Overrated. The treats are tasty but certainly not the best I have ever had. I would have rated this a two star but the cakes and cookies are REALLY pr132
5
Top Cool Votes
View solution

Find the `review_text` that received the highest number of `cool` votes. Output the business name along with the review text with the highest number of `cool` votes.

Tables: yelp_reviews

Key Takeaways

  • Protect division: .replace(0, float("nan")) or if total > 0.
  • Replace loops with .transform() or groupby-then-merge.
  • Use intermediate variables for 3+ steps; one-liners for simple lookups.
  • Every variable is a checkpoint for debugging.

What’s Next

You can now break complex analyses into clean, debuggable steps. Next: custom logic with .apply() — for when built-in methods can’t do what you need.