Module 4: Multi-Step Analysis30 min

Step-by-Step Analysis

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

When One Line Isn’t Enough

Most interesting analytics questions can’t be answered in one step. "Find everyone earning above average" requires knowing the average first. "Which customers have never ordered?" requires knowing who has ordered. In SQL you’d nest subqueries or chain CTEs. In pandas, you simply use intermediate variables — compute a result, store it, and use it in the next step. It’s more natural, more readable, and easier to debug.

Python
avg_salary = employee["salary"].mean()
employee[employee["salary"] > avg_salary]

Finding the Maximum

Python
max_salary = employee["salary"].max()
employee[employee["salary"] == max_salary]

Using a Scalar Result

1
Find the Highest Earner

The starter finds the minimum earner. Change it to find the maximum.

Tables: employee

Filtering with Lists of Values

A very common pattern: compute a list of values from one query, then use .isin() to filter another DataFrame. This is the pandas equivalent of SQL’s IN (subquery). The key insight is that .unique() gives you an array you can pass directly to .isin().

Python
# Departments that have high earners
high_depts = employee[employee["salary"] > 100000]["department"].unique()

# All employees in those departments
employee[employee["department"].isin(high_depts)]
No NOT IN Trap in Pandas

In SQL, NOT IN fails silently when the list contains NULL. In pandas, ~df["col"].isin(values) works correctly regardless of NaN.

Filtering with a Computed List

2
Find Employees in High-Earning Departments

Find all employees who work in a department that has at least one person earning over 100,000.

Tables: employee

Finding What’s Missing

The pattern for "find what’s missing": get the IDs that DO appear, then filter for the ones that DON’T with ~df["col"].isin(values).

Table: customers
idfirst_namelast_namecityaddressphone_number
8JohnJosephSan Francisco928-386-8164
7JillMichaelAustin813-297-0692
4WilliamDanielDenver813-368-1200
5HenryJacksonMiami808-601-7513
13EmmaIsaacMiami808-690-5201
Table: orders
idcust_idorder_dateorder_detailstotal_order_cost
132019-03-04Coat100
232019-03-01Shoes80
332019-03-07Skirt30
472019-02-01Coat25
572019-03-10Shoes80
3
Find Customers with No Orders

Find customers who have never placed an order.

Tables: customers, orders

Workers With The Highest Salaries

Table: worker
worker_idfirst_namelast_namesalaryjoining_datedepartment
1MonikaArora1000002014-02-20HR
2NiharikaVerma800002014-06-11Admin
3VishalSinghal3000002014-02-20HR
4AmitahSingh5000002014-02-20Admin
5VivekBhati5000002014-06-11Admin
Table: title
worker_ref_idworker_titleaffected_from
1Manager2016-02-20
2Executive2016-06-11
8Executive2016-06-11
5Manager2016-06-11
4Asst. Manager2016-06-11
4
Workers With The Highest Salaries
View solution

Management wants to analyze only employees with official job titles. Find the job titles of the employees with the highest salary. If multiple employees have the same highest salary, include all their job titles.

Tables: worker, title

Most Profitable Financial Company

Table: forbes_global_2010_2014
companysectorindustrycontinentcountrymarketvaluesalesprofitsassetsrank
ICBCFinancialsMajor BanksAsiaChina215.6148.742.73124.91
China Construction BankFinancialsRegional BanksAsiaChina174.4121.334.22449.54
Agricultural Bank of ChinaFinancialsRegional BanksAsiaChina141.1136.4272405.48
JPMorgan ChaseFinancialsMajor BanksNorth AmericaUnited States229.7105.717.32435.320
Berkshire HathawayFinancialsInvestment ServicesNorth AmericaUnited States309.1178.819.5493.417
5
Most Profitable Financial Company
View solution

Find the most profitable company from the financial sector. Output the result along with the continent.

Tables: forbes_global_2010_2014

Key Takeaways

  • Break complex questions into intermediate variables: compute, then use.
  • .max() / .min() + filter replaces scalar subqueries.
  • .unique() + .isin() replaces IN (subquery).
  • ~df["col"].isin(values) replaces NOT IN — and it’s NaN-safe.

What’s Next

What about questions like “Find employees who earn above their department’s average”? You need a per-group calculation applied back to each row. That’s .transform(), and it’s next.