Combining Techniques
Progress Tracking
Log in to save this lesson and continue from where you left off.
Putting It All Together
Real problems rarely test one concept. You’ll get messy data with dates as strings, names that need cleaning, and a request to “show monthly trends by category.” That requires combining everything.
Conditional Logic with np.where
In SQL, CASE WHEN handles conditional logic. In pandas, you have two tools: np.where() for simple two-outcome conditions (if/else), and np.select() for multiple branches (if/elif/elif/else). np.select is particularly powerful because it handles unlimited conditions cleanly — no nested ternaries.
# Two outcomes
df["label"] = np.where(
df["salary"] > 100000, "High", "Standard"
)
# Multiple outcomes: use np.select
conditions = [
df["salary"] > 100000,
df["salary"] > 70000,
]
choices = ["High", "Mid"]
df["tier"] = np.select(conditions, choices, default="Entry")Bucketing with np.select
Calculate days employed, then categorize: under 365 = New, under 1095 = Mid, else Senior.
Multi-Step Pipeline
Real analysis chains everything together: clean the data, extract date parts, categorize with conditional logic, group, aggregate. Each step is a named variable you can inspect. When something’s wrong, print any intermediate result to find exactly where the problem is. This debuggability is a major advantage of the step-by-step approach over one giant expression.
Clean department names, calculate tenure categories, then count employees by department and tenure.
Risky Projects
| id | title | budget | start_date | end_date |
|---|---|---|---|---|
| 1 | Project1 | 29498 | 2018-08-31 | 2019-03-13 |
| 2 | Project2 | 32487 | 2018-01-27 | 2018-12-13 |
| 3 | Project3 | 43909 | 2019-11-05 | 2019-12-09 |
| 4 | Project4 | 15776 | 2018-06-28 | 2018-11-20 |
| 5 | Project5 | 36268 | 2019-03-13 | 2020-01-02 |
| emp_id | project_id |
|---|---|
| 10592 | 1 |
| 10593 | 2 |
| 10594 | 3 |
| 10595 | 4 |
| 10596 | 5 |
| id | first_name | last_name | salary |
|---|---|---|---|
| 10592 | Jennifer | Roberts | 20204 |
| 10593 | Haley | Ho | 33154 |
| 10594 | Eric | Mccarthy | 32360 |
| 10595 | Gina | Martinez | 46388 |
| 10596 | Jason | Fields | 12348 |
You are given a set of projects and employee data. Each project has a name, a budget, and a specific duration, while each employee has an annual salary and may be assigned to one or more projects for particular periods. The task is to identify which projects are overbudget. A project is considered overbudget if the prorated cost of all employees assigned to it exceeds the project’s budget. To solve this, you must prorate each employee's annual salary based on the exact period they work on a given project, relative to a full year. For example, if an employee works on a six-month project, only half of their annual salary should be attributed to that project. Sum these prorated salary amounts for all employees assigned to a project and compare the total with the project’s budget. Your output should be a list of overbudget projects, where each entry includes the project’s name, its budget, and the total prorated employee expenses for that project. The total expenses should be rounded up to the nearest dollar. Assume all years have 365 days and disregard leap years.
Acceptance Rate By Date
| user_id_sender | user_id_receiver | date | action |
|---|---|---|---|
| ad4943sdz | 948ksx123d | 2020-01-04 | sent |
| ad4943sdz | 948ksx123d | 2020-01-06 | accepted |
| dfdfxf9483 | 9djjjd9283 | 2020-01-04 | sent |
| dfdfxf9483 | 9djjjd9283 | 2020-01-15 | accepted |
| ffdfff4234234 | lpjzjdi4949 | 2020-01-06 | sent |
Calculate the friend acceptance rate for each `date` when friend requests were sent. A request is sent if `action` = `sent` and accepted if `action` = `accepted`. If a request is not accepted, there is no record of it being accepted in the table. The output will only include dates where requests were sent and at least one of them was accepted (acceptance can occur on any date after the request is sent).
Key Takeaways
np.where(condition, true_val, false_val)for two outcomes.np.select(conditions, choices, default)for multiple outcomes — replaces SQL’sCASE WHEN.- Chain: convert types → extract/clean → categorize → group → aggregate.
- Name each step as a variable for debuggability.
Module Complete
You’ve finished Module 5. You can now extract and manipulate dates, clean and transform text, split delimited strings, use regex, and combine everything with conditional logic. Next up: window functions — running totals, rankings, and the patterns that separate junior from senior analysts.