Module 5: Dates, Strings & Logic25 min

Combining Techniques

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Putting It All Together

Real problems rarely test one concept. You’ll get messy data with dates as strings, names that need cleaning, and a request to “show monthly trends by category.” That requires combining everything.

Conditional Logic with np.where

In SQL, CASE WHEN handles conditional logic. In pandas, you have two tools: np.where() for simple two-outcome conditions (if/else), and np.select() for multiple branches (if/elif/elif/else). np.select is particularly powerful because it handles unlimited conditions cleanly — no nested ternaries.

Python
# Two outcomes
df["label"] = np.where(
    df["salary"] > 100000, "High", "Standard"
)

# Multiple outcomes: use np.select
conditions = [
    df["salary"] > 100000,
    df["salary"] > 70000,
]
choices = ["High", "Mid"]
df["tier"] = np.select(conditions, choices, default="Entry")

Bucketing with np.select

1
Create Tenure Buckets

Calculate days employed, then categorize: under 365 = New, under 1095 = Mid, else Senior.

Tables: worker

Multi-Step Pipeline

Real analysis chains everything together: clean the data, extract date parts, categorize with conditional logic, group, aggregate. Each step is a named variable you can inspect. When something’s wrong, print any intermediate result to find exactly where the problem is. This debuggability is a major advantage of the step-by-step approach over one giant expression.

2
Department Tenure Report

Clean department names, calculate tenure categories, then count employees by department and tenure.

Tables: worker

Risky Projects

Table: linkedin_projects
idtitlebudgetstart_dateend_date
1Project1294982018-08-312019-03-13
2Project2324872018-01-272018-12-13
3Project3439092019-11-052019-12-09
4Project4157762018-06-282018-11-20
5Project5362682019-03-132020-01-02
Table: linkedin_emp_projects
emp_idproject_id
105921
105932
105943
105954
105965
Table: linkedin_employees
idfirst_namelast_namesalary
10592JenniferRoberts20204
10593HaleyHo33154
10594EricMccarthy32360
10595GinaMartinez46388
10596JasonFields12348
3
Risky Projects
View solution

You are given a set of projects and employee data. Each project has a name, a budget, and a specific duration, while each employee has an annual salary and may be assigned to one or more projects for particular periods. The task is to identify which projects are overbudget. A project is considered overbudget if the prorated cost of all employees assigned to it exceeds the project’s budget. To solve this, you must prorate each employee's annual salary based on the exact period they work on a given project, relative to a full year. For example, if an employee works on a six-month project, only half of their annual salary should be attributed to that project. Sum these prorated salary amounts for all employees assigned to a project and compare the total with the project’s budget. Your output should be a list of overbudget projects, where each entry includes the project’s name, its budget, and the total prorated employee expenses for that project. The total expenses should be rounded up to the nearest dollar. Assume all years have 365 days and disregard leap years.

Tables: linkedin_projects, linkedin_emp_projects, linkedin_employees

Acceptance Rate By Date

Table: fb_friend_requests
user_id_senderuser_id_receiverdateaction
ad4943sdz948ksx123d2020-01-04sent
ad4943sdz948ksx123d2020-01-06accepted
dfdfxf94839djjjd92832020-01-04sent
dfdfxf94839djjjd92832020-01-15accepted
ffdfff4234234lpjzjdi49492020-01-06sent
4
Acceptance Rate By Date
View solution

Calculate the friend acceptance rate for each `date` when friend requests were sent. A request is sent if `action` = `sent` and accepted if `action` = `accepted`. If a request is not accepted, there is no record of it being accepted in the table. The output will only include dates where requests were sent and at least one of them was accepted (acceptance can occur on any date after the request is sent).

Tables: fb_friend_requests

Key Takeaways

  • np.where(condition, true_val, false_val) for two outcomes.
  • np.select(conditions, choices, default) for multiple outcomes — replaces SQL’s CASE WHEN.
  • Chain: convert types → extract/clean → categorize → group → aggregate.
  • Name each step as a variable for debuggability.

Module Complete

You’ve finished Module 5. You can now extract and manipulate dates, clean and transform text, split delimited strings, use regex, and combine everything with conditional logic. Next up: window functions — running totals, rankings, and the patterns that separate junior from senior analysts.