Module 5: Dates, Strings & Logic•25 min

Combining Techniques

Progress Tracking

Putting It All Together

Real problems rarely test one concept. You’ll get messy data with dates as strings, names that need cleaning, and a request to “show monthly trends by category.” That requires combining everything.

Conditional Logic with np.where

In SQL, CASE WHEN handles conditional logic. In pandas, you have two tools: np.where() for simple two-outcome conditions (if/else), and np.select() for multiple branches (if/elif/elif/else). np.select is particularly powerful because it handles unlimited conditions cleanly — no nested ternaries.

Python

# Two outcomes
df["label"] = np.where(
    df["salary"] > 100000, "High", "Standard"
)

# Multiple outcomes: use np.select
conditions = [
    df["salary"] > 100000,
    df["salary"] > 70000,
]
choices = ["High", "Mid"]
df["tier"] = np.select(conditions, choices, default="Entry")

Bucketing with np.select

Table: worker

worker_id	first_name	last_name	salary	joining_date	department
1	Monika	Arora	100000	2014-02-20	HR
2	Niharika	Verma	80000	2014-06-11	Admin
3	Vishal	Singhal	300000	2014-02-20	HR
4	Amitah	Singh	500000	2014-02-20	Admin
5	Vivek	Bhati	500000	2014-06-11	Admin

Tables: worker

Multi-Step Pipeline

Real analysis chains everything together: clean the data, extract date parts, categorize with conditional logic, group, aggregate. Each step is a named variable you can inspect. When something’s wrong, print any intermediate result to find exactly where the problem is. This debuggability is a major advantage of the step-by-step approach over one giant expression.

Tables: worker

Risky Projects

Table: linkedin_projects

id	title	budget	start_date	end_date
1	Project1	29498	2018-08-31	2019-03-13
2	Project2	32487	2018-01-27	2018-12-13
3	Project3	43909	2019-11-05	2019-12-09
4	Project4	15776	2018-06-28	2018-11-20
5	Project5	36268	2019-03-13	2020-01-02

Table: linkedin_emp_projects

emp_id	project_id
10592	1
10593	2
10594	3
10595	4
10596	5

Table: linkedin_employees

id	first_name	last_name	salary
10592	Jennifer	Roberts	20204
10593	Haley	Ho	33154
10594	Eric	Mccarthy	32360
10595	Gina	Martinez	46388
10596	Jason	Fields	12348

Risky Projects

View solution

You are given a set of projects and employee data. Each project has a name, a budget, and a specific duration, while each employee has an annual salary and may be assigned to one or more projects for particular periods. The task is to identify which projects are overbudget. A project is considered overbudget if the prorated cost of all employees assigned to it exceeds the project’s budget. To solve this, you must prorate each employee's annual salary based on the exact period they work on a given project, relative to a full year. For example, if an employee works on a six-month project, only half of their annual salary should be attributed to that project. Sum these prorated salary amounts for all employees assigned to a project and compare the total with the project’s budget. Your output should be a list of overbudget projects, where each entry includes the project’s name, its budget, and the total prorated employee expenses for that project. The total expenses should be rounded up to the nearest dollar. Assume all years have 365 days and disregard leap years.

Tables: linkedin_projects, linkedin_emp_projects, linkedin_employees

Acceptance Rate By Date

Table: fb_friend_requests

user_id_sender	user_id_receiver	date	action
ad4943sdz	948ksx123d	2020-01-04	sent
ad4943sdz	948ksx123d	2020-01-06	accepted
dfdfxf9483	9djjjd9283	2020-01-04	sent
dfdfxf9483	9djjjd9283	2020-01-15	accepted
ffdfff4234234	lpjzjdi4949	2020-01-06	sent

Tables: fb_friend_requests

Key Takeaways

np.where(condition, true_val, false_val) for two outcomes.
np.select(conditions, choices, default) for multiple outcomes — replaces SQL’s CASE WHEN.
Chain: convert types → extract/clean → categorize → group → aggregate.
Name each step as a variable for debuggability.

Module Complete

You’ve finished Module 5. You can now extract and manipulate dates, clean and transform text, split delimited strings, use regex, and combine everything with conditional logic. Next up: window functions — running totals, rankings, and the patterns that separate junior from senior analysts.

Next upContinue →

Introduction to Window Operations

Module 6: Window Operations

10 min