Module 1: DataFrame Fundamentals•50 min

Filtering Rows

Progress Tracking

Filtering is where pandas goes from "interesting" to genuinely useful. Every analytics question starts with "show me the rows where..." — orders over $100, users from California, events after March. This is the skill you’ll use in most analyses.

How Boolean Indexing Works

Filtering in pandas works by creating a True/False mask and passing it inside brackets. Every row where the condition is True stays; everything else is dropped.

Python

# The condition creates a Boolean Series
mask = df["department"] == "HR"

# True for HR rows, False for everything else

# Pass the mask inside brackets to filter
df[mask]

# Or combine in one step (most common)
df[df["department"] == "HR"]

The Boolean Mask

The expression inside the brackets is a Series of True/False values — one per row. Pandas keeps only the True rows. This is why you see the DataFrame name repeated: df[df[...]].

Filtering Text Values

Python

orders[orders["order_details"] == "Coat"]

Table: techcorp_workforce

id	first_name	last_name	department	salary	phone_number	joining_date
1	Sarah	Mitchell	HR	95000	555-0101	2021-03-15
2	Michael	Chen	HR	88000	555-0102	2022-06-01
3	Emily	Rodriguez	HR	82500		2021-09-20
4	David	Park	HR	80000	555-0104	2023-01-10
5	Lisa	Thompson	HR	65000		2021-04-05

Tables: techcorp_workforce

== Not =

== compares values. = assigns values. Using = inside a filter is a syntax error.

Inspection For Glassell Coffee Shop

Table: los_angeles_restaurant_health_inspections

serial_number	activity_date	facility_name	score	grade	service_code	service_description	employee_id	facility_address	facility_city	facility_id	facility_state	facility_zip	owner_id	owner_name	pe_description	program_element_pe	program_name	program_status	record_id
DAQHRSETQ	2017-06-08	MARGARITAS CAFE	93	A	1	ROUTINE INSPECTION	EE0000006	5026 S CRENSHAW BLVD	LOS ANGELES	FA0023656	CA	90043	OW0004133	BAZAN, ASCENCION	RESTAURANT (61-150) SEATS HIGH RISK	1638	MARGARITAS CAFE	ACTIVE	PR0011718
DA2GQRJOS	2017-03-07	LAS MOLENDERAS	97	A	1	ROUTINE INSPECTION	EE0000997	2635 WHITTIER BLVD	LOS ANGELES	FA0160416	CA	90023	OW0125379	MARISOL FEREGRINO	RESTAURANT (0-30) SEATS HIGH RISK	1632	LAS MOLENDERAS	INACTIVE	PR0148504
DAMQTA46T	2016-03-22	SANDRA'S TAMALES	93	A	1	ROUTINE INSPECTION	EE0001049	5390 WHITTIER BLVD	LOS ANGELES	FA0171769	CA	90022-4032	OW0178828	SANDRA'S TAMALES INC.	RESTAURANT (0-30) SEATS MODERATE RISK	1631	SANDRA'S TAMALES	ACTIVE	PR0164225
DAXMBTIRZ	2018-02-12	CAFE GRATITUDE	97	A	1	ROUTINE INSPECTION	EE0000828	639 N LARCHMONT BLVD STE #102	LOS ANGELES	FA0058921	CA	90004	OW0005704	CAFE GRATITUDE LARCHMONT LLC	RESTAURANT (61-150) SEATS HIGH RISK	1638	CAFE GRATITUDE	ACTIVE	PR0019854
DAK8TBMS0	2015-09-10	THE WAFFLE	90	A	1	ROUTINE INSPECTION	EE0000709	6255 W SUNSET BLVD STE #105	LOS ANGELES	FA0051830	CA	90028	OW0035796	THE WAFFLE, LLC	RESTAURANT (61-150) SEATS HIGH RISK	1638	THE WAFFLE	ACTIVE	PR0010922

Tables: los_angeles_restaurant_health_inspections

Comparison Operators

Beyond exact text matches, you’ll often filter on numeric thresholds (salary > 100K), date ranges (joined after 2022), and inequality (not in Admin). Python uses the same operators you’d expect: >, <, >=, <=, !=.

Filtering Numbers

Tables: techcorp_workforce

Filtering Dates

Date filtering is how you answer time-bound questions: "employees hired after 2022", "orders placed in March", "events from the last 30 days." As long as the column is a proper datetime type, comparison operators work chronologically.

Tables: techcorp_workforce

Date Comparison Is Chronological

Pandas compares dates chronologically: > "2022-01-01" means "after January 1, 2022." String dates in YYYY-MM-DD format sort correctly.

Excluding Values with !=

!= is the simplest way to exclude a single value from your results. You'll use it when one specific value is causing noise in your analysis — a test account, an internal department, a cancelled status.

Python

# != means "not equal to"
df[df["column"] != "value"]

Tables: techcorp_workforce

String Matching with .str

Exact == matches work for clean categories like department names. But real-world data is messier — you need to find names that contain a substring, emails that end with a domain, or products that start with a prefix. The .str accessor handles all of this.

For partial matches — “starts with”, “contains”, “ends with” — use the .str accessor:

Python

# Starts with
df[df["name"].str.startswith("Jo")]

# Contains (anywhere in the string)
df[df["name"].str.contains("son")]

# Ends with
df[df["email"].str.endswith("@gmail.com")]

# Case-insensitive
df[df["name"].str.lower().str.contains("john")]

Partial Text Matches

Python

techcorp_workforce[techcorp_workforce["last_name"].str.startswith("S")]

Table: orders

id	cust_id	order_date	order_details	total_order_cost
1	3	2019-03-04	Coat	100
2	3	2019-03-01	Shoes	80
3	3	2019-03-07	Skirt	30
4	7	2019-02-01	Coat	25
5	7	2019-03-10	Shoes	80

Tables: orders

Applying a Text Pattern

Tables: techcorp_workforce

.str.contains() Supports Regex

.str.contains("pattern") uses regex by default. To search for literal special characters like . or $, pass regex=False.

Find Drafts With the Word 'Optimism'

Table: google_file_store

filename	contents
draft1.txt	The stock exchange predicts a bull market which would make many investors happy.
draft2.txt	The stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market.
final.txt	The stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market. As always predicting the future market is an uncertain game and all investors should follow their instincts and best practices.

Tables: google_file_store

Common Mistakes

These are the bugs we see most often from people learning pandas filtering. Every one of them produces either a cryptic error or silently wrong results. Knowing these patterns saves hours of debugging.

Using = instead of ==

Python

# Wrong: = is assignment
df[df["department"] = "HR"]  # SyntaxError

# Right: == is comparison
df[df["department"] == "HR"]

Forgetting to repeat the DataFrame name

Python

# Wrong: bare column name
df["salary" > 100000]  # TypeError

# Right: df["column"] inside and outside
df[df["salary"] > 100000]

Case sensitivity

Python

# "hr" != "HR" in Python
df[df["department"] == "hr"]  # returns nothing

# Use .str.lower() for case-insensitive
df[df["department"].str.lower() == "hr"]

Key Takeaways

df[df["col"] == value] filters rows using boolean indexing.
Comparison operators: ==, !=, >, <, >=, <=.
.str.contains(), .str.startswith(), .str.endswith() for text patterns.

What’s Next

You can filter on one condition. But what about “HR employees earning over 100K”? That needs combining conditions with & and |, which is next.

Next upContinue →

Combining Conditions

50 min