Filtering Rows
Progress Tracking
Log in to save this lesson and continue from where you left off.
Filtering is where pandas goes from "interesting" to genuinely useful. Every analytics question starts with "show me the rows where..." — orders over $100, users from California, events after March. This is the skill you’ll use in most analyses.
How Boolean Indexing Works
Filtering in pandas works by creating a True/False mask and passing it inside brackets. Every row where the condition is True stays; everything else is dropped.
# The condition creates a Boolean Series
mask = df["department"] == "HR"
# True for HR rows, False for everything else
# Pass the mask inside brackets to filter
df[mask]
# Or combine in one step (most common)
df[df["department"] == "HR"]The expression inside the brackets is a Series of True/False values — one per row. Pandas keeps only the True rows. This is why you see the DataFrame name repeated: df[df[...]].
Filtering Text Values
orders[orders["order_details"] == "Coat"]| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
Filter `techcorp_workforce` to show only HR employees.
== compares values. = assigns values. Using = inside a filter is a syntax error.
Inspection For Glassell Coffee Shop
| serial_number | activity_date | facility_name | score | grade | service_code | service_description | employee_id | facility_address | facility_city | facility_id | facility_state | facility_zip | owner_id | owner_name | pe_description | program_element_pe | program_name | program_status | record_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DAQHRSETQ | 2017-06-08 | MARGARITAS CAFE | 93 | A | 1 | ROUTINE INSPECTION | EE0000006 | 5026 S CRENSHAW BLVD | LOS ANGELES | FA0023656 | CA | 90043 | OW0004133 | BAZAN, ASCENCION | RESTAURANT (61-150) SEATS HIGH RISK | 1638 | MARGARITAS CAFE | ACTIVE | PR0011718 |
| DA2GQRJOS | 2017-03-07 | LAS MOLENDERAS | 97 | A | 1 | ROUTINE INSPECTION | EE0000997 | 2635 WHITTIER BLVD | LOS ANGELES | FA0160416 | CA | 90023 | OW0125379 | MARISOL FEREGRINO | RESTAURANT (0-30) SEATS HIGH RISK | 1632 | LAS MOLENDERAS | INACTIVE | PR0148504 |
| DAMQTA46T | 2016-03-22 | SANDRA'S TAMALES | 93 | A | 1 | ROUTINE INSPECTION | EE0001049 | 5390 WHITTIER BLVD | LOS ANGELES | FA0171769 | CA | 90022-4032 | OW0178828 | SANDRA'S TAMALES INC. | RESTAURANT (0-30) SEATS MODERATE RISK | 1631 | SANDRA'S TAMALES | ACTIVE | PR0164225 |
| DAXMBTIRZ | 2018-02-12 | CAFE GRATITUDE | 97 | A | 1 | ROUTINE INSPECTION | EE0000828 | 639 N LARCHMONT BLVD STE #102 | LOS ANGELES | FA0058921 | CA | 90004 | OW0005704 | CAFE GRATITUDE LARCHMONT LLC | RESTAURANT (61-150) SEATS HIGH RISK | 1638 | CAFE GRATITUDE | ACTIVE | PR0019854 |
| DAK8TBMS0 | 2015-09-10 | THE WAFFLE | 90 | A | 1 | ROUTINE INSPECTION | EE0000709 | 6255 W SUNSET BLVD STE #105 | LOS ANGELES | FA0051830 | CA | 90028 | OW0035796 | THE WAFFLE, LLC | RESTAURANT (61-150) SEATS HIGH RISK | 1638 | THE WAFFLE | ACTIVE | PR0010922 |
Find all inspection details made for facilities owned by 'GLASSELL COFFEE SHOP LLC'.
Comparison Operators
Beyond exact text matches, you’ll often filter on numeric thresholds (salary > 100K), date ranges (joined after 2022), and inequality (not in Admin). Python uses the same operators you’d expect: >, <, >=, <=, !=.
Filtering Numbers
Find employees earning more than 100,000. Display their first name, last name, and salary.
Filtering Dates
Date filtering is how you answer time-bound questions: "employees hired after 2022", "orders placed in March", "events from the last 30 days." As long as the column is a proper datetime type, comparison operators work chronologically.
Find employees who joined after January 1, 2022.
Pandas compares dates chronologically: > "2022-01-01" means "after January 1, 2022." String dates in YYYY-MM-DD format sort correctly.
Excluding Values with !=
!= is the simplest way to exclude a single value from your results. You'll use it when one specific value is causing noise in your analysis — a test account, an internal department, a cancelled status.
# != means "not equal to"
df[df["column"] != "value"]Find all employees who are not in the Admin department.
String Matching with .str
Exact == matches work for clean categories like department names. But real-world data is messier — you need to find names that contain a substring, emails that end with a domain, or products that start with a prefix. The .str accessor handles all of this.
For partial matches — “starts with”, “contains”, “ends with” — use the .str accessor:
# Starts with
df[df["name"].str.startswith("Jo")]
# Contains (anywhere in the string)
df[df["name"].str.contains("son")]
# Ends with
df[df["email"].str.endswith("@gmail.com")]
# Case-insensitive
df[df["name"].str.lower().str.contains("john")]Partial Text Matches
techcorp_workforce[techcorp_workforce["last_name"].str.startswith("S")]| id | cust_id | order_date | order_details | total_order_cost |
|---|---|---|---|---|
| 1 | 3 | 2019-03-04 | Coat | 100 |
| 2 | 3 | 2019-03-01 | Shoes | 80 |
| 3 | 3 | 2019-03-07 | Skirt | 30 |
| 4 | 7 | 2019-02-01 | Coat | 25 |
| 5 | 7 | 2019-03-10 | Shoes | 80 |
Filter `orders` to rows where order details start with the letter B.
Applying a Text Pattern
Find all employees whose first name contains the letter 'o' anywhere.
.str.contains("pattern") uses regex by default. To search for literal special characters like . or $, pass regex=False.
Find Drafts With the Word 'Optimism'
| filename | contents |
|---|---|
| draft1.txt | The stock exchange predicts a bull market which would make many investors happy. |
| draft2.txt | The stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market. |
| final.txt | The stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market. As always predicting the future market is an uncertain game and all investors should follow their instincts and best practices. |
Find drafts which contains the word 'optimism'.
Common Mistakes
These are the bugs we see most often from people learning pandas filtering. Every one of them produces either a cryptic error or silently wrong results. Knowing these patterns saves hours of debugging.
Using = instead of ==
# Wrong: = is assignment
df[df["department"] = "HR"] # SyntaxError
# Right: == is comparison
df[df["department"] == "HR"]Forgetting to repeat the DataFrame name
# Wrong: bare column name
df["salary" > 100000] # TypeError
# Right: df["column"] inside and outside
df[df["salary"] > 100000]Case sensitivity
# "hr" != "HR" in Python
df[df["department"] == "hr"] # returns nothing
# Use .str.lower() for case-insensitive
df[df["department"].str.lower() == "hr"]Key Takeaways
df[df["col"] == value]filters rows using boolean indexing.- Comparison operators:
==,!=,>,<,>=,<=. .str.contains(),.str.startswith(),.str.endswith()for text patterns.
What’s Next
You can filter on one condition. But what about “HR employees earning over 100K”? That needs combining conditions with & and |, which is next.