Module 1: DataFrame Fundamentals50 min

Filtering Rows

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Filtering is where pandas goes from "interesting" to genuinely useful. Every analytics question starts with "show me the rows where..." — orders over $100, users from California, events after March. This is the skill you’ll use in most analyses.

How Boolean Indexing Works

Filtering in pandas works by creating a True/False mask and passing it inside brackets. Every row where the condition is True stays; everything else is dropped.

Python
# The condition creates a Boolean Series
mask = df["department"] == "HR"

# True for HR rows, False for everything else

# Pass the mask inside brackets to filter
df[mask]

# Or combine in one step (most common)
df[df["department"] == "HR"]
The Boolean Mask

The expression inside the brackets is a Series of True/False values — one per row. Pandas keeps only the True rows. This is why you see the DataFrame name repeated: df[df[...]].

Filtering Text Values

Python
orders[orders["order_details"] == "Coat"]
Table: techcorp_workforce
idfirst_namelast_namedepartmentsalaryphone_numberjoining_date
1SarahMitchellHR95000555-01012021-03-15
2MichaelChenHR88000555-01022022-06-01
3EmilyRodriguezHR825002021-09-20
4DavidParkHR80000555-01042023-01-10
5LisaThompsonHR650002021-04-05
1
Filter by Department

Filter `techcorp_workforce` to show only HR employees.

Tables: techcorp_workforce
== Not =

== compares values. = assigns values. Using = inside a filter is a syntax error.

Inspection For Glassell Coffee Shop

Table: los_angeles_restaurant_health_inspections
serial_numberactivity_datefacility_namescoregradeservice_codeservice_descriptionemployee_idfacility_addressfacility_cityfacility_idfacility_statefacility_zipowner_idowner_namepe_descriptionprogram_element_peprogram_nameprogram_statusrecord_id
DAQHRSETQ2017-06-08MARGARITAS CAFE93A1ROUTINE INSPECTIONEE00000065026 S CRENSHAW BLVDLOS ANGELESFA0023656CA90043OW0004133BAZAN, ASCENCIONRESTAURANT (61-150) SEATS HIGH RISK1638MARGARITAS CAFEACTIVEPR0011718
DA2GQRJOS2017-03-07LAS MOLENDERAS97A1ROUTINE INSPECTIONEE00009972635 WHITTIER BLVDLOS ANGELESFA0160416CA90023OW0125379MARISOL FEREGRINORESTAURANT (0-30) SEATS HIGH RISK1632LAS MOLENDERASINACTIVEPR0148504
DAMQTA46T2016-03-22SANDRA'S TAMALES93A1ROUTINE INSPECTIONEE00010495390 WHITTIER BLVDLOS ANGELESFA0171769CA90022-4032OW0178828SANDRA'S TAMALES INC.RESTAURANT (0-30) SEATS MODERATE RISK1631SANDRA'S TAMALESACTIVEPR0164225
DAXMBTIRZ2018-02-12CAFE GRATITUDE97A1ROUTINE INSPECTIONEE0000828639 N LARCHMONT BLVD STE #102LOS ANGELESFA0058921CA90004OW0005704CAFE GRATITUDE LARCHMONT LLCRESTAURANT (61-150) SEATS HIGH RISK1638CAFE GRATITUDEACTIVEPR0019854
DAK8TBMS02015-09-10THE WAFFLE90A1ROUTINE INSPECTIONEE00007096255 W SUNSET BLVD STE #105LOS ANGELESFA0051830CA90028OW0035796THE WAFFLE, LLCRESTAURANT (61-150) SEATS HIGH RISK1638THE WAFFLEACTIVEPR0010922
2
Inspection For Glassell Coffee Shop
View solution

Find all inspection details made for facilities owned by 'GLASSELL COFFEE SHOP LLC'.

Tables: los_angeles_restaurant_health_inspections

Comparison Operators

Beyond exact text matches, you’ll often filter on numeric thresholds (salary > 100K), date ranges (joined after 2022), and inequality (not in Admin). Python uses the same operators you’d expect: >, <, >=, <=, !=.

Filtering Numbers

3
Filter Employees by Salary

Find employees earning more than 100,000. Display their first name, last name, and salary.

Tables: techcorp_workforce

Filtering Dates

Date filtering is how you answer time-bound questions: "employees hired after 2022", "orders placed in March", "events from the last 30 days." As long as the column is a proper datetime type, comparison operators work chronologically.

4
Filter Employees by Join Date

Find employees who joined after January 1, 2022.

Tables: techcorp_workforce
Date Comparison Is Chronological

Pandas compares dates chronologically: > "2022-01-01" means "after January 1, 2022." String dates in YYYY-MM-DD format sort correctly.

Excluding Values with !=

!= is the simplest way to exclude a single value from your results. You'll use it when one specific value is causing noise in your analysis — a test account, an internal department, a cancelled status.

Python
# != means "not equal to"
df[df["column"] != "value"]
5
Exclude a Department

Find all employees who are not in the Admin department.

Tables: techcorp_workforce

String Matching with .str

Exact == matches work for clean categories like department names. But real-world data is messier — you need to find names that contain a substring, emails that end with a domain, or products that start with a prefix. The .str accessor handles all of this.

For partial matches — “starts with”, “contains”, “ends with” — use the .str accessor:

Python
# Starts with
df[df["name"].str.startswith("Jo")]

# Contains (anywhere in the string)
df[df["name"].str.contains("son")]

# Ends with
df[df["email"].str.endswith("@gmail.com")]

# Case-insensitive
df[df["name"].str.lower().str.contains("john")]

Partial Text Matches

Python
techcorp_workforce[techcorp_workforce["last_name"].str.startswith("S")]
Table: orders
idcust_idorder_dateorder_detailstotal_order_cost
132019-03-04Coat100
232019-03-01Shoes80
332019-03-07Skirt30
472019-02-01Coat25
572019-03-10Shoes80
6
Filter Products Starting With a Letter

Filter `orders` to rows where order details start with the letter B.

Tables: orders

Applying a Text Pattern

7
Filter Names by Pattern

Find all employees whose first name contains the letter 'o' anywhere.

Tables: techcorp_workforce
.str.contains() Supports Regex

.str.contains("pattern") uses regex by default. To search for literal special characters like . or $, pass regex=False.

Find Drafts With the Word 'Optimism'

Table: google_file_store
filenamecontents
draft1.txtThe stock exchange predicts a bull market which would make many investors happy.
draft2.txtThe stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market.
final.txtThe stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market. As always predicting the future market is an uncertain game and all investors should follow their instincts and best practices.
8
Find drafts which contains the word 'optimism'
View solution

Find drafts which contains the word 'optimism'.

Tables: google_file_store

Common Mistakes

These are the bugs we see most often from people learning pandas filtering. Every one of them produces either a cryptic error or silently wrong results. Knowing these patterns saves hours of debugging.

Using = instead of ==

Python
# Wrong: = is assignment
df[df["department"] = "HR"]  # SyntaxError

# Right: == is comparison
df[df["department"] == "HR"]

Forgetting to repeat the DataFrame name

Python
# Wrong: bare column name
df["salary" > 100000]  # TypeError

# Right: df["column"] inside and outside
df[df["salary"] > 100000]

Case sensitivity

Python
# "hr" != "HR" in Python
df[df["department"] == "hr"]  # returns nothing

# Use .str.lower() for case-insensitive
df[df["department"].str.lower() == "hr"]

Key Takeaways

  • df[df["col"] == value] filters rows using boolean indexing.
  • Comparison operators: ==, !=, >, <, >=, <=.
  • .str.contains(), .str.startswith(), .str.endswith() for text patterns.

What’s Next

You can filter on one condition. But what about “HR employees earning over 100K”? That needs combining conditions with & and |, which is next.