Module 5: Dates, Strings & Logic•40 min

Pattern Matching and String Splitting

Progress Tracking

Splitting Strings

Here’s a scenario you’ll encounter often: someone stored categories as "Electronics;Gadgets;Sale" in a single column. Or addresses as "City, State ZIP". You need the second element, or all elements as separate rows. .str.split() handles both — it’s the pandas equivalent of SQL’s SPLIT_PART, but more flexible because you can split into columns or explode into rows.

The .str.split() method splits by a delimiter and returns a list per row. Use expand=True to get separate columns:

Python

# Split into a list
df["tags"].str.split(",")

# Split into separate columns
df["tags"].str.split(",", expand=True)

# Get the Nth element
df["tags"].str.split(",").str[1]  # second element

Python

yelp_business["categories"].str.split(";").str[0].head(10)

Exploding Lists into Rows

When a column contains comma-separated values, .str.split() + .explode() turns each value into its own row:

Python

# One row per category
yelp_business["categories"].str.split(";").explode()

One Row Per Value

Table: yelp_business

business_id	name	neighborhood	address	city	state	postal_code	latitude	longitude	stars	review_count	is_open	categories
G5ERFWvPfHy7IDAUYlWL2A	All Colors Mobile Bumper Repair		7137 N 28th Ave	Phoenix	AZ	85051	33.45	-112.07	1	4	1	Auto Detailing;Automotive
0jDvRJS-z9zdMgOUXgr6rA	Sunfare		811 W Deer Valley Rd	Phoenix	AZ	85027	33.68	-112.08	5	27	1	Personal Chefs;Food;Gluten-Free;Food Delivery Services;Event Planning & Services;Restaurants
6HmDqeNNZtHMK0t2glF_gg	Dry Clean Vegas	Southeast	2550 Windmill Ln, Ste 100	Las Vegas	NV	89123	36.04	-115.12	1	4	1	Dry Cleaning & Laundry;Laundry Services;Local Services;Dry Cleaning
pbt3SBcEmxCfZPdnmU9tNA	The Cuyahoga Room		740 Munroe Falls Ave	Cuyahoga Falls	OH	44221	41.14	-81.47	1	3	0	Wedding Planning;Caterers;Event Planning & Services;Venues & Event Spaces
CX8pfLn7Bk9o2-8yDMp_2w	The UPS Store		4815 E Carefree Hwy, Ste 108	Cave Creek	AZ	85331	33.8	-111.98	1.5	5	1	Notaries;Printing Services;Local Services;Shipping Centers

Tables: yelp_business

.str.strip() After Split

Splitting "a, b, c" by "," gives ["a", " b", " c"] with leading spaces. Chain .str.strip() after .explode() to clean them.

Regex with .str Methods

When patterns get complex — "extract all digits", "find anything that looks like an email", "remove all non-alphanumeric characters" — regular expressions are the answer. They’re intimidating at first, but pandas makes them easy: .str.contains(), .str.extract(), and .str.replace() all accept regex patterns natively. You don’t need to import re for most cases.

The .str accessor supports regex natively:

Python

# Check if matches pattern
df["email"].str.contains(r"@gmail\.com$", regex=True)

# Extract pattern groups
df["email"].str.extract(r"@(.+)$")  # domain

# Replace with regex
df["phone"].str.replace(r"[^0-9]", "", regex=True)

Reviews of Categories

Table: yelp_business

business_id	name	neighborhood	address	city	state	postal_code	latitude	longitude	stars	review_count	is_open	categories
G5ERFWvPfHy7IDAUYlWL2A	All Colors Mobile Bumper Repair		7137 N 28th Ave	Phoenix	AZ	85051	33.45	-112.07	1	4	1	Auto Detailing;Automotive
0jDvRJS-z9zdMgOUXgr6rA	Sunfare		811 W Deer Valley Rd	Phoenix	AZ	85027	33.68	-112.08	5	27	1	Personal Chefs;Food;Gluten-Free;Food Delivery Services;Event Planning & Services;Restaurants
6HmDqeNNZtHMK0t2glF_gg	Dry Clean Vegas	Southeast	2550 Windmill Ln, Ste 100	Las Vegas	NV	89123	36.04	-115.12	1	4	1	Dry Cleaning & Laundry;Laundry Services;Local Services;Dry Cleaning
pbt3SBcEmxCfZPdnmU9tNA	The Cuyahoga Room		740 Munroe Falls Ave	Cuyahoga Falls	OH	44221	41.14	-81.47	1	3	0	Wedding Planning;Caterers;Event Planning & Services;Venues & Event Spaces
CX8pfLn7Bk9o2-8yDMp_2w	The UPS Store		4815 E Carefree Hwy, Ste 108	Cave Creek	AZ	85331	33.8	-111.98	1.5	5	1	Notaries;Printing Services;Local Services;Shipping Centers

Tables: yelp_business

Count Occurrences Of Words In Drafts

Table: google_file_store

filename	contents
draft1.txt	The stock exchange predicts a bull market which would make many investors happy.
draft2.txt	The stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market.
final.txt	The stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market. As always predicting the future market is an uncertain game and all investors should follow their instincts and best practices.

Tables: google_file_store

Counting Instances in Text

Table: google_file_store

filename	contents
draft1.txt	The stock exchange predicts a bull market which would make many investors happy.
draft2.txt	The stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market.
final.txt	The stock exchange predicts a bull market which would make many investors happy, but analysts warn of possibility of too much optimism and that in fact we are awaiting a bear market. As always predicting the future market is an uncertain game and all investors should follow their instincts and best practices.

Tables: google_file_store

Key Takeaways

.str.split(delim) splits strings; .str.split(delim).str[n] gets the Nth element.
.str.split().explode() turns comma-separated values into rows.
.str.strip() after splitting to clean whitespace.
.str.contains(), .str.extract(), .str.replace() all support regex.
Use regex=True explicitly for clarity.

What’s Next

You now have the full string toolkit. The final lesson combines date extraction, string cleanup, and conditional logic to solve realistic multi-technique problems.

Next upContinue →

Combining Techniques

25 min