Module 4: Multi-Step Analysis • 30 min

Custom Logic with .apply()

Built-in methods like .str handle common cases. But real data has edge cases that don’t fit built-in methods: custom tax brackets, business-specific categorization rules, multi-column logic that depends on three fields at once. .apply() is your escape hatch. It lets you run any Python function on every value, row, or column. It’s slower than vectorized operations, so treat it as a last resort. When you need it, though, nothing else will do.

When Built-In Methods Aren’t Enough

On a Series, .apply() runs a function on every element. On a DataFrame, it runs the function on every column (the default) or, with axis=1, on every row.

Lambda Functions: Quick One-Liners

A lambda is an anonymous function — a function without a name, written in one line:

Python

# Regular function
def double(x):
    return x * 2

# Same thing as a lambda
double = lambda x: x * 2

# Both do: double(5) -> 10

Lambdas pair naturally with .apply(). Here a 30% rate applies to salaries above 100,000 and a 20% rate to everything else:

Python

employee["tax"] = employee["salary"].apply(
    lambda s: s * 0.3 if s > 100000 else s * 0.2
)
employee[["first_name", "salary", "tax"]]

.apply() Is Slower Than Vectorized Operations

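.apply() calls your Python function once per value, which is effectively a Python-level loop. For simple cases, np.where() or the .str methods are faster because they operate on the whole column at once. Reserve .apply() for logic that is too complex for a vectorized alternative.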
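As a rough illustration of that trade-off, the tax calculation above can be written both ways. This is a minimal sketch: the three-row employee frame is a hypothetical stand-in for the real table, and the names tax_apply and tax_vectorized are introduced here only for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the employee table
employee = pd.DataFrame({
    "first_name": ["Max", "Katty", "George"],
    "salary": [1300, 150000, 100000],
})

# .apply() version: the lambda is called once per salary
tax_apply = employee["salary"].apply(
    lambda s: s * 0.3 if s > 100000 else s * 0.2
)

# Vectorized version: the condition is evaluated on the whole column at once
tax_vectorized = pd.Series(
    np.where(
        employee["salary"] > 100000,
        employee["salary"] * 0.3,
        employee["salary"] * 0.2,
    ),
    index=employee.index,
)
```

Both versions should produce the same values. Note that a salary of exactly 100000 falls in the 20% branch because the comparison is strict.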

Multi-Branch Logic

Table: employee

id	first_name	last_name	age	sex	employee_title	department	salary	target	bonus	email	city	address	manager_id
5	Max	George	26	M	Sales	Sales	1300	200	150	Max@company.com	California	2638 Richards Avenue	1
13	Katty	Bond	56	F	Manager	Management	150000	0	300	Katty@company.com	Arizona		1
11	Richerd	Gear	57	M	Manager	Management	250000	0	300	Richerd@company.com	Alabama		1
10	Jennifer	Dion	34	F	Sales	Sales	1000	200	150	Jennifer@company.com	Alabama		13
19	George	Joe	50	M	Manager	Management	100000	0	300	George@company.com	Florida	1003 Wyatt Street	1
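
One way to express multi-branch logic on this table is a lambda with a chained conditional expression. The sketch below rebuilds only the first_name and salary columns from the rows shown above. The salary_band column and the 100000 and 200000 thresholds are assumptions chosen purely for illustration, not rules from the course data.

```python
import pandas as pd

# first_name and salary copied from the employee rows shown above
employee = pd.DataFrame({
    "first_name": ["Max", "Katty", "Richerd", "Jennifer", "George"],
    "salary": [1300, 150000, 250000, 1000, 100000],
})

# Three branches in one lambda; thresholds are hypothetical
employee["salary_band"] = employee["salary"].apply(
    lambda s: "Top" if s >= 200000 else ("Upper" if s >= 100000 else "Standard")
)
```

A chained conditional like this works, but it is already hard to read at three branches, which is usually the point where a named function becomes the better choice.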

Named Functions for Complex Logic

When the logic is too complex for a lambda, write a named function:

Python

import pandas as pd

# Bucket a first name by its length; missing values get their own label
def classify_name(name):
    if pd.isna(name):
        return "Unknown"
    elif len(name) <= 3:
        return "Short"
    elif len(name) <= 6:
        return "Medium"
    else:
        return "Long"

employee["name_class"] = employee["first_name"].apply(classify_name)

Lambda vs Named Function

Use lambda for one-line logic. Use a named function when you need multiple lines, error handling, or reusability. If your lambda has more than one if/else, switch to a named function.
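
To illustrate the error-handling case, here is a minimal sketch of a named function that a lambda could not express cleanly. The raw series of text salaries and the parse_salary helper are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical column of salaries stored as messy text
raw = pd.Series(["1300", "150,000", "n/a", None])

def parse_salary(text):
    """Return the salary as a float, or NaN when it cannot be parsed."""
    if pd.isna(text):
        return float("nan")
    try:
        # Drop thousands separators before converting
        return float(text.replace(",", ""))
    except ValueError:
        return float("nan")

parsed = raw.apply(parse_salary)
```

Because the function has a name, the same parsing rule can be reused on other columns or tested on its own.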

.apply() on Rows

Pass axis=1 to apply a function to each row — the function receives the entire row as a Series:
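
A minimal sketch of row-wise .apply(), using a few columns from three of the employee rows shown earlier. The needs_review function and its rule (flag Sales staff whose bonus exceeds 10% of salary) are hypothetical, invented here to show logic that depends on three fields at once.

```python
import pandas as pd

# A few columns from three of the employee rows shown earlier
employee = pd.DataFrame({
    "first_name": ["Max", "Katty", "Jennifer"],
    "department": ["Sales", "Management", "Sales"],
    "salary": [1300, 150000, 1000],
    "bonus": [150, 300, 150],
})

def needs_review(row):
    # row is a Series; each column is looked up by name
    # Hypothetical rule: Sales staff whose bonus exceeds 10% of salary
    is_sales = row["department"] == "Sales"
    high_bonus = row["bonus"] > 0.1 * row["salary"]
    return bool(is_sales and high_bonus)

# axis=1 passes one row at a time to the function
employee["needs_review"] = employee.apply(needs_review, axis=1)
```

Without axis=1, the function would receive each column instead of each row, and the lookups by column name would fail.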

When NOT to Use .apply()

Before reaching for .apply(), check if a built-in method exists:

Python


import numpy as np

# Slow: .apply() for a simple case
df["upper"] = df["name"].apply(lambda x: x.upper())

# Fast: built-in .str method
df["upper"] = df["name"].str.upper()

# Slow: .apply() for conditional
df["flag"] = df["salary"].apply(
    lambda x: "High" if x > 100000 else "Low"
)

# Fast: np.where()
df["flag"] = np.where(df["salary"] > 100000, "High", "Low")

Key Takeaways

.apply(func) runs a function on every element in a Series.
.apply(func, axis=1) runs on every row of a DataFrame.
Lambda for one-liners; named functions for complex logic.
Prefer vectorized operations (.str, np.where, .dt) when they exist — .apply() is the fallback.

What’s Next

You can now transform data with any custom logic. Next: reshaping — pivoting wide tables to long and long tables to wide, the operations that power every cross-tabulation report.
