Custom Logic with .apply()
Built-in methods like .str handle common cases. But real data has edge cases that don’t fit built-in methods: custom tax brackets, business-specific categorization rules, multi-column logic that depends on three fields at once. .apply() is your escape hatch. It lets you run any Python function on every row. It’s slower than vectorized operations, so use it as a last resort — but when you need it, nothing else will do.
When Built-In Methods Aren’t Enough
.apply() runs any function on every element of a Series, or on every row or column of a DataFrame.
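As a rough sketch of those three modes, the snippet below uses a small made-up `scores` table (not part of the lesson's data) to show what the function receives in each case.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
scores = pd.DataFrame({"math": [80, 90, 70], "art": [60, 75, 95]})

# Series.apply: the function receives one element at a time.
curved = scores["math"].apply(lambda x: x + 5)

# DataFrame.apply with the default axis=0: the function receives each column as a Series.
col_ranges = scores.apply(lambda col: col.max() - col.min())

# DataFrame.apply with axis=1: the function receives each row as a Series.
row_best = scores.apply(lambda row: row.max(), axis=1)

print(curved.tolist())       # [85, 95, 75]
print(col_ranges.to_dict())  # {'math': 20, 'art': 35}
print(row_best.tolist())     # [80, 90, 95]
```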
Lambda Functions: Quick One-Liners
A lambda is an anonymous function — a function without a name, written in one line:
# Regular function
def double(x):
    return x * 2

# Same thing as a lambda
double = lambda x: x * 2

# Both do: double(5) -> 10

employee["tax"] = employee["salary"].apply(
    lambda s: s * 0.3 if s > 100000 else s * 0.2
)
employee[["first_name", "salary", "tax"]]

.apply() loops under the hood. For simple cases, np.where() or .str methods are faster. Use .apply() when the logic is too complex for vectorized alternatives.
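For comparison, here is one way the same tax rule could be written with np.where(). The `employee` table below is a small hypothetical stand-in for the lesson's data, so treat it as a sketch rather than the lesson's own solution.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the lesson's employee table.
employee = pd.DataFrame({
    "first_name": ["Alice", "Bob", "Carla"],
    "salary": [120000, 80000, 100000],
})

# Element-by-element version: the lambda is called once per salary.
tax_apply = employee["salary"].apply(
    lambda s: s * 0.3 if s > 100000 else s * 0.2
)

# Vectorized version: the condition and both branches are computed
# for the whole column at once, then np.where picks per position.
employee["tax"] = np.where(
    employee["salary"] > 100000,
    employee["salary"] * 0.3,
    employee["salary"] * 0.2,
)

# Both approaches should agree on every row.
print(np.allclose(employee["tax"], tax_apply))
```

Note that a salary of exactly 100000 falls into the 20% branch in both versions, because the comparison is strictly greater than.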
Multi-Branch Logic
The starter has the lambda structure. Fill in the salary tiers: over 100000 = Senior, over 70000 = Mid, else Junior.
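One possible way to fill in the tiers is a chained conditional expression. The sample salaries and the `level` column name below are assumptions for illustration, since the starter code itself is not shown here.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's employee table.
employee = pd.DataFrame({"salary": [120000, 85000, 70000, 100000]})

# Conditions are checked top to bottom: Senior first, then Mid, else Junior.
employee["level"] = employee["salary"].apply(
    lambda s: "Senior" if s > 100000 else ("Mid" if s > 70000 else "Junior")
)

print(employee["level"].tolist())  # ['Senior', 'Mid', 'Junior', 'Mid']
```

The boundaries are exclusive, so 100000 lands in Mid and 70000 lands in Junior.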
Named Functions for Complex Logic
When the logic is too complex for a lambda, write a named function:
def classify_name(name):
    if pd.isna(name):
        return "Unknown"
    elif len(name) <= 3:
        return "Short"
    elif len(name) <= 6:
        return "Medium"
    else:
        return "Long"

employee["name_class"] = employee["first_name"].apply(classify_name)

Use lambda for one-line logic. Use a named function when you need multiple lines, error handling, or reusability. If your lambda has more than one if/else, switch to a named function.
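To illustrate the error-handling case, the sketch below uses a hypothetical `parse_salary` helper and made-up raw strings (neither appears in the lesson). The try/except it needs would not fit in a lambda.

```python
import pandas as pd

def parse_salary(raw):
    """Convert a raw salary entry to a float, or NaN if it cannot be parsed."""
    if pd.isna(raw):
        return float("nan")
    try:
        # Strip a dollar sign and thousands separators before converting.
        return float(str(raw).replace("$", "").replace(",", ""))
    except ValueError:
        return float("nan")

# Hypothetical messy input, used only for illustration.
raw = pd.Series(["$85,000", "120000", None, "n/a"])
parsed = raw.apply(parse_salary)

print(parsed.isna().tolist())  # [False, False, True, True]
```

Because the function has a name, it can be reused on other columns and tested on its own.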
.apply() on Rows
Pass axis=1 to apply a function to each row. The function receives the entire row as a Series, so it can read several columns at once.
Create a label column combining first name and department: 'Alice (HR)'. Use `.apply()` with axis=1.
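A row-wise call for this kind of label might look like the sketch below. The `department` column name and the sample rows are assumptions, since the lesson's table is not shown here.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's employee table.
employee = pd.DataFrame({
    "first_name": ["Alice", "Bob"],
    "department": ["HR", "IT"],
})

# With axis=1 the lambda receives one row at a time as a Series,
# so columns are looked up by name.
employee["label"] = employee.apply(
    lambda row: f"{row['first_name']} ({row['department']})", axis=1
)

# For plain string columns, vectorized concatenation gives the same result.
label_vectorized = employee["first_name"] + " (" + employee["department"] + ")"

print(employee["label"].tolist())  # ['Alice (HR)', 'Bob (IT)']
```

Row-wise .apply() earns its cost when the logic per row is more involved than a simple concatenation like this one.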
When NOT to Use .apply()
Before reaching for .apply(), check if a built-in method exists:
# Slow: .apply() for simple case
df["upper"] = df["name"].apply(lambda x: x.upper())
# Fast: built-in .str method
df["upper"] = df["name"].str.upper()
# Slow: .apply() for conditional
df["flag"] = df["salary"].apply(
    lambda x: "High" if x > 100000 else "Low"
)
# Fast: np.where()
df["flag"] = np.where(df["salary"] > 100000, "High", "Low")

Key Takeaways
- .apply(func) runs a function on every element in a Series.
- .apply(func, axis=1) runs on every row of a DataFrame.
- Lambda for one-liners; named functions for complex logic.
- Prefer vectorized operations (.str, np.where, .dt) when they exist; .apply() is the fallback.
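When a rule has more than two branches, np.select() is one vectorized option. The lesson does not cover it, so the sketch below is a suggested alternative rather than part of the lesson, and the sample salaries are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries, used only for illustration.
salary = pd.Series([120000, 85000, 70000])

# Conditions are evaluated in order; the first one that is True
# decides the label, and `default` covers everything else.
level = np.select(
    [salary > 100000, salary > 70000],
    ["Senior", "Mid"],
    default="Junior",
)

print(level.tolist())  # ['Senior', 'Mid', 'Junior']
```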
What’s Next
You can now transform data with any custom logic. Next: reshaping — pivoting wide tables to long and long tables to wide, the operations that power every cross-tabulation report.