Module 6: Window Operations•20 min

Cumulative and Running Calculations

Progress Tracking

Running Totals and Counts

Running totals add up values row by row as you move through the data. They answer questions like "how many orders has this customer placed so far?" or "what’s the cumulative revenue through March?"

Python

df = amazon_transactions.sort_values(["user_id", "created_at"])
df["running_count"] = df.groupby("user_id").cumcount() + 1
df[["user_id", "created_at", "id", "running_count"]]

.cumcount() gives a 0-based running count within each group. Add 1 for 1-based. For running sums, use .cumsum():

Python

df["running_total"] = (
    df.groupby("user_id")["amount"].cumsum()
)

Percentage of Total

"What percentage of total orders does each customer represent?" This is one of the most common business questions. The pattern is always: compute the per-entity count (or sum), compute the grand total, divide. It’s the same ratio pattern from Module 4, but now you’re applying it in a windowed context.

Calculate each row’s share of the whole:

Table: amazon_transactions

id	user_id	item	created_at	revenue
1	109	milk	2020-03-03	123
2	139	biscuit	2020-03-18	421
3	120	milk	2020-03-18	176
4	108	banana	2020-03-18	862
5	130	milk	2020-03-28	333

Tables: amazon_transactions

Moving Averages with .rolling()

Moving averages smooth out noise in time-series data. Instead of showing daily revenue (which spikes and dips), a 7-day moving average reveals the underlying trend. You’ve seen them in stock charts — the smooth line overlaid on the jagged price data. .rolling(n).mean() computes this in one line.

.rolling(n) creates a sliding window of n rows:

Python

# 7-day moving average
df["ma_7"] = df["daily_sales"].rolling(7).mean()

# Within groups
df["ma_3"] = (
    df.groupby("user_id")["amount"]
    .rolling(3).mean()
    .reset_index(level=0, drop=True)
)

.rolling() Within Groups

When using .rolling() after .groupby(), the result has a MultiIndex. Chain .reset_index(level=0, drop=True) to align it back to the original DataFrame.

Cumulative Extremes

Tables: amazon_transactions

Finding User Purchases

Table: amazon_transactions

id	user_id	item	created_at	revenue
1	109	milk	2020-03-03	123
2	139	biscuit	2020-03-18	421
3	120	milk	2020-03-18	176
4	108	banana	2020-03-18	862
5	130	milk	2020-03-28	333

Tables: amazon_transactions

Key Takeaways

.cumsum(), .cumcount(), .cummax(), .cummin() for running calculations.
.rolling(n).mean() for moving averages.
Percentage of total: value / .sum() * 100.
Always sort before cumulative operations.

What’s Next

The final lesson covers advanced patterns: combining multiple window techniques, the deduplication pattern, and real-world analysis pipelines.

Next upContinue →

Advanced Window Patterns

25 min