Module 6: Window Operations20 min

Cumulative and Running Calculations

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Running Totals and Counts

Running totals add up values row by row as you move through the data. They answer questions like "how many orders has this customer placed so far?" or "what’s the cumulative revenue through March?"

Python
df = amazon_transactions.sort_values(["user_id", "created_at"])
df["running_count"] = df.groupby("user_id").cumcount() + 1
df[["user_id", "created_at", "id", "running_count"]]

.cumcount() gives a 0-based running count within each group. Add 1 for 1-based. For running sums, use .cumsum():

Python
df["running_total"] = (
    df.groupby("user_id")["amount"].cumsum()
)

Percentage of Total

"What percentage of total orders does each customer represent?" This is one of the most common business questions. The pattern is always: compute the per-entity count (or sum), compute the grand total, divide. It’s the same ratio pattern from Module 4, but now you’re applying it in a windowed context.

Calculate each row’s share of the whole:

1
Calculate Percentage of Total

The starter counts orders per user. Add a column showing each user’s share of all orders.

Tables: amazon_transactions

Moving Averages with .rolling()

Moving averages smooth out noise in time-series data. Instead of showing daily revenue (which spikes and dips), a 7-day moving average reveals the underlying trend. You’ve seen them in stock charts — the smooth line overlaid on the jagged price data. .rolling(n).mean() computes this in one line.

.rolling(n) creates a sliding window of n rows:

Python
# 7-day moving average
df["ma_7"] = df["daily_sales"].rolling(7).mean()

# Within groups
df["ma_3"] = (
    df.groupby("user_id")["amount"]
    .rolling(3).mean()
    .reset_index(level=0, drop=True)
)
.rolling() Within Groups

When using .rolling() after .groupby(), the result has a MultiIndex. Chain .reset_index(level=0, drop=True) to align it back to the original DataFrame.

Cumulative Extremes

2
Running Max Per User

For each user, calculate the running maximum id seen so far (sorted by date).

Tables: amazon_transactions

Finding User Purchases

Table: amazon_transactions
iduser_iditemcreated_atrevenue
1109milk2020-03-03123
2139biscuit2020-03-18421
3120milk2020-03-18176
4108banana2020-03-18862
5130milk2020-03-28333
3
Finding User Purchases
View solution

Identify returning active users by finding users who made a second purchase within 1 to 7 days after their first purchase. Ignore same-day purchases. Output a list of these `user_ids`.

Tables: amazon_transactions

Key Takeaways

  • .cumsum(), .cumcount(), .cummax(), .cummin() for running calculations.
  • .rolling(n).mean() for moving averages.
  • Percentage of total: value / .sum() * 100.
  • Always sort before cumulative operations.

What’s Next

The final lesson covers advanced patterns: combining multiple window techniques, the deduplication pattern, and real-world analysis pipelines.