Cumulative and Running Calculations
Progress Tracking
Log in to save this lesson and continue from where you left off.
Running Totals and Counts
Running totals add up values row by row as you move through the data. They answer questions like "how many orders has this customer placed so far?" or "what’s the cumulative revenue through March?"
df = amazon_transactions.sort_values(["user_id", "created_at"])
df["running_count"] = df.groupby("user_id").cumcount() + 1
df[["user_id", "created_at", "id", "running_count"]].cumcount() gives a 0-based running count within each group. Add 1 for 1-based. For running sums, use .cumsum():
df["running_total"] = (
df.groupby("user_id")["amount"].cumsum()
)Percentage of Total
"What percentage of total orders does each customer represent?" This is one of the most common business questions. The pattern is always: compute the per-entity count (or sum), compute the grand total, divide. It’s the same ratio pattern from Module 4, but now you’re applying it in a windowed context.
Calculate each row’s share of the whole:
The starter counts orders per user. Add a column showing each user’s share of all orders.
Moving Averages with .rolling()
Moving averages smooth out noise in time-series data. Instead of showing daily revenue (which spikes and dips), a 7-day moving average reveals the underlying trend. You’ve seen them in stock charts — the smooth line overlaid on the jagged price data. .rolling(n).mean() computes this in one line.
.rolling(n) creates a sliding window of n rows:
# 7-day moving average
df["ma_7"] = df["daily_sales"].rolling(7).mean()
# Within groups
df["ma_3"] = (
df.groupby("user_id")["amount"]
.rolling(3).mean()
.reset_index(level=0, drop=True)
)When using .rolling() after .groupby(), the result has a MultiIndex. Chain .reset_index(level=0, drop=True) to align it back to the original DataFrame.
Cumulative Extremes
For each user, calculate the running maximum id seen so far (sorted by date).
Finding User Purchases
| id | user_id | item | created_at | revenue |
|---|---|---|---|---|
| 1 | 109 | milk | 2020-03-03 | 123 |
| 2 | 139 | biscuit | 2020-03-18 | 421 |
| 3 | 120 | milk | 2020-03-18 | 176 |
| 4 | 108 | banana | 2020-03-18 | 862 |
| 5 | 130 | milk | 2020-03-28 | 333 |
Identify returning active users by finding users who made a second purchase within 1 to 7 days after their first purchase. Ignore same-day purchases. Output a list of these `user_ids`.
Key Takeaways
.cumsum(),.cumcount(),.cummax(),.cummin()for running calculations..rolling(n).mean()for moving averages.- Percentage of total: value /
.sum()* 100. - Always sort before cumulative operations.
What’s Next
The final lesson covers advanced patterns: combining multiple window techniques, the deduplication pattern, and real-world analysis pipelines.