Aggregate Window Functions
Progress Tracking
Log in to save this lesson and continue from where you left off.
Aggregates Without GROUP BY
You already know SUM, AVG, COUNT, MIN, and MAX. They collapse rows when used with GROUP BY. But add OVER() and they become window functions: they calculate across rows without collapsing anything.
This gives you running totals, moving averages, and percentage-of-total calculations. All are incredibly useful in practice.
Running Totals in Action
SELECT
user_id,
CAST(created_at AS DATE) AS tx_date,
product_id,
COUNT(*) OVER (
PARTITION BY user_id
ORDER BY created_at
) AS running_purchase_count
FROM amazon_transactions
ORDER BY user_id, created_at;
SELECT
user_id,
created_at::date AS tx_date,
product_id,
COUNT(*) OVER (
PARTITION BY user_id
ORDER BY created_at
) AS running_purchase_count
FROM amazon_transactions
ORDER BY user_id, created_at;Each row shows the cumulative count up to and including that row. This is useful for tracking cumulative metrics: year-to-date revenue, total signups over time, and running balance in a bank account.
Running Total by Group
Add PARTITION BY to restart the running total for each group:
SELECT
salesperson,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY salesperson
ORDER BY order_date
) AS ytd_sales
FROM orders;Now each salesperson has their own running total. Alice starts at 0, Bob starts at 0. They accumulate independently.
Moving Averages with AVG() OVER
A moving average smooths out noise by averaging recent values. You’ve probably seen this in stock charts.
SELECT
date,
daily_signups,
AVG(daily_signups) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS seven_day_avg
FROM signups;This averages the current day plus the previous 6 days. We’ll cover that ROWS BETWEEN syntax in the next lesson. For now, just know you can control exactly which rows go into the calculation.
COUNT() OVER: Running Counts
Count rows cumulatively:
SELECT
signup_date,
user_name,
COUNT(*) OVER (ORDER BY signup_date) AS cumulative_users
FROM users;Each row shows how many users existed at that point in time. Your first user shows 1, your hundredth shows 100.
MIN/MAX() OVER: Running Extremes
Track the minimum or maximum value seen so far:
SELECT
date,
stock_price,
MAX(stock_price) OVER (ORDER BY date) AS all_time_high,
MIN(stock_price) OVER (ORDER BY date) AS all_time_low
FROM stock_prices;At each date, you see the highest and lowest price that had occurred up to that point. Useful for tracking records.
SELECT
user_id,
COUNT(*) AS user_orders,
ROUND(
100.0 * COUNT(*) / SUM(COUNT(*)) OVER (),
2
) AS pct_of_all_orders
FROM amazon_transactions
GROUP BY user_id
ORDER BY pct_of_all_orders DESC;The key is SUM(value) OVER () with empty parentheses. That means “sum across all rows.” Everyone gets the same grand total, but each row shows its own share.
This works for any aggregate: COUNT(*) OVER () for total rows, AVG(x) OVER () for overall average.
Combining Multiple Window Functions
You can stack as many window functions as you need:
SELECT
month,
revenue,
SUM(revenue) OVER (ORDER BY month) AS ytd_revenue,
AVG(revenue) OVER () AS avg_monthly,
revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change,
RANK() OVER (ORDER BY revenue DESC) AS revenue_rank
FROM monthly_sales;Each window function operates independently. They don’t interfere with each other.
Window vs GROUP BY: The Comparison
This is worth repeating because it’s fundamental:
- “Total revenue per region” → Use
GROUP BY. Result: one row per region. - “Each sale + its region’s total” → Use
window function. Result: one row per sale. - “Overall average” → Just
AVG(). Result: one number. - “Each row + overall average” →
AVG() OVER(). Result: one row per original row.
Percentage Within Groups
SELECT
user_id,
CAST(created_at AS DATE) AS tx_date,
product_id,
ROUND(
100.0 * 1 / COUNT(*) OVER (PARTITION BY user_id),
2
) AS pct_of_user_orders
FROM amazon_transactions
ORDER BY user_id, created_at;
SELECT
user_id,
created_at::date AS tx_date,
product_id,
ROUND(
100.0 * 1 / COUNT(*) OVER (PARTITION BY user_id),
2
) AS pct_of_user_orders
FROM amazon_transactions
ORDER BY user_id, created_at;Finding User Purchases
Combine ROW_NUMBER() with aggregation to find specific purchase patterns.
| id | user_id | item | created_at | revenue |
|---|---|---|---|---|
| 1 | 109 | milk | 2020-03-03 | 123 |
| 2 | 139 | biscuit | 2020-03-18 | 421 |
| 3 | 120 | milk | 2020-03-18 | 176 |
| 4 | 108 | banana | 2020-03-18 | 862 |
| 5 | 130 | milk | 2020-03-28 | 333 |
Identify returning active users by finding users who made a second purchase within 1 to 7 days after their first purchase. Ignore same-day purchases. Output a list of these `user_ids`.
Key Takeaways
SUM/AVG/COUNT/MIN/MAXall work as window functions withOVER()ORDER BYinsideOVER()creates running calculations- Empty
OVER()means “across all rows” (grand total) PARTITION BYrestarts calculations for each groupPercentage of total = value / SUM(value) OVER ()
What’s Next
We’ve been using the default behavior for which rows get included in calculations. But what if you need exactly the last 7 days, or 3 rows on each side? That’s where window frames come in, and they’re the final piece of the puzzle.