Module 6: Windows Functions25 min

Aggregate Window Functions

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Aggregates Without GROUP BY

You already know SUM, AVG, COUNT, MIN, and MAX. They collapse rows when used with GROUP BY. But add OVER() and they become window functions: they calculate across rows without collapsing anything.

This gives you running totals, moving averages, and percentage-of-total calculations. All are incredibly useful in practice.

Running Totals in Action

SQL
SELECT
  user_id,
  CAST(created_at AS DATE) AS tx_date,
  product_id,
  COUNT(*) OVER (
    PARTITION BY user_id
    ORDER BY created_at
  ) AS running_purchase_count
FROM amazon_transactions
ORDER BY user_id, created_at;
SELECT
  user_id,
  created_at::date AS tx_date,
  product_id,
  COUNT(*) OVER (
    PARTITION BY user_id
    ORDER BY created_at
  ) AS running_purchase_count
FROM amazon_transactions
ORDER BY user_id, created_at;

Each row shows the cumulative count up to and including that row. This is useful for tracking cumulative metrics: year-to-date revenue, total signups over time, and running balance in a bank account.

Running Total by Group

Add PARTITION BY to restart the running total for each group:

PostgreSQL
SELECT
    salesperson,
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY salesperson
        ORDER BY order_date
    ) AS ytd_sales
FROM orders;

Now each salesperson has their own running total. Alice starts at 0, Bob starts at 0. They accumulate independently.

Moving Averages with AVG() OVER

A moving average smooths out noise by averaging recent values. You’ve probably seen this in stock charts.

PostgreSQL
SELECT
    date,
    daily_signups,
    AVG(daily_signups) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS seven_day_avg
FROM signups;

This averages the current day plus the previous 6 days. We’ll cover that ROWS BETWEEN syntax in the next lesson. For now, just know you can control exactly which rows go into the calculation.

COUNT() OVER: Running Counts

Count rows cumulatively:

PostgreSQL
SELECT
    signup_date,
    user_name,
    COUNT(*) OVER (ORDER BY signup_date) AS cumulative_users
FROM users;

Each row shows how many users existed at that point in time. Your first user shows 1, your hundredth shows 100.

MIN/MAX() OVER: Running Extremes

Track the minimum or maximum value seen so far:

PostgreSQL
SELECT
    date,
    stock_price,
    MAX(stock_price) OVER (ORDER BY date) AS all_time_high,
    MIN(stock_price) OVER (ORDER BY date) AS all_time_low
FROM stock_prices;

At each date, you see the highest and lowest price that had occurred up to that point. Useful for tracking records.

SQL
SELECT
  user_id,
  COUNT(*) AS user_orders,
  ROUND(
    100.0 * COUNT(*) / SUM(COUNT(*)) OVER (),
    2
  ) AS pct_of_all_orders
FROM amazon_transactions
GROUP BY user_id
ORDER BY pct_of_all_orders DESC;

The key is SUM(value) OVER () with empty parentheses. That means “sum across all rows.” Everyone gets the same grand total, but each row shows its own share.

This works for any aggregate: COUNT(*) OVER () for total rows, AVG(x) OVER () for overall average.

Combining Multiple Window Functions

You can stack as many window functions as you need:

PostgreSQL
SELECT
    month,
    revenue,
    SUM(revenue) OVER (ORDER BY month) AS ytd_revenue,
    AVG(revenue) OVER () AS avg_monthly,
    revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change,
    RANK() OVER (ORDER BY revenue DESC) AS revenue_rank
FROM monthly_sales;

Each window function operates independently. They don’t interfere with each other.

Window vs GROUP BY: The Comparison

This is worth repeating because it’s fundamental:

  • “Total revenue per region” → Use GROUP BY. Result: one row per region.
  • “Each sale + its region’s total” → Use window function. Result: one row per sale.
  • “Overall average” → Just AVG(). Result: one number.
  • “Each row + overall average” → AVG() OVER(). Result: one row per original row.

Percentage Within Groups

SQL
SELECT
  user_id,
  CAST(created_at AS DATE) AS tx_date,
  product_id,
  ROUND(
    100.0 * 1 / COUNT(*) OVER (PARTITION BY user_id),
    2
  ) AS pct_of_user_orders
FROM amazon_transactions
ORDER BY user_id, created_at;
SELECT
  user_id,
  created_at::date AS tx_date,
  product_id,
  ROUND(
    100.0 * 1 / COUNT(*) OVER (PARTITION BY user_id),
    2
  ) AS pct_of_user_orders
FROM amazon_transactions
ORDER BY user_id, created_at;

Finding User Purchases

Combine ROW_NUMBER() with aggregation to find specific purchase patterns.

Table: amazon_transactions
iduser_iditemcreated_atrevenue
1109milk2020-03-03123
2139biscuit2020-03-18421
3120milk2020-03-18176
4108banana2020-03-18862
5130milk2020-03-28333
1
Finding User Purchases
View solution

Identify returning active users by finding users who made a second purchase within 1 to 7 days after their first purchase. Ignore same-day purchases. Output a list of these `user_ids`.

Tables: amazon_transactions

Key Takeaways

  • SUM/AVG/COUNT/MIN/MAX all work as window functions with OVER()
  • ORDER BY inside OVER() creates running calculations
  • Empty OVER() means “across all rows” (grand total)
  • PARTITION BY restarts calculations for each group
  • Percentage of total = value / SUM(value) OVER ()

What’s Next

We’ve been using the default behavior for which rows get included in calculations. But what if you need exactly the last 7 days, or 3 rows on each side? That’s where window frames come in, and they’re the final piece of the puzzle.