Module 6: Windows Functions35 min

LAG and LEAD

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Looking Backward and Forward

  • “What’s the month-over-month change?” 
  • “How does this compare to last period?” 
  • “Did the customer’s spending go up or down?”

These questions require comparing a row to the row before it (or after it). Without window functions, you’d need to self-join the table, matching each row to its predecessor. It works, but it’s clunky. LAG and LEAD make it clean.

LAG in Action

SQL
SELECT
  user_id,
  CAST(created_at AS DATE) AS tx_date,
  LAG(CAST(created_at AS DATE)) OVER (ORDER BY created_at) AS prev_date
FROM amazon_transactions
ORDER BY created_at
FETCH FIRST 15 ROWS ONLY;
SELECT
  user_id,
  created_at::date AS tx_date,
  LAG(created_at::date) OVER (ORDER BY created_at) AS prev_date
FROM amazon_transactions
ORDER BY created_at
LIMIT 15;
PostgreSQL
SELECT
  user_id,
  created_at::date AS tx_date,
  LAG(created_at::date) OVER (ORDER BY created_at) AS prev_date
FROM amazon_transactions
ORDER BY created_at
LIMIT 15;

The first row has NULL for prev_date because there’s no row before it. That’s expected, and you’ll need to handle it in your calculations.

The Full LAG Syntax

PostgreSQL
LAG(column, offset, default) OVER (ORDER BY …) 
-- column: what value to grab
-- offset: how many rows back (default is 1)
-- default: what to return if there’s no previous row (default is NULL)

So LAG(revenue) is shorthand for LAG(revenue, 1, NULL). Looking back 1 row, returning NULL if there’s nothing there.

PostgreSQL
LAG(revenue)         – Previous row’s revenue, NULL if none
LAG(revenue, 1)      – Same as above
LAG(revenue, 3)      – 3 rows back
LAG(revenue, 1, 0)   – Previous row, default to 0 instead of NULL

LEAD: Getting the Next Row’s Value

LEAD() is the mirror image of LAG. It looks forward instead of backward.

PostgreSQL
SELECT
    month,
    revenue,
    LEAD(revenue) OVER (ORDER BY month) AS next_month
FROM monthly_sales;
  • month Jan, revenue 100, next_month 150
  • month Feb, revenue 150, next_month 200
  • month Mar, revenue 200, next_month 180
  • month Apr, revenue 180, next_month NULL

Now April has NULL because there’s no row after it.

LEAD is less common than LAG in practice. Most business questions are backward-looking: “compared to last month” rather than “compared to next month.” But it’s there when you need it.

Calculating Period-Over-Period Change

This is the bread and butter of LAG. Once you have the previous value, you can calculate change.

Absolute Change

PostgreSQL
SELECT
    month,
    revenue,
    revenue - LAG(revenue) OVER (ORDER BY month) AS change
FROM monthly_sales;

Period-Over-Period Change

SQL
WITH user_purchases AS (
  SELECT
    user_id,
    CAST(created_at AS DATE) AS tx_date,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS purchase_num
  FROM amazon_transactions
)
SELECT
  user_id,
  tx_date,
  purchase_num,
  LAG(tx_date) OVER (PARTITION BY user_id ORDER BY tx_date) AS prev_date,
  tx_date - LAG(tx_date) OVER (PARTITION BY user_id ORDER BY tx_date) AS days_between
FROM user_purchases
ORDER BY user_id, tx_date;
WITH user_purchases AS (
  SELECT
    user_id,
    created_at::date AS tx_date,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS purchase_num
  FROM amazon_transactions
)
PostgreSQL
WITH user_purchases AS (
  SELECT
    user_id,
    created_at::date AS tx_date,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS purchase_num
  FROM amazon_transactions
)
SELECT
  user_id,
  tx_date,
  purchase_num,
  LAG(tx_date) OVER (PARTITION BY user_id ORDER BY tx_date) AS prev_date,
  tx_date - LAG(tx_date) OVER (PARTITION BY user_id ORDER BY tx_date) AS days_between
FROM user_purchases
ORDER BY user_id, tx_date;
Plan for the first NULL

That first NULL is going to show up in your output. Decide upfront whether to filter it out, replace it with 0, or leave it as NULL.

LAG with PARTITION BY

Add PARTITION BY when you need to compare within groups:

PostgreSQL
SELECT
    region,
    month,
    revenue,
    revenue - LAG(revenue) OVER (
        PARTITION BY region
        ORDER BY month
    ) AS mom_change
FROM regional_sales;

Now each region has its own LAG sequence. East compares to East. West compares to West. You won’t accidentally compare East’s March to West’s February.

Adding Days Between Purchases

SQL
SELECT
  user_id,
  CAST(created_at AS DATE) AS tx_date,
  LAG(CAST(created_at AS DATE)) OVER (
    PARTITION BY user_id
    ORDER BY created_at
  ) AS prev_tx_date
FROM amazon_transactions
ORDER BY user_id, created_at;
SELECT
  user_id,
  created_at::date AS tx_date,
  LAG(created_at::date) OVER (
    PARTITION BY user_id
    ORDER BY created_at
  ) AS prev_tx_date
FROM amazon_transactions
ORDER BY user_id, created_at;

Looking Further Back

The offset parameter lets you compare to any previous period:

PostgreSQL
-- Same month last year
LAG(revenue, 12) OVER (ORDER BY month) AS same_month_ly

-- Same quarter last year
LAG(revenue, 4) OVER (ORDER BY quarter) AS same_quarter_ly

Year-over-year comparisons are more meaningful than month-over-month for seasonal businesses. A swimsuit company’s March-to-February comparison doesn’t tell you much, but March-to-March does.

Handling NULLs

The first row (or first N rows if offset > 1) will have NULL. Here are your options:

PostgreSQL
-- Option 1: Filter them out
WHERE LAG(revenue) OVER (ORDER BY month) IS NOT NULL

-- Option 2: Use a default value
LAG(revenue, 1, 0) OVER (ORDER BY month)

-- Option 3: COALESCE for calculated fields
COALESCE(revenue - LAG(revenue) OVER (…), 0) AS change
Watch for division errors

If LAG returns 0 or NULL, you'll get a division error. Use NULLIF to handle divide-by-zero: revenue / NULLIF(LAG(revenue) OVER (...), 0).

Finding Purchases

Use LAG() with PARTITION BY to calculate time between purchases for each user.

Table: amazon_transactions
iduser_iditemcreated_atrevenue
1109milk2020-03-03123
2139biscuit2020-03-18421
3120milk2020-03-18176
4108banana2020-03-18862
5130milk2020-03-28333
1
Finding Purchases
View solution

Identify returning active users by finding users who made a repeat purchase within 7 days or less of their previous transaction, excluding same-day purchases. Output a list of these `user_id`.

Tables: amazon_transactions

Calculate month-over-month percentage change. Watch out for that first NULL row.

Table: sf_transactions
idcreated_atvaluepurchase_id
12019-01-0117269243
22019-01-0517719436
32019-01-0910951330
42019-01-1316491130
52019-01-1719887239
2
Monthly Percentage Difference
View solution

Given a table of purchases by date, calculate the month-over-month percentage change in revenue. The output should include the year-month date (YYYY-MM) and percentage change, rounded to the 2nd decimal point, and sorted from the beginning of the year to the end of the year. The percentage change column will be populated from the 2nd month forward and can be calculated as ((this month's revenue - last month's revenue) / last month's revenue)*100.

Tables: sf_transactions

Key Takeaways

  • LAG() gets the previous row’s value; LEAD() gets the next row’s
  • The offset parameter (default 1) controls how far back/forward to look
  • The default parameter (default NULL) controls what to return at edges
  • PARTITION BY creates separate sequences for each group
  • Handle NULLs explicitly. Don’t let them surprise you.

What’s Next

You’ve learned ranking and offset functions. Now let’s see how regular aggregates like SUM and AVG work as window functions. Running totals and moving averages are incredibly useful, and the syntax is simpler than you might think.