Module 2: Aggregating & Grouping Data25 min

NULL Handling in Aggregations

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

The Silent Data Killer

Here’s a scenario that trips up even experienced analysts. You’re counting customers, and you get 1,000. Your colleague counts the same table and gets 847. You’re both right. How?

The difference is NULL. One of you wrote COUNT(*), the other wrote COUNT(email). And that difference matters enormously when your data has gaps.

NULLs are sneaky. Most aggregate functions pretend NULL values don’t exist. If you’re not aware of this, you can get very confused.

How Each Function Handles NULLs

Let’s say you have a column with values 100, 200, and NULL:

SQL
SELECT
  SUM(salary), -- Returns 300 (ignores NULL)
  AVG(salary), -- Returns 150 (300/2, not 300/3)
  MIN(salary), -- Returns 100
  MAX(salary) -- Returns 200
FROM techcorp_workforce;

Notice that AVG divided by 2, not 3. It only counts non-NULL values. This is usually what you want, but if you’re not aware of it, you can get confused.

COUNT: The Exception That Matters

COUNT behaves differently depending on what you’re counting:

SQL
SELECT
  COUNT(*) AS total_rows, -- Counts all rows
  COUNT(email) AS has_email, -- Excludes NULLs
  COUNT(DISTINCT email) AS unique_emails
FROM techcorp_workforce;

Same table, three different numbers. 1,000 users total, but only 847 have emails. This is incredibly common in real data.

Interview favorite

This COUNT(*) vs COUNT(column) difference is a common interview topic. Make sure you understand it.

What Happens with Zero Rows?

Here’s an edge case that causes production bugs: what if your WHERE clause matches no rows?

SQL
SELECT
  COUNT(*),
  SUM(salary),
  AVG(salary)
FROM techcorp_workforce
WHERE department = 'NonExistent';

COUNT(*) returns 0, but SUM and AVG return NULL, not 0. If your application code doesn’t expect NULL, you’ll get errors.

There’s a related danger: if COUNT(*) returns 0 and you try to divide by it, your query crashes. Let’s fix that.

Safe Division with NULLIF

Dividing by zero crashes your query. This happens more often than you’d think, especially with filtered data or edge cases.

NULLIF is the fix. It takes two arguments and returns NULL if they're equal:

SQL
NULLIF(x, 0)  -- Returns NULL if x equals 0, otherwise returns x

When you divide by NULLIF(count, 0), a zero count becomes NULL, and dividing by NULL returns NULL instead of crashing.

Table: techcorp_workforce
idfirst_namelast_namedepartmentsalaryphone_numberjoining_date
1SarahMitchellHR95000555-01012021-03-15
2MichaelChenHR88000555-01022022-06-01
3EmilyRodriguezHR825002021-09-20
4DavidParkHR80000555-01042023-01-10
5LisaThompsonHR650002021-04-05
1
Safe Division with NULLIF

This query calculates the phone ratio for a department. It works fine for existing departments — but crashes when the department has no rows. Wrap the denominator in NULLIF to make it safe.

Tables: techcorp_workforce

Calculating Ratios: The Professional Approach

A common task is calculating the fraction of your data that meets certain criteria. How many users have completed their profiles? What portion of orders shipped on time?

Professional data teams express these as ratios between 0 and 1, not percentages.

2
Calculate a Phone Ratio

Calculate the ratio of employees who have a phone number using CAST for decimal division and NULLIF for safety.

Tables: techcorp_workforce

CAST converts the integer count to a decimal value, enabling decimal division. Without it, SQL performs integer division, so 3/4 equals 0, not 0.75.

Why Ratios Beat Percentages

  • Ratio (0 to 1): easier for downstream calculations, standard in metrics (CTR, CVR, retention), and what SQL naturally returns
  • Percentage (0 to 100): requires dividing by 100 to reuse, is a display format for stakeholders, and adds an extra multiplication step
Ratios first, percentages last

Store and calculate with ratios. Convert to percentages only at the final display step. This keeps your metrics clean and composable.

If You Need Percentages

Sometimes your stakeholder specifically wants to see 85% instead of 0.85. Just multiply at the end:

SQL
SELECT
  ROUND(
  CAST(COUNT(phone_number) AS DECIMAL) / NULLIF(COUNT(*), 0) * 100,
  1
  ) AS phone_pct
FROM techcorp_workforce;
-- Returns: 84.7 instead of 0.85

Contact Information Completeness

Calculate the ratio of records with non-NULL contact information. Return a value between 0 and 1.

Table: techcorp_workforce
idfirst_namelast_namedepartmentsalaryphone_numberjoining_date
1SarahMitchellHR95000555-01012021-03-15
2MichaelChenHR88000555-01022022-06-01
3EmilyRodriguezHR825002021-09-20
4DavidParkHR80000555-01042023-01-10
5LisaThompsonHR650002021-04-05
3
Contact Information Completeness
View solution

The data quality team is auditing employee records to assess the completeness of contact information. Calculate and return the ratio of employees who have a `NULL` phone number.

Tables: techcorp_workforce

COALESCE: Replacing NULLs

Sometimes you want to treat NULL as a specific value. COALESCE returns the first non-NULL argument:

SQL
COALESCE(value, fallback) -- Returns value if not NULL, otherwise fallback
4
Try COALESCE with AVG

Calculate the average salary two ways: once normally (NULLs excluded) and once treating NULLs as 0 using COALESCE.

Tables: techcorp_workforce

Whether treating NULL as 0 makes sense depends on your business context. Don’t do it automatically.

COALESCE changes meaning

Be careful with COALESCE in aggregates. Turning NULL into 0 changes the meaning of your data. Only do it when it genuinely makes business sense.

NULLs in GROUP BY

Here’s something useful: when you GROUP BY a column with NULLs, all the NULLs get grouped together:

SQL
SELECT
  cust_id,
  COUNT(*) AS order_count
FROM orders
GROUP BY cust_id;
-- If cust_id has NULLs, they get their own group

This is actually handy. You can see at a glance how many records have missing department data.

Table: techcorp_workforce
idfirst_namelast_namedepartmentsalaryphone_numberjoining_date
1SarahMitchellHR95000555-01012021-03-15
2MichaelChenHR88000555-01022022-06-01
3EmilyRodriguezHR825002021-09-20
4DavidParkHR80000555-01042023-01-10
5LisaThompsonHR650002021-04-05
5
Try GROUP BY with NULLs

Group employees by department and count them. Notice how NULL departments form their own group.

Tables: techcorp_workforce

Key Takeaways

  • SUM, AVG, MIN, MAX all ignore NULL values
  • AVG divides by the count of non-NULL values, not total rows
  • COUNT(*) includes NULLs; COUNT(column) excludes them
  • Aggregates on empty results return NULL (except COUNT(*) which returns 0)
  • Use NULLIF(denominator, 0) to prevent division errors
  • Use CAST(x AS DECIMAL) to enable decimal division
  • Calculate ratios (0 to 1), not percentages. Convert only for display.
  • Use COALESCE to substitute values for NULLs, but only when appropriate
  • NULL values form their own group in GROUP BY

What’s Next

Sometimes you need more sophisticated aggregation: counting only certain rows, summing values that meet a condition, or categorizing data on the fly. That’s where CASE WHEN combined with aggregates becomes powerful.