NULL Handling in Aggregations
Progress Tracking
Log in to save this lesson and continue from where you left off.
The Silent Data Killer
Here’s a scenario that trips up even experienced analysts. You’re counting customers, and you get 1,000. Your colleague counts the same table and gets 847. You’re both right. How?
The difference is NULL. One of you wrote COUNT(*), the other wrote COUNT(email). And that difference matters enormously when your data has gaps.
NULLs are sneaky. Most aggregate functions pretend NULL values don’t exist. If you’re not aware of this, you can get very confused.
How Each Function Handles NULLs
Let’s say you have a column with values 100, 200, and NULL:
SELECT
SUM(salary), -- Returns 300 (ignores NULL)
AVG(salary), -- Returns 150 (300/2, not 300/3)
MIN(salary), -- Returns 100
MAX(salary) -- Returns 200
FROM techcorp_workforce;Notice that AVG divided by 2, not 3. It only counts non-NULL values. This is usually what you want, but if you’re not aware of it, you can get confused.
COUNT: The Exception That Matters
COUNT behaves differently depending on what you’re counting:
SELECT
COUNT(*) AS total_rows, -- Counts all rows
COUNT(email) AS has_email, -- Excludes NULLs
COUNT(DISTINCT email) AS unique_emails
FROM techcorp_workforce;Same table, three different numbers. 1,000 users total, but only 847 have emails. This is incredibly common in real data.
This COUNT(*) vs COUNT(column) difference is a common interview topic. Make sure you understand it.
What Happens with Zero Rows?
Here’s an edge case that causes production bugs: what if your WHERE clause matches no rows?
SELECT
COUNT(*),
SUM(salary),
AVG(salary)
FROM techcorp_workforce
WHERE department = 'NonExistent';COUNT(*) returns 0, but SUM and AVG return NULL, not 0. If your application code doesn’t expect NULL, you’ll get errors.
There’s a related danger: if COUNT(*) returns 0 and you try to divide by it, your query crashes. Let’s fix that.
Safe Division with NULLIF
Dividing by zero crashes your query. This happens more often than you’d think, especially with filtered data or edge cases.
NULLIF is the fix. It takes two arguments and returns NULL if they're equal:
NULLIF(x, 0) -- Returns NULL if x equals 0, otherwise returns xWhen you divide by NULLIF(count, 0), a zero count becomes NULL, and dividing by NULL returns NULL instead of crashing.
| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
This query calculates the phone ratio for a department. It works fine for existing departments — but crashes when the department has no rows. Wrap the denominator in NULLIF to make it safe.
Calculating Ratios: The Professional Approach
A common task is calculating the fraction of your data that meets certain criteria. How many users have completed their profiles? What portion of orders shipped on time?
Professional data teams express these as ratios between 0 and 1, not percentages.
Calculate the ratio of employees who have a phone number using CAST for decimal division and NULLIF for safety.
CAST converts the integer count to a decimal value, enabling decimal division. Without it, SQL performs integer division, so 3/4 equals 0, not 0.75.
Why Ratios Beat Percentages
- Ratio (0 to 1): easier for downstream calculations, standard in metrics (CTR, CVR, retention), and what SQL naturally returns
- Percentage (0 to 100): requires dividing by 100 to reuse, is a display format for stakeholders, and adds an extra multiplication step
Store and calculate with ratios. Convert to percentages only at the final display step. This keeps your metrics clean and composable.
If You Need Percentages
Sometimes your stakeholder specifically wants to see 85% instead of 0.85. Just multiply at the end:
SELECT
ROUND(
CAST(COUNT(phone_number) AS DECIMAL) / NULLIF(COUNT(*), 0) * 100,
1
) AS phone_pct
FROM techcorp_workforce;
-- Returns: 84.7 instead of 0.85Contact Information Completeness
Calculate the ratio of records with non-NULL contact information. Return a value between 0 and 1.
| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
The data quality team is auditing employee records to assess the completeness of contact information. Calculate and return the ratio of employees who have a `NULL` phone number.
COALESCE: Replacing NULLs
Sometimes you want to treat NULL as a specific value. COALESCE returns the first non-NULL argument:
COALESCE(value, fallback) -- Returns value if not NULL, otherwise fallbackCalculate the average salary two ways: once normally (NULLs excluded) and once treating NULLs as 0 using COALESCE.
Whether treating NULL as 0 makes sense depends on your business context. Don’t do it automatically.
Be careful with COALESCE in aggregates. Turning NULL into 0 changes the meaning of your data. Only do it when it genuinely makes business sense.
NULLs in GROUP BY
Here’s something useful: when you GROUP BY a column with NULLs, all the NULLs get grouped together:
SELECT
cust_id,
COUNT(*) AS order_count
FROM orders
GROUP BY cust_id;
-- If cust_id has NULLs, they get their own groupThis is actually handy. You can see at a glance how many records have missing department data.
| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
Group employees by department and count them. Notice how NULL departments form their own group.
Key Takeaways
SUM,AVG,MIN,MAXall ignoreNULLvaluesAVGdivides by the count of non-NULLvalues, not total rowsCOUNT(*)includes NULLs;COUNT(column)excludes them- Aggregates on empty results return
NULL(exceptCOUNT(*)which returns 0) - Use
NULLIF(denominator, 0)to prevent division errors - Use
CAST(x AS DECIMAL)to enable decimal division - Calculate ratios (0 to 1), not percentages. Convert only for display.
- Use
COALESCEto substitute values for NULLs, but only when appropriate NULLvalues form their own group inGROUP BY
What’s Next
Sometimes you need more sophisticated aggregation: counting only certain rows, summing values that meet a condition, or categorizing data on the fly. That’s where CASE WHEN combined with aggregates becomes powerful.