SQL UNION vs UNION ALL: Differences You Need to Know

sql union vs union all

Categories:

  • Author Avatar
    Written by:

    Irakli Tchigladze

In this article, we’ll discuss differences and similarities between UNION vs UNION ALL, possible use cases, and compare the performance of two set operators.

There are various ways to combine data from different sources in SQL. JOINs combine data horizontally by increasing the number of columns.

Set operators like UNION and UNION ALL allow you to combine datasets vertically. They don’t change the number of columns but stack records from multiple datasets in one table.

In this article, we will explain the differences between two set operators - UNION vs UNION ALL. We’ll go over their syntax and show how to best use these SQL features. Finally, we’ll walk you through answers to actual interview questions that involve UNION and UNION ALL.

We will also refresh your memory on two set operators. The set operators section in our SQL Cheat Sheet is quite informative as well.

What is UNION?

UNION is a set operator. It is used to vertically combine multiple datasets by stacking them on top of one another.

One important feature of UNION is that it removes duplicate rows from the combined data. If you want to ensure that combined datasets don't have duplicate records, use UNION.

UNION removes duplicate rows, not duplicate values. If a row has 5 columns, all 5 values must be the same for UNION to remove that row.

What is UNION ALL?

It is a set operator that combines two datasets by stacking them on top of one another without checking if each record is unique or not.

UNION ALL is useful when:

  1. You’re sure that datasets you are combining don’t have duplicate rows
  2. Keeping duplicate records is necessary for the task at hand.

As a rule of thumb, UNION ALL is useful for creating copies of a certain dataset.

Later in this “UNION vs UNION ALL” guide, we’ll walk you through two answers to interview questions. One of them asks you to return two copies of a certain dataset, so it’s a good question to demonstrate the usefulness of UNION ALL.

UNION vs UNION ALL in SQL - Differences and Similarities

The image below gives an overview of the differences and similarities between UNION and UNION ALL.

Differences and Similarities Between UNION vs UNION ALL in SQL

Let’s now talk a bit more about each difference and similarity.

Differences

Handling duplicates: The main difference between UNION vs UNION ALL is their approach to duplicate rows. After UNION combines datasets, it also checks for duplicate records and removes them. UNION ALL only combines them and stops there.

Performance: Because UNION takes the extra step of removing duplicate values, generally, it is considered slower than UNION ALL, but that’s not always the case. We’ll compare their performance in a later section.

Result size: As UNION ALL keeps the duplicates, it typically produces larger datasets than UNION.

NULL handling: The UNION operator treats NULLs as duplicates, meaning it removes them if all column values (including NULL) match. On the other hand, UNION ALL keeps all rows with NULLs, even if they appear identical.

Similarities

Syntax: UNION and UNION ALL basically have the identical syntax. The only difference is that you write the keyword UNION in the first case. In the second case, the keyword is – well, obviously – UNION ALL.

Column requirements: UNION and UNION ALL operations also follow the same rules - for datasets to be combined, they must have the same type, number, and order of columns.

Vertical stacking: The previous similarity is also because of the way UNION and UNION ALL combine. They both do it by vertically stacking datasets on top of one another. Unlike JOINs, merging two datasets doesn’t increase the number of columns, but the number of rows.

ORDER BY support: If you use ORDER BY in the query with either UNION or UNION ALL, it must be placed at the end of the combined query, and it sorts the final merged result set.

SQL UNION vs UNION ALL: Syntax

Let’s go over the syntax rules for UNION and UNION ALL.

Basics

Minimum number of datasets is two, with a set operator (UNION or UNION ALL) placed between them.

SELECT *
FROM sales
UNION ALL
SELECT *
FROM orders;

Note that both datasets must SELECT the same number, type, and order of columns, or it will throw an SQL error.

Set operators to combine identical datasets

In the previous example, we SELECT data from two different tables. Datasets can be identical as well.

In this example, we combine two identical SELECT statements.

SELECT * 
FROM sales
UNION ALL
SELECT * 
FROM sales;

As a result, we’ll have two copies of the same data from one table.

Set operators to combine two complex queries

Datasets you combine can be as simple as a pair of SELECT/FROM statements or very complex. It’s okay to filter, group, or aggregate data from the source table, as long as two datasets have the right number, type, and order of columns.

Let’s look at an example where we combine two filtered datasets with a WHERE clause.

SELECT * 
FROM domestic_sales
WHERE total_revenue > 100
UNION ALL
SELECT * 
FROM international_sales
WHERE total_revenue > 300;

Column names

Two datasets don’t need to have the same column names to be combined.

As long as it’s the correct number, type, and logical order of columns, names from the first query don’t have to match those of the second one.

SELECT total_revenue, 
       units_sold 
FROM domestic_sales
UNION ALL
SELECT aggregate_revenue, 
       pcs_ordered 
FROM international_sales;


As you can see, both datasets have the same number of columns.

For two datasets to be successfully combined, total_revenue and aggregate_revenue columns need to have the same type (or implicitly convertible) values. The same applies to units_sold and pcs_ordered columns.

The combined dataset will have column names of the first query - total_revenue and units_sold.

You can specify column names for the final table using the AS keyword.

SELECT total_revenue AS total, 
       units_sold AS sold 
FROM domestic_sales
UNION ALL
SELECT aggregate_revenue, 
       pcs_ordered 
FROM international_sales;


Logical order of columns

When using UNION to combine datasets vertically, mixing up the logical order of columns can skew final data.

For example, let’s imagine you have three columns, labeled: thousands, hundreds, and tens. Each record is supposed to describe a number like 1560, where thousands is 1, hundreds is 5, and tens is 6.

Let’s look at this example:

SELECT thousands, 
       hundreds,
       tens
FROM table
UNION ALL
SELECT tens,
       thousands,
       hundreds 
FROM table;

In this case, UNION itself won’t throw an error because you have three columns of integer type.

However, tens values from the second dataset will be aligned with thousands values from the first. Other columns will be misaligned as well, which can seriously disrupt accuracy of your data.

SQL UNION vs UNION ALL: In Practice

SQL UNION vs UNION ALL Examples

In this “UNION vs UNION ALL” guide, we’ll continue the StrataScratch blog tradition of breaking down SQL Interview Questions to help you understand various SQL features.

Let’s review two interview questions where UNION and UNION ALL play an integral role in finding a solution.

UNION question: Wine Variety Revenues

In this interview question from Wine Magazine, candidates have to work with sales data to find total sales for each region.

Last Updated: February 2020

MediumID 10033

You have a dataset containing information about wines, including their regions, varieties, and prices. Some wines have two regions listed (region_1 and region_2). Calculate the total cost of wines for each combination of region and variety, using both region columns.

Because there are two regions listed, first combine the data from both region columns, making sure to remove any duplicate entries and excluding any records with missing prices or regions. Then, sum the prices for each unique combination of region and variety and present the results, showing the region, variety, and total price. Order the final result from the highest total price to the lowest.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/10033-wine-variety-revenues

Understand the question

This question can be confusing, especially after looking at the table.

The important takeaway is that we need to aggregate wine revenue for each region. Every record in the winemag_p1 table has two region columns.

We’ll need to create two datasets, each consisting of region, price and variety values. We will be aggregating total revenue for each region, so we need to avoid counting the same sales record more than once. We need to find and remove duplicate records in the table. This is why we need to use UNION instead of UNION ALL.

Analyze data

To solve this question, candidates need to work with the winemag_p1 table.

This table contains information about wine sales in two regions. We’ll need to separate records so that there is just one region value in each record.

Table: winemag_p1
Loading Dataset

The table has many columns. At this point, we can only make assumptions:

  • id identifies each sale
  • country, region, province, winery values probably represent the origin of the wine.
  • description, designation columns give us broad information about the wine.
  • points values are probably used for wine ranking at Wine Magazine
  • price most likely represents the price of each unit. Most likely, we’ll have to aggregate values in this column to get the total revenue per region.
  • region_1 and region_2 specify the region of the wine. We’ll have to generate datasets with the region, price, variety values for both regions.
  • variety column specifies the type of wine.

Looking at the available data, you might notice that some records don’t have either region_1 or region_2 values, and some are missing price values as well. Since we are aggregating price values for each region, records without region and price values will need to be filtered out.

Plan your approach

As you can see, each record contains information about wine sales in two regions. We’ll use the available data to create two separate datasets, one for each region.

Because both datasets are sourced from the same table, some records might be duplicates.

We obviously need UNION, not UNION ALL, for this task. Because it allows us to:

  • Merge two datasets into one to prepare it for aggregation
  • Automatically remove identical records

Next, we need to filter. As you can see, some records in the winemag_p1 table are missing region_1, region_2, and price values. We can use the WHERE clause to make sure the combined table doesn’t have any records with empty values.

Finally, we’ll have to use the SQL GROUP BY statement and SUM() aggregate function to find the aggregate revenue for each region.

Write the code

1. UNION two datasets

Each record in the winemag_p1 table contains information about two regions.

We need to use the SELECT statement to create two datasets, one for each region. For the first dataset, we’ll SELECT values from region_1, variety, and price. For the second dataset - region_2, variety, price.

Next, we need to merge two datasets again, but this time we’ll stack them on top of one another.

We’ll aggregate combined data to find total revenue by region. So we don’t want to count each record more than once. We need to remove duplicates. That’s why we use UNION, which will vertically combine two datasets and remove any rows where all three values are the same.

SELECT region_1 AS region, 
       variety,
       price 
FROM winemag_p1
UNION
SELECT region_2 AS region,
       variety,
       price 
FROM winemag_p1


2. Filter table

Looking at the initial table, we can see that some rows have empty region_1, region_2, or price columns. Our datasets will be based on this table, so some of the records in the combined table will be empty as well.

All required columns and the first 5 rows of the solution are shown

regionvarietyprice
Assyrtiko20
Pinot Noir294
Champagne Blend50
Brunello di MontalcinoSangiovese63
Columbia Valley (WA)Merlot23

We need WITH and AS keywords to save and reference the combined table as CTE.

Finally, we pair SELECT with the WHERE clause to filter out records where region and price values are NULL.

WITH CTE AS(
SELECT region_1 AS region, 
       variety,
       price 
FROM winemag_p1
UNION
SELECT region_2 AS region,
       variety,
       price 
FROM winemag_p1
)
SELECT region,
       variety,
       price 
FROM CTE
WHERE region IS NOT NULL AND price IS NOT NULL

All required columns and the first 5 rows of the solution are shown

regionvarietyprice
Brunello di MontalcinoSangiovese63
Columbia Valley (WA)Merlot23
MendozaMalbec-Syrah12
Mazis-ChambertinPinot Noir294
SonomaZinfandel26


3. Aggregate sales

We want to aggregate revenue from each type of wine in each region. The logical choice is to find an aggregate of price values.

WITH CTE AS(
SELECT region_1 AS region, 
       variety,
       price 
FROM winemag_p1
UNION
SELECT region_2 AS region,
       variety,
       price 
FROM winemag_p1
)
SELECT region,
       variety, 
       SUM(price) AS price_sum 
FROM CTE
WHERE region IS NOT NULL AND price IS NOT NULL
GROUP BY region, variety
ORDER BY price_sum DESC;

You have reached your daily limit for code executions on our blog.
Please login/register to execute more code.


We use the AS keyword to give the aggregated column a descriptive name, ‘sum’.

Finally, we’ll need a GROUP BY statement to create groups of rows so that SUM() is applied to each region, not the entire table.

Lastly, the question also tells us to order regions by highest total sales to lowest.

Output

Final output should display the total sales for each variety of wine from each region.

All required columns and the first 5 rows of the solution are shown

regionvarietysum
NapaCabernet Sauvignon1630
Napa ValleyCabernet Sauvignon925
Diamond Mountain DistrictCabernet Sauvignon350
Mazis-ChambertinPinot Noir294
Napa ValleyBordeaux-style Red Blend267
NapaBordeaux-style Red Blend267
BarbarescoNebbiolo242
OakvilleCabernet Sauvignon200
Rao Negro ValleyMalbec177
Central CoastPinot Noir168
Brunello di MontalcinoSangiovese113
Washington OtherCabernet Sauvignon100
NapaRed Blend100
RutherfordRed Blend100
WashingtonCabernet Sauvignon100
Russian River ValleyPinot Noir97
SonomaPinot Noir97
Paso RoblesCabernet Sauvignon91
Central CoastCabernet Sauvignon91
RutherfordCabernet Sauvignon85
Alexander ValleyCabernet Sauvignon80
SonomaCabernet Sauvignon80
MeursaultChardonnay62
RiojaTempranillo60
Forla‚Sauvignon Blanc55
Anderson ValleyPinot Noir52
Mendocino/Lake CountiesPinot Noir52
ChampagneChampagne Blend50
Sta. Rita HillsPinot Noir48
MendozaMalbec45
ChalonePinot Noir44
ChablisChardonnay42
Long IslandMerlot40
Sonoma CountySauvignon Blanc40
Sonoma CoastPinot Noir40
SonomaSauvignon Blanc40
FranciacortaChardonnay39
Columbia ValleySyrah33
Walla Walla Valley (WA)Syrah33
VirginiaMerlot32
Columbia ValleyCabernet Sauvignon30
Edna ValleyPinot Noir30
Walla Walla Valley (WA)Cabernet Sauvignon30
FranciacortaSparkling Blend29
San Luis Obispo CountyPinot Noir29
NapaMerlot27
RasteauRhone-style Red Blend27
Napa ValleyMerlot27
Shenandoah Valley (CA)Sangiovese26
Sierra FoothillsSangiovese26
Russian River ValleyZinfandel26
SonomaZinfandel26
NapaMoscato25
Napa ValleyMoscato25
Pouilly-FuisseChardonnay25
Columbia ValleyMerlot23
Columbia Valley (WA)Merlot23
MontsantRed Blend22
Ribera del DueroRed Blend22
The Hamptons, Long IslandMerlot22
Valpolicella ClassicoCorvina, Rondinella, Molinara22
Willamette ValleyRiesling21
Walla Walla Valley (WA)Semillon20
Columbia ValleySemillon20
Alexander ValleyMerlot19
SonomaMerlot19
Prosecco di ValdobbiadeneProsecco19
Soave ClassicoGarganega18
Luja?n de CuyoMalbec18
North Fork of Long IslandMerlot18
Conegliano ValdobbiadeneGlera18
CarinenaRed Blend18
California OtherPinot Noir17
Vin MousseuxWhite Blend17
CaliforniaPinot Noir17
Chianti ClassicoSangiovese15
San JuanMalbec12
MendozaMalbec-Syrah12
SiciliaNero d'Avola12
Uco ValleyViognier12
South Eastern AustraliaChardonnay11
MendozaCabernet Sauvignon10
Columbia ValleyGewarztraminer9
Columbia Valley (WA)Gewarztraminer9
Pays d'OcMerlot8
California OtherSauvignon Blanc8
CaliforniaSauvignon Blanc8


Output when we use UNION ALL instead of UNION

Let’s use UNION ALL instead of UNION to understand the difference between the two.

WITH CTE AS(
SELECT region_1 AS region, 
       variety,
       price 
FROM winemag_p1
UNION ALL
SELECT region_2 AS region,
       variety,
       price 
FROM winemag_p1
)
SELECT region,
       variety, 
       SUM(price) AS price_sum 
FROM CTE
WHERE region IS NOT NULL AND price IS NOT NULL
GROUP BY region, variety
ORDER BY price_sum DESC;

All required columns and the first 5 rows of the solution are shown

regionvarietyprice_sum
NapaCabernet Sauvignon1630
Napa ValleyCabernet Sauvignon925
Diamond Mountain DistrictCabernet Sauvignon350
Mazis-ChambertinPinot Noir294
Napa ValleyBordeaux-style Red Blend267


UNION ALL does not remove duplicates, so it produces a lot more records, which might result in inaccurate aggregate values.

UNION ALL question: HR department employee

In this Amazon interview question, candidates are explicitly asked to return duplicate results. It’s an interesting use case for UNION ALL.

MediumID 9858

Generate a list of employees who work in the HR department, including only their first names and department in the output. Each employee should appear twice in the list, meaning their first name and department should be duplicated in the output.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/9858-find-employees-in-the-hr-department-and-output-the-result-with-one-duplicate

Understand the question

We are provided with information about workers. The question asks us to find workers who belong to the HR department and output their department and first name.

There is one unusual condition - we need to return two copies of the final answer.

Analyze data

Available data for this question is stored in the worker table. Let’s take a closer look at values in each column:

Table: worker
Loading Dataset

  • Integer values in the worker_id column identify each employee.
  • first_name and last_name columns contain employees’ first and last names of the varchar (text) type.
  • Integer values in the salary column represent workers’ salaries in dollars.
  • joining_date datetime values represent the date when the worker first joined
  • Text values in the department column refer to each employee’s department. Since the question asks us to find employees in the HR department, we’ll most likely have to filter records by their department value.

It looks like the department column has three possible values: ‘HR’, ‘Admin’, and ‘Account’.

Plan your approach

We need to filter employees by their department and only keep rows where the value is ‘HR’.

The question also specifies that the final output should only include first_name and department values.

Finally, the answer should consist of two copies of HR employee records. There are many ways to do this, but we’ll explore the simplest approach using UNION ALL.

Write the code

1. Find employees of the HR department

First, we should SELECT records from the workers table. Question instructions clearly tell us to output only two columns - first_name and department.

We need the WHERE clause to filter employees by their department. In this case, we need only to return employees from the ‘HR’ department.

SELECT first_name, department 
FROM worker
WHERE department = 'HR'

2. Return two copies of the dataset

The final answer should have two copies of all HR employees.

UNION ALL vertically combines the results of two queries and does not remove duplicate rows, so it’s a perfect SQL feature to answer this question.

We already have a query that returns all HR employees. We simply need to combine it with itself.

SELECT first_name, department 
FROM worker
WHERE department = 'HR'
UNION ALL
SELECT first_name, department 
FROM worker
WHERE department = 'HR'

You have reached your daily limit for code executions on our blog.
Please login/register to execute more code.

Output

Final answer should return two copies of HR employees’ first names and their department.

All required columns and the first 5 rows of the solution are shown

first_namedepartment
MonikaHR
VishalHR
MoeHR
JaiHR
JuraHR
MonikaHR
VishalHR
MoeHR
JaiHR
JuraHR


Output with UNION instead of UNION ALL

Let’s illustrate the difference between UNION vs UNION ALL by using the former instead of the latter.

SELECT first_name, department 
FROM worker
WHERE department = 'HR'
UNION
SELECT first_name, department 
FROM worker
WHERE department = 'HR';

All required columns and the first 5 rows of the solution are shown

first_namedepartment
JuraHR
MoeHR
JaiHR
VishalHR
MonikaHR

As you can see, UNION removed all duplicate rows. The question asks us to output duplicate rows, so we need to use UNION ALL.

SQL UNION vs UNION ALL: Performance

SQL UNION vs UNION ALL: Performance

UNION vertically combines two data sets, finds duplicate rows, and removes them. UNION ALL also combines datasets, but it doesn’t remove any records, regardless of their uniqueness.

Because of the extra steps involved, UNION is considered to be the slower of the two, especially when applied to large datasets.

This is a simple explanation that makes sense, but it doesn’t always hold up in the real world. There may be other factors involved, like internet speed, which might make UNION ALL slower. But the best is to assess UNION vs UNION ALL choice on a case-by-case basis.

The general rule is if you are confident that the dataset has no duplicates or that keeping duplicates is necessary for the task, go with UNION ALL.

If you need to ensure there are no duplicate records, go with UNION.

SQL UNION vs UNION ALL: Which is better?

Depends on the task at hand and what information you have about each dataset.

UNION ALL is more efficient, but it is only applicable if:

  1. You know there won’t be any duplicate values to remove
  2. Keeping duplicates is necessary for the task

If you want to combine two datasets and ensure there are no duplicates in the final table, use UNION instead.

Summary

In this “UNION vs UNION ALL” guide, we described UNION and UNION ALL, their similarities, differences, and tips for using two set operators.

Having a thorough knowledge of UNION and UNION ALL can increase your chances of finding a job. For example, understand how using one set operator instead of the other can change the query's output.

The StrataScratch platform has hundreds of questions where you can practice writing queries with UNION, UNION ALL, and many other SQL features. You can explore answers from the community and the thought process behind each solution. For example, understand the decision behind using UNION instead of UNION ALL to solve a question.

Share