Published: June 12, 2023

Categories:

Written by:
Nathan Rosidi

The article discusses the seven most typical uses of SQL GROUP BY. We’ll go through its syntax and coding examples to give a taste of real-life SQL coding.

The GROUP BY clause is one of the most ubiquitous clauses in SQL. It’s an interesting one, as its use is a basic one but also extends to advanced SQL knowledge. It seems like you have to learn it very early, and you just can’t get rid of it, no matter how complex your SQL queries become. On the contrary – the more complex they are, the more you’ll use GROUP BY.

This article will teach you what SQL GROUP BY is and show you its syntax. But the main part will be showing you how GROUP BY works. We’ll use the interview questions from our platform to show you as many common and relatable examples of GROUP BY in an SQL query.

These examples include SQL GROUP BY with:

Aggregate functions
JOINs
Multiple Columns
WHERE
HAVING
ORDER BY
Window functions

Once you come to the end of the article, we’re sure you’ll find even more day-to-day situations where you’ll want to use SQL GROUP BY.

What Is GROUP BY Clause in SQL?

The SQL GROUP BY definition is simple: It’s an SQL clause that groups all the rows with the same column value.

Its main purpose is to help with data aggregation, working on that together with the SQL aggregate functions. You know them – COUNT(), SUM(), AVG(), MIN(), MAX(), to mention those most commonly used.

SQL GROUP BY Syntax

When data is grouped in SQL, it is done so by one or several columns. The GROUP BY syntax reflects that, as you need to specify which column(s) you want to group by.

The SQL GROUP BY syntax is, therefore, rather straightforward. However, you have to keep in mind where GROUP BY comes in relation to other SQL clauses. That’s why we’ll use them, too, in the GROUP BY syntax to show where GROUP BY stands.

SELECT column_1,
       column_2,
 …
FROM table
WHERE …
GROUP BY column_1, column_2, …
HAVING …
ORDER BY column_1, column_2, …;

The above syntax shows that GROUP BY always comes after FROM and WHERE but before HAVING and ORDER BY.

The best way to internalize these rules is to write SQL code that requires GROUP BY, and that’s what we’ll do in the following section.

Practical Examples for Mastering SQL GROUP BY

Using SQL GROUP BY With Aggregate Functions

When using it with aggregate functions, GROUP BY categorizes aggregated data into groups and gives it labels. Labels are distinct values from the column you group by.

The aggregate functions mustn’t be used in GROUP BY because they are executed after GROUP BY.

Question: Total AdWords Earnings

Take a look at this question by Google.

Total AdWords Earnings

Last Updated: July 2020

Google

EasyID 10164

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Find the total AdWords earnings for each business type. Output the business types along with the total earnings.

Table: google_adwords_earnings

Link to the question: https://platform.stratascratch.com/coding/10164-total-adwords-earnings

It gives you the google_adwords_earnings table to work with.

Table: google_adwords_earnings

Code

To solve this problem, you need to sum the adwords earnings and then group them by the business type. That will give you the total earnings by each business type.

It translates to this in SQL.

SELECT business_type, 
       SUM(adwords_earnings) AS earnings
FROM google_adwords_earnings
GROUP BY business_type;

There are two columns in SELECT: business_type and earnings. This second column is calculated using the SUM() aggregate function with the column adwords_earnings.

The FROM clause refers to the only table we have.Then GROUP BY: it groups data by the business_type column, which is what the question asks.

Combined with SUM(), this will output the earnings by each business type.

Lookit!: Don’t Use the Aggregate Function in GROUP BY

It’s like kids trying to stick a screwdriver into a socket – we don’t know why you would do that, but you might be tempted.

We’ll rid you of the temptation: the aggregate functions are not allowed in GROUP BY. Why? Will you be electrocuted if you put aggregate functions there?

No, but you’ll get an error. Run this code and see for yourself.

SELECT business_type, 
       SUM(adwords_earnings) AS earnings
FROM google_adwords_earnings
GROUP BY business_type, SUM(adwords_earnings);

See!? And you wouldn’t believe us when we told you!

The aggregate functions are not allowed in GROUP BY

Using SQL GROUP BY With JOINs

When using JOINs in your query, it usually means you’ll have more than one table in the FROM clause. This doesn’t change anything regarding the use of GROUP BY: you simply state the columns by which you want to group your output.

The only change is that joining tables allows you to group by any columns from any table.

Question: Popularity of Hack

Here’s a good example by Meta.

Popularity of Hack

Last Updated: March 2020

Using SQL GROUP BY With Multiple Columns

Yes, of course, you can group your output by more than one column. You simply list all the required columns in GROUP BY and separate them by a comma.

Question: Number Of Custom Email Labels

Let’s practice this in the Google interview question.

Number Of Custom Email Labels

Google

MediumID 10120

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Software Engineer

Find the number of occurrences of custom email labels for each user receiving an email. Output the receiver user id, label, and the corresponding number of occurrences.

Tables: google_gmail_emails, google_gmail_labels

Link to the question: https://platform.stratascratch.com/coding/10120-number-of-custom-email-labels?code_type=1

It gives us two tables: google_gmail_emails,

Table: google_gmail_emails

and google_gmail_labels.

Table: google_gmail_labels

Code

This question will require you to group by email labels and the user receiving an email. Here’s how to do this.

SELECT to_user AS user_id,
       label,
       COUNT(*) AS n_occurences
FROM google_gmail_emails e 
INNER JOIN google_gmail_labels l
ON e.id = l.email_id
AND l.label ILIKE 'custom%'
GROUP BY to_user, label;

The question asks us to output the receiver user ID, label, and the number of occurrences. For this third column, we simply use COUNT(*) to count the number of rows, which equals the number of occurrences.

Then we INNER JOIN the two available tables. We join them on the ID and email ID, and where the email label is custom.

To get the number of occurrences by user and email label, we list the columns to_user and label in the GROUP BY clause.

Lookit!: Group What You Select

When using GROUP BY with the aggregate functions, it’s important to remember one simple rule, which is especially easy to forget if you need to group by several columns.

And the rule is: the columns you include in SELECT must also be included in GROUP BY. (Except, of course, the columns with the aggregate functions. We learned they don’t go in GROUP BY.)

Run the modified code above, where we included only one column in GROUP BY.

SELECT to_user AS user_id,
       label,
       COUNT(*) AS n_occurences
FROM google_gmail_emails e 
INNER JOIN google_gmail_labels l
ON e.id = l.email_id
AND l.label ILIKE 'custom%'
GROUP BY to_user;

Not only that the answer would be wrong, but the only output you’ll see will be an error.

Using SQL GROUP BY With ORDER BY

The use of ORDER BY doesn’t change anything in how you use GROUP BY. The only thing you need to remember is that ORDER BY comes after GROUP BY in SQL.

As ORDER BY is used for sorting your output, you can specify how you want it sorted. If you want it sorted ascendingly, write the keyword ASC after the columns in ORDER BY. If you omit the ASC keyword, the result will still be sorted ascendingly, as it is the default mode.

If you want descending sorting, use the keyword DESC.

Question: Expensive Projects

Let’s see how this works in a Microsoft interview question.

Expensive Projects

Last Updated: November 2020

Microsoft

MediumID 10301

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Given a list of projects and employees mapped to each project, calculate by the amount of project budget allocated to each employee . The output should include the project title and the project budget rounded to the closest integer. Order your list by projects with the highest budget per employee first.

Tables: ms_projects, ms_emp_projects

Link to the question: https://platform.stratascratch.com/coding/10301-expensive-projects?code_type=1

It gives us two tables. The first one is ms_projects.

Table: ms_projects

The second table is ms_emp_projects.

Table: ms_emp_projects

Code

To answer this question, you will need to group the result by the project title and its budget, then order it by the budget/employee ratio.

Let’s analyze the code more closely.

SELECT title AS project,
       ROUND((budget/COUNT(emp_id)::FLOAT)::NUMERIC, 0) AS budget_emp_ratio
FROM ms_projects a
INNER JOIN ms_emp_projects b 
ON a.id = b.project_id
GROUP BY title, budget
ORDER BY budget_emp_ratio DESC;

First, we select the project title. Then we calculate the amount of the project’s budget per employee working on that project. We do that by dividing the budget by the number of employees, which we get by using the COUNT() function.

As the required columns are from both tables, we need to join them in the FROM clause. We do that on the project ID columns.

Then we group the output by title and budget, as both columns appear in the SELECT statement.

The question wants you to show the output with the projects with the highest budget per employee on top. In other words, sort the result descendingly using ORDER BY.

Lookit!: ORDER BY Must Come After GROUP BY

This one has to do with the order (pun intended) in which the SQL clauses are executed. ORDER BY is executed after the data is grouped, so if you put it before GROUP BY, the code won’t work.

Run this modified code to see that. It’s the same code as above. Only we put ORDER BY before GROUP BY.

SELECT title AS project,
       ROUND((budget/COUNT(emp_id)::FLOAT)::NUMERIC, 0) AS budget_emp_ratio
FROM ms_projects a
INNER JOIN ms_emp_projects b 
ON a.id = b.project_id
ORDER BY budget_emp_ratio
GROUP BY title, budget;

It throws an error.

Using SQL GROUP BY With WHERE Clause

The WHERE clause filters the data. It takes the logical condition state in WHERE, checks the data, and returns only the values that satisfy the condition. The conditions are set using logical operators or commands such as BETWEEN, LIKE, or IN.

If you don’t use SQL GROUP BY, you’re not bothered with the specific use of WHERE. However, using GROUP BY puts an important feature of WHERE to the front: it filters data before grouping. That’s why WHERE always comes before GROUP BY.

Again, knowing the execution order of the SQL clauses shows to be a valuable asset.

Question: Reviews of Hotel Arena

Let’s see how to use GROUP BY and WHERE in the question by Airbnb.

Reviews of Hotel Arena

Expedia

Airbnb

EasyID 10166

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Find how many reviews exist for each review score given to 'Hotel Arena'. Output the hotel name ('Hotel Arena'), each review score, and the number of reviews for that score. Ensure the results only include 'Hotel Arena.'

Table: hotel_reviews

Link to the question: https://platform.stratascratch.com/coding/10166-reviews-of-hotel-arena

There’s only one table to work with: hotel_reviews.

Table: hotel_reviews

Code

This question requires you to count the score occurrences for Hotel Arena only and show the hotel name and score.

SELECT hotel_name,
       reviewer_score,
       COUNT(*) AS score_occurrences
FROM hotel_reviews
WHERE hotel_name = 'Hotel Arena'
GROUP BY hotel_name, reviewer_score;

The code selects the hotel name and the review score. It then counts the number of each score occurrences using COUNT(*).

The condition in WHERE means the calculation is done only on the Hotel Arena hotels.

The output is grouped by hotel name and review score. You already learned how to group by multiple columns.

Lookit!: WHERE Must Come Before GROUP BY

Let’s see what happens if we use WHERE after GROUP BY.

SELECT hotel_name,
       reviewer_score,
       COUNT(*) AS score_occurrences
FROM hotel_reviews
GROUP BY hotel_name, reviewer_score
WHERE hotel_name = 'Hotel Arena';

The query again returns an error.

Using SQL GROUP BY With HAVING Clause

The HAVING clause is the same as WHERE, in that both clauses are used for filtering data. But there’s one significant difference: HAVING filters data after grouping.

The conditions in HAVING are set the same way as in WHERE. The difference is that HAVING accepts aggregate functions. And, of course, it always comes after GROUP BY.

Question: Cities With the Most Expensive Homes

Here’s a question by Zillow that asks you to use SQL GROUP BY with HAVING.

Cities With The Most Expensive Homes

Last Updated: December 2020

Zillow

MediumID 10315

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Write a query that identifies cities with higher than average home prices when compared to the national average. Output the city names.

Table: zillow_transactions

Link to the question: https://platform.stratascratch.com/coding/10315-cities-with-the-most-expensive-homes

We have to work with the table zillow_transactions.

Table: zillow_transactions

Code

To give a correct answer, you need to group data by city and filter those cities with average home prices higher than the national average using the HAVING clause.

Here’s how to do it.

SELECT city
FROM zillow_transactions a
GROUP BY city
HAVING AVG(a.mkt_price) >
  (SELECT AVG(mkt_price)
   FROM zillow_transactions)
ORDER BY city ASC;

The easy first step is to select the cities from the table and group by the same column.

OK, we now have the list of cities. This list should contain only cities with average house prices higher than the national average.

Filtering is done in HAVING. The average housing price by city is calculated using the AVG() function. This is then compared with the subquery that, again, uses AVG() to return the total national average price of homes.

Finally, the output is sorted by city alphabetically.

Lookit!: HAVING Must Come After GROUP BY

Let’s write the same code, but put HAVING before GROUP BY.

Run the code below to see what happens.

SELECT city
FROM zillow_transactions a
HAVING AVG(a.mkt_price) >
  (SELECT AVG(mkt_price)
   FROM zillow_transactions)
GROUP BY city
ORDER BY city ASC;

An error, as you probably expected!

Using SQL GROUP BY With the Window Functions

The SQL window functions are the SQL functions that perform a calculation over a set of rows (window) that are related to the current row.

One of their most common uses is data aggregation. In that way, they are similar to aggregate functions. The significant difference is that the window functions don’t collapse the individual rows. In other words, they allow for the aggregated and non-aggregated values to be shown side by side.

The other common window functions use is ranking data.

When using the window functions with GROUP BY, it’s important to remember that the window functions can’t be included in GROUP BY, the same as the aggregate functions.

Question: Activity Rank

Let’s see how GROUP BY works alongside the window functions in this Google interview question.

Activity Rank

Last Updated: July 2021

Google

MediumID 10351

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Find the email activity rank for each user. Email activity rank is defined by the total number of emails sent. The user with the highest number of emails sent will have a rank of 1, and so on. Output the user, total emails, and their activity rank.

• Order records first by the total emails in descending order. • Then, sort users with the same number of emails in alphabetical order by their username. • In your rankings, return a unique value (i.e., a unique rank) even if multiple users have the same number of emails.

Table: google_gmail_emails

Link to the question: https://platform.stratascratch.com/coding/10351-activity-rank

The question gives the table google_gmail_emails.

Table: google_gmail_emails

Code

To solve the question, you have to use the aggregate and window functions with GROUP BY.

SELECT  from_user, 
        COUNT(*) as total_emails, 
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, from_user ASC) AS ranking
FROM google_gmail_emails 
GROUP BY from_user
ORDER BY 2 DESC, 1;

First, select the sender. Then use COUNT(*) to count the number of times this sender appears. This will equal the number of emails sent by the user.

We use the ROW_NUMBER() window function to rank the users by the number of sent emails.

The output is grouped by the sender. Finally, it is sorted by the number of sent emails descendingly and by the ranking ascendingly.

Lookit!: Don’t Include the Window Functions in GROUP BY

You learned that the aggregate functions don’t go in GROUP BY. So, you might be tempted to think that the window functions aren’t the aggregate functions (correct!) and that they can go in GROUP BY (not correct!)

Let’s still try it.

SELECT  from_user, 
        COUNT(*) as total_emails, 
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, from_user ASC) AS ranking
FROM google_gmail_emails 
GROUP BY from_user, ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, from_user ASC)
ORDER BY 2 DESC, 1;

Run the code to make sure it returns an error.

Why the window functions aren’t allowed in GROUP BY? The answer: they are executed after GROUP BY. So grouping by the window functions would mean grouping by something that isn’t yet calculated. In other words: impossible!

Summary

In this article, we discussed the seven most common uses of SQL GROUP BY. As you’ve seen, GROUP BY is a clause that offers various ways of grouping and aggregating data.

Each use case was supported by a practical example, which allowed you to see and write the code. Hope you took that opportunity, as coding is the best way for the new SQL knowledge to sink in.

There are even more coding challenges on our platform. You’re welcome to browse through and solve as many coding interview questions as possible. The questions are categorized into different difficulty levels. That way, you can smoothly learn SQL GROUP BY or any other SQL concept from the basic to an advanced level.

What Is SQL GROUP BY and How to Use It?

What Is GROUP BY Clause in SQL?

SQL GROUP BY Syntax

Practical Examples for Mastering SQL GROUP BY

Using SQL GROUP BY With Aggregate Functions

Question: Total AdWords Earnings

Total AdWords Earnings

Using SQL GROUP BY With JOINs

Question: Popularity of Hack

Popularity of Hack

Using SQL GROUP BY With Multiple Columns

Question: Number Of Custom Email Labels

Number Of Custom Email Labels

Using SQL GROUP BY With ORDER BY

Question: Expensive Projects

Expensive Projects

Using SQL GROUP BY With WHERE Clause

Question: Reviews of Hotel Arena

Reviews of Hotel Arena

Using SQL GROUP BY With HAVING Clause

Question: Cities With the Most Expensive Homes

Cities With The Most Expensive Homes

Using SQL GROUP BY With the Window Functions

Question: Activity Rank

Activity Rank

Summary

Latest Posts:

Unsupervised Clustering: Methods, Examples, and When to Use

A Guide to Master Machine Learning Modeling from Scratch

How to Create a Bubble Plot with Python and Matplotlib?