What Is SQL GROUP BY and How to Use It?

What Is SQL GROUP BY and How to Use It


The article discusses the seven most typical uses of SQL GROUP BY. We’ll go through its syntax and coding examples to give a taste of real-life SQL coding.

The GROUP BY clause is one of the most ubiquitous clauses in SQL. It’s an interesting one, as its use is a basic one but also extends to advanced SQL knowledge. It seems like you have to learn it very early, and you just can’t get rid of it, no matter how complex your SQL queries become. On the contrary – the more complex they are, the more you’ll use GROUP BY.

This article will teach you what SQL GROUP BY is and show you its syntax. But the main part will be showing you how GROUP BY works. We’ll use the interview questions from our platform to show you as many common and relatable examples of GROUP BY in an SQL query.

These examples include SQL GROUP BY with:

  1. Aggregate functions
  2. JOINs
  3. Multiple Columns
  4. WHERE
  5. HAVING
  6. ORDER BY
  7. Window functions

Once you come to the end of the article, we’re sure you’ll find even more day-to-day situations where you’ll want to use SQL GROUP BY.

What Is GROUP BY Clause in SQL?

The SQL GROUP BY definition is simple: It’s an SQL clause that groups all the rows with the same column value.

Its main purpose is to help with data aggregation, working on that together with the SQL aggregate functions. You know them – COUNT(), SUM(), AVG(), MIN(), MAX(), to mention those most commonly used.

SQL GROUP BY Syntax

When data is grouped in SQL, it is done so by one or several columns. The GROUP BY syntax reflects that, as you need to specify which column(s) you want to group by.

The SQL GROUP BY syntax is, therefore, rather straightforward. However, you have to keep in mind where GROUP BY comes in relation to other SQL clauses. That’s why we’ll use them, too, in the GROUP BY syntax to show where GROUP BY stands.

SELECT column_1,
       column_2,
FROM table
WHEREGROUP BY column_1, column_2, …
HAVINGORDER BY column_1, column_2, …;

The above syntax shows that GROUP BY always comes after FROM and WHERE but before HAVING and ORDER BY.

The best way to internalize these rules is to write SQL code that requires GROUP BY, and that’s what we’ll do in the following section.

Practical Examples for Mastering SQL GROUP BY

Using SQL GROUP BY With Aggregate Functions

Using SQL GROUP BY With Aggregate Functions

When using it with aggregate functions, GROUP BY categorizes aggregated data into groups and gives it labels. Labels are distinct values from the column you group by.

The aggregate functions mustn’t be used in GROUP BY because they are executed after GROUP BY.

Question: Total AdWords Earnings

Take a look at this question by Google.


Table: google_adwords_earnings

Link to the question: https://platform.stratascratch.com/coding/10164-total-adwords-earnings


It gives you the google_adwords_earnings table to work with.

Missing data

Code

To solve this problem, you need to sum the adwords earnings and then group them by the business type. That will give you the total earnings by each business type.

It translates to this in SQL.

SELECT business_type, 
       SUM(adwords_earnings) AS earnings
FROM google_adwords_earnings
GROUP BY business_type;

There are two columns in SELECT: business_type and earnings. This second column is calculated using the SUM() aggregate function with the column adwords_earnings.

The FROM clause refers to the only table we have.Then GROUP BY: it groups data by the business_type column, which is what the question asks.

Combined with SUM(), this will output the earnings by each business type.

An error loading the results has occurred

Lookit!: Don’t Use the Aggregate Function in GROUP BY

It’s like kids trying to stick a screwdriver into a socket – we don’t know why you would do that, but you might be tempted.

We’ll rid you of the temptation: the aggregate functions are not allowed in GROUP BY. Why? Will you be electrocuted if you put aggregate functions there?

No, but you’ll get an error. Run this code and see for yourself.

SELECT business_type, 
       SUM(adwords_earnings) AS earnings
FROM google_adwords_earnings
GROUP BY business_type, SUM(adwords_earnings);

See!? And you wouldn’t believe us when we told you!

The aggregate functions are not allowed in GROUP BY

Using SQL GROUP BY With JOINs

When using JOINs in your query, it usually means you’ll have more than one table in the FROM clause. This doesn’t change anything regarding the use of GROUP BY: you simply state the columns by which you want to group your output.

The only change is that joining tables allows you to group by any columns from any table.

Question: Popularity of Hack

Here’s a good example by Meta.


Tables: facebook_employees, facebook_hack_survey

Link to the question: https://platform.stratascratch.com/coding/10061-popularity-of-hack

You have two tables at your disposal. The first one is facebook_employees.

Missing data

The second table is facebook_hack_survey.

Missing data

Code

What you need to do here is to find the average popularity and group by office location. Compared to the previous code’s logic, the only addition is that there will be two tables.

Have a look.

SELECT e.location,
       AVG(s.popularity) AS avg_popularity
FROM facebook_employees e
JOIN facebook_hack_survey s ON e.id = s.employee_id
GROUP BY e.location;

We select the location and calculate the hack popularity by applying the AVG() function to the popularity column.

Since these two columns are from one table each, we have to join the tables.  We join the tables on the columns id and employee_id.

Finally, we group the output by the column that is required to solve the problem. In this case, it’s the column location from the table facebook_employees.

An error loading the results has occurred

Lookit!: Careful When Choosing the JOIN

This isn’t in direct relation to GROUP BY in SQL. But you need to be careful when choosing which join type to use, as they may return different results.

For instance, we used (INNER) JOIN for the previous question. The result would have been the same if we had used LEFT JOIN.

However, if we had chosen to use RIGHT JOIN, we wouldn’t have gotten the right (pun intended) result.

Run this code to see what it’ll return.

SELECT e.location,
       AVG(s.popularity) AS avg_popularity
FROM facebook_employees e
RIGHT JOIN facebook_hack_survey s ON e.id = s.employee_id
GROUP BY e.location;

An error loading the results has occurred

There’s one additional row in the output with the calculated average popularity but without the location. These are all the employees appearing in the table facebook_hack_survey, but missing from facebook_employees.

To be sure which JOIN to choose, read our article about types of SQL JOINs.

Using SQL GROUP BY With Multiple Columns

Using SQL GROUP BY With Multiple Columns

Yes, of course, you can group your output by more than one column. You simply list all the required columns in GROUP BY and separate them by a comma.

Question: Number Of Custom Email Labels

Let’s practice this in the Google interview question.


Tables: google_gmail_emails, google_gmail_labels

Link to the question: https://platform.stratascratch.com/coding/10120-number-of-custom-email-labels?code_type=1

It gives us two tables: google_gmail_emails,

Missing data

and google_gmail_labels.

Missing data

Code

This question will require you to group by email labels and the user receiving an email. Here’s how to do this.

SELECT to_user AS user_id,
       label,
       COUNT(*) AS n_occurences
FROM google_gmail_emails e 
INNER JOIN google_gmail_labels l
ON e.id = l.email_id
AND l.label ILIKE 'custom%'
GROUP BY to_user, label;

The question asks us to output the receiver user ID, label, and the number of occurrences. For this third column, we simply use COUNT(*) to count the number of rows, which equals the number of occurrences.

Then we INNER JOIN the two available tables. We join them on the ID and email ID, and where the email label is custom.

To get the number of occurrences by user and email label, we list the columns to_user and label in the GROUP BY clause.

An error loading the results has occurred

Lookit!: Group What You Select

When using GROUP BY with the aggregate functions, it’s important to remember one simple rule, which is especially easy to forget if you need to group by several columns.

And the rule is: the columns you include in SELECT must also be included in GROUP BY. (Except, of course, the columns with the aggregate functions. We learned they don’t go in GROUP BY.)

Run the modified code above, where we included only one column in GROUP BY.

SELECT to_user AS user_id,
       label,
       COUNT(*) AS n_occurences
FROM google_gmail_emails e 
INNER JOIN google_gmail_labels l
ON e.id = l.email_id
AND l.label ILIKE 'custom%'
GROUP BY to_user;

Not only that the answer would be wrong, but the only output you’ll see will be an error.

Using SQL GROUP BY With ORDER BY

The use of ORDER BY doesn’t change anything in how you use GROUP BY. The only thing you need to remember is that ORDER BY comes after GROUP BY in SQL.

As ORDER BY is used for sorting your output, you can specify how you want it sorted. If you want it sorted ascendingly, write the keyword ASC after the columns in ORDER BY. If you omit the ASC keyword, the result will still be sorted ascendingly, as it is the default mode.

If you want descending sorting, use the keyword DESC.

Question: Expensive Projects

Let’s see how this works in a Microsoft interview question.


Tables: ms_projects, ms_emp_projects

Link to the question: https://platform.stratascratch.com/coding/10301-expensive-projects?code_type=1

It gives us two tables. The first one is ms_projects.

Missing data

The second table is ms_emp_projects.

Missing data

Code

To answer this question, you will need to group the result by the project title and its budget, then order it by the budget/employee ratio.

Let’s analyze the code more closely.

SELECT title AS project,
       ROUND((budget/COUNT(emp_id)::FLOAT)::NUMERIC, 0) AS budget_emp_ratio
FROM ms_projects a
INNER JOIN ms_emp_projects b 
ON a.id = b.project_id
GROUP BY title, budget
ORDER BY budget_emp_ratio DESC;

First, we select the project title. Then we calculate the amount of the project’s budget per employee working on that project. We do that by dividing the budget by the number of employees, which we get by using the COUNT() function.

As the required columns are from both tables, we need to join them in the FROM clause. We do that on the project ID columns.

Then we group the output by title and budget, as both columns appear in the SELECT statement.

The question wants you to show the output with the projects with the highest budget per employee on top. In other words, sort the result descendingly using ORDER BY.

An error loading the results has occurred

Lookit!: ORDER BY Must Come After GROUP BY

This one has to do with the order (pun intended) in which the SQL clauses are executed. ORDER BY is executed after the data is grouped, so if you put it before GROUP BY, the code won’t work.

Run this modified code to see that. It’s the same code as above. Only we put ORDER BY before GROUP BY.

SELECT title AS project,
       ROUND((budget/COUNT(emp_id)::FLOAT)::NUMERIC, 0) AS budget_emp_ratio
FROM ms_projects a
INNER JOIN ms_emp_projects b 
ON a.id = b.project_id
ORDER BY budget_emp_ratio
GROUP BY title, budget;

It throws an error.

Using SQL GROUP BY With WHERE Clause

The WHERE clause filters the data. It takes the logical condition state in WHERE, checks the data, and returns only the values that satisfy the condition. The conditions are set using logical operators or commands such as BETWEEN, LIKE, or IN.

If you don’t use SQL GROUP BY, you’re not bothered with the specific use of WHERE. However, using GROUP BY puts an important feature of WHERE to the front: it filters data before grouping. That’s why WHERE always comes before GROUP BY.

Again, knowing the execution order of the SQL clauses shows to be a valuable asset.

Question: Reviews of Hotel Arena

Let’s see how to use GROUP BY and WHERE in the question by Airbnb.


Table: hotel_reviews

Link to the question: https://platform.stratascratch.com/coding/10166-reviews-of-hotel-arena

There’s only one table to work with: hotel_reviews.

Missing data

Code

This question requires you to count the score occurrences for Hotel Arena only and show the hotel name and score.

SELECT hotel_name,
       reviewer_score,
       COUNT(*) AS score_occurrences
FROM hotel_reviews
WHERE hotel_name = 'Hotel Arena'
GROUP BY hotel_name, reviewer_score;

The code selects the hotel name and the review score. It then counts the number of each score occurrences using COUNT(*).

The condition in WHERE means the calculation is done only on the Hotel Arena hotels.

The output is grouped by hotel name and review score. You already learned how to group by multiple columns.

An error loading the results has occurred

Lookit!: WHERE Must Come Before GROUP BY

Let’s see what happens if we use WHERE after GROUP BY.

SELECT hotel_name,
       reviewer_score,
       COUNT(*) AS score_occurrences
FROM hotel_reviews
GROUP BY hotel_name, reviewer_score
WHERE hotel_name = 'Hotel Arena';

The query again returns an error.

Using SQL GROUP BY With HAVING Clause

Using SQL GROUP BY With HAVING Clause

The HAVING clause is the same as WHERE, in that both clauses are used for filtering data. But there’s one significant difference: HAVING filters data after grouping.

The conditions in HAVING are set the same way as in WHERE. The difference is that HAVING accepts aggregate functions. And, of course, it always comes after GROUP BY.

Question: Cities With the Most Expensive Homes

Here’s a question by Zillow that asks you to use SQL GROUP BY with HAVING.


Table: zillow_transactions

Link to the question: https://platform.stratascratch.com/coding/10315-cities-with-the-most-expensive-homes

We have to work with the table zillow_transactions.

Missing data

Code

To give a correct answer, you need to group data by city and filter those cities with average home prices higher than the national average using the HAVING clause.

Here’s how to do it.

SELECT city
FROM zillow_transactions a
GROUP BY city
HAVING AVG(a.mkt_price) >
  (SELECT AVG(mkt_price)
   FROM zillow_transactions)
ORDER BY city ASC;

The easy first step is to select the cities from the table and group by the same column.

OK, we now have the list of cities. This list should contain only cities with average house prices higher than the national average.

Filtering is done in HAVING. The average housing price by city is calculated using the AVG() function. This is then compared with the subquery that, again, uses AVG() to return the total national average price of homes.

Finally, the output is sorted by city alphabetically.

An error loading the results has occurred

Lookit!: HAVING Must Come After GROUP BY

Let’s write the same code, but put HAVING before GROUP BY.

Run the code below to see what happens.

SELECT city
FROM zillow_transactions a
HAVING AVG(a.mkt_price) >
  (SELECT AVG(mkt_price)
   FROM zillow_transactions)
GROUP BY city
ORDER BY city ASC;

An error, as you probably expected!

Using SQL GROUP BY With the Window Functions

The SQL window functions are the SQL functions that perform a calculation over a set of rows (window) that are related to the current row.

One of their most common uses is data aggregation. In that way, they are similar to aggregate functions. The significant difference is that the window functions don’t collapse the individual rows. In other words, they allow for the aggregated and non-aggregated values to be shown side by side.

The other common window functions use is ranking data.

When using the window functions with GROUP BY, it’s important to remember that the window functions can’t be included in GROUP BY, the same as the aggregate functions.

Question: Activity Rank

Let’s see how GROUP BY works alongside the window functions in this Google interview question.


Table: google_gmail_emails

Link to the question: https://platform.stratascratch.com/coding/10351-activity-rank

The question gives the table google_gmail_emails.

Missing data

Code

To solve the question, you have to use the aggregate and window functions with GROUP BY.

SELECT  from_user, 
        COUNT(*) as total_emails, 
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, from_user ASC) AS ranking
FROM google_gmail_emails 
GROUP BY from_user
ORDER BY 2 DESC, 1;

First, select the sender. Then use COUNT(*) to count the number of times this sender appears. This will equal the number of emails sent by the user.

We use the ROW_NUMBER() window function to rank the users by the number of sent emails.

The output is grouped by the sender. Finally, it is sorted by the number of sent emails descendingly and by the ranking ascendingly.

An error loading the results has occurred

Lookit!: Don’t Include the Window Functions in GROUP BY

You learned that the aggregate functions don’t go in GROUP BY. So, you might be tempted to think that the window functions aren’t the aggregate functions (correct!) and that they can go in GROUP BY (not correct!)

Let’s still try it.

SELECT  from_user, 
        COUNT(*) as total_emails, 
        ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, from_user ASC) AS ranking
FROM google_gmail_emails 
GROUP BY from_user, ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, from_user ASC)
ORDER BY 2 DESC, 1;

Run the code to make sure it returns an error.

Why the window functions aren’t allowed in GROUP BY? The answer: they are executed after GROUP BY. So grouping by the window functions would mean grouping by something that isn’t yet calculated. In other words: impossible!

Summary

In this article, we discussed the seven most common uses of SQL GROUP BY. As you’ve seen, GROUP BY is a clause that offers various ways of grouping and aggregating data.

Each use case was supported by a practical example, which allowed you to see and write the code. Hope you took that opportunity, as coding is the best way for the new SQL knowledge to sink in.

There are even more coding challenges on our platform. You’re welcome to browse through and solve as many coding interview questions as possible. The questions are categorized into different difficulty levels. That way, you can smoothly learn SQL GROUP BY or any other SQL concept from the basic to an advanced level.

What Is SQL GROUP BY and How to Use It


Become a data expert. Subscribe to our newsletter.