How to Join 3 or More Tables in SQL

How to Join 3 or More Tables in SQL


Here we’re going to talk about using a commonly required SQL concept - how to join 3 or more tables in SQL.

The SQL experts often disagree on the SQL JOINs. One camp says the JOINs are the basic SQL concept. Others claim the JOINs are one of the advanced SQL topics. We won’t side with anyone because it won’t change the fact that SQL JOINs are one of the most required concepts at the SQL job interviews.

You have probably, as most people did, learned SQL JOINs by joining two tables. At some point, you probably wondered if it’s possible to join 3 tables in SQL or even more tables using the JOIN keyword.

The short answer is, yes, it’s possible!

The longer answer is, yes, it’s possible, and we will show you how to join 3 or more tables in SQL on a concrete job interview question.

The longest answer is, yes, it’s possible, and we will show you a practical example, but we first have to make sure that you understand how joining two tables works.

Joining Two Tables in SQL

If you know how to join two tables, you for sure know how to join 3 tables in SQL or even more than 3. Maybe you just don’t know that you know that.

For example, let’s take a look at this Airbnb interview question:


Tables: airbnb_apartments, airbnb_hosts

Answer:

SELECT
    nationality,
    SUM(n_beds) AS total_beds_available
FROM
    airbnb_hosts h
INNER JOIN
    airbnb_apartments a
ON
    h.host_id = a.host_id
GROUP BY
    nationality
ORDER BY
    total_beds_available DESC;

Joining table consists of the next parts:

  1. The first table
  2. The selected join keyword
  3. The second table
  4. The ON keyword
  5. The column from the first table used to connect it with the second table
  6. The equal sign (=)
  7. The column from the second table used to connect it with the first table

In this example, the first table is airbnb_hosts, with alias h. Then comes the join keyword, which is INNER JOIN in this case. The second table is airbnb_apartments, with alias a. After the ON keyword comes the column host_id from the table airbnb_hosts. Whenever this column is equal to the column host_id from the second table, the data from two tables will be joined.

That’s it! You follow that structure whenever you use any of the common join types:

  • (INNER) JOIN
  • LEFT (OUTER) JOIN
  • RIGHT (OUTER) JOIN
  • FULL OUTER JOIN
  • CROSS JOIN

Now, how do you apply this to joining 3 tables in SQL?

Joining 3 Tables in SQL

You’ll soon see why, once you know how to join 2 tables, joining 3 tables in SQL is easy. Here’s an interview question from LinkedIn:


Tables: linkedin_projects, linkedin_emp_projects, linkedin_employees

Answer:

SELECT title,
       budget,
       ceiling(prorated_expenses) AS prorated_employee_expense
FROM
   (SELECT title,
       budget,
       (end_date::date - start_date::date) * (sum(salary)/365) 
        AS prorated_expenses
FROM linkedin_projects a
INNER JOIN linkedin_emp_projects b ON a.id = b.project_id
INNER JOIN linkedin_employees c ON b.emp_id=c.id
GROUP BY title,
       budget,
       end_date,
       start_date) a
WHERE prorated_expenses > budget
ORDER BY title ASC;

We’re not going to explain the whole query, but only the JOIN part. In this case, INNER JOIN is showing up in the subquery, but this doesn’t change a thing in how 3 tables should be joined.

After you join 2 tables, the principle is that you repeat steps 2-7 mentioned above. That is, you only omit the first step, i.e., writing the name of the first table.

See how this looks like in the above code. The first table is linkedin_projects, with the alias a. Then comes the INNER JOIN keyword. After that, there’s a second table linkedin_emp_projects, with the alias b. These two tables are joined on the column id from the first table and the column project_id from the second table. Nice, you’ve joined 2 tables.

Now, you need to join the third table, which is linkedin_employees. To make SQL realize what you want to do, you need to state which join you want to use. In this case, it’s INNER JOIN again. Then you state the third table’s name, which is linkedin_employees, with the alias c. This table is joined using the column emp_id from the second table (linkedin_emp_projects) and the column id from the third table (linkedin_employees).

There’s nothing more to it. You’ve joined 3 tables in SQL! You can think about this in terms of forming a kind of chain of joins:

Chain to Join 3 Tables in SQL

Note that this doesn’t mean that, by adding a third table, you can only join it with the second table. No, by adding the third table into the join chain, you’re just stating your intention to join this table with any of the previously joined tables. You can just as well join the third table with the first table if that makes sense in a particular query.

This is the general principle that works in every situation. However, what causes a little bit of conflict here and complicates things a bit is what happens if you use the LEFT JOIN instead of INNER JOIN.

INNER JOIN v LEFT JOIN

inner join vs left join for joining three tables in sql

The logic of the INNER JOIN is that it returns all the matching rows from the first and the second table. If you join the third or any other number of tables, it doesn’t change a thing. The INNER JOIN will always return the matching rows from any number of tables you join.

But what if you use the LEFT JOIN? Here we want to introduce a concept of the reference table.

LEFT JOIN Reference Table

If you use the LEFT JOIN, the order in which you join the tables is extremely important. The LEFT JOIN will return all the rows from the first table, also called a reference table. Then it will look for the values in the second table. Wherever there are matching rows in the reference and the second table, you’ll get data from both tables.

However, if there are no rows in the second table that match the rows from the reference table, you’ll get the NULL values.

How does that work in practice and how do you then join 3 tables in SQL? Let’s have a look at an example question from LinkedIn:


Tables: linkedin_customers, linkedin_city, linkedin_country

Before analyzing the interview answer, we’d like to show you first how the LEFT JOIN works itself and what we mean when saying it will return all the rows from the reference table and only the matching rows from the second table.

Suppose you write a code below:

SELECT *
FROM linkedin_city city
LEFT JOIN linkedin_customers customers
      ON city.id = customers.city_id;

If you run it, it will return this output:

All required columns and the first 5 rows of the solution are shown

idcity_namecountry_ididbusiness_namecity_id
1London11 Hair Studio1
1London12Kosmetik Plus1
1London13Kosmetik Plus1
2Berlin24Natural Skin2
2Berlin25Kosmetik Plus2
2Berlin26Natural Skin2
2Berlin27Kosmetik Plus2
3Manchester18Kosmetik Plus3
3Manchester19 Hair Studio3
4New York3

Have a look at the last row. There is data in the columns id, city_name, and country_id. All these columns are from the table linkedin_city. However, there are no data in the following three columns. Why is that? It’s because the query returns the NULL values wherever there’s data from the master table, but there are no matching rows in the second table we joined. The result above tells us there are no customers located in New York.

How about adding the third table? Here it is:

SELECT *
FROM linkedin_city city
LEFT JOIN linkedin_customers customers
    ON city.id = customers.city_id
LEFT JOIN linkedin_country as country
    ON city.country_id = country.id;

The output of this code is below:

All required columns and the first 5 rows of the solution are shown

idcity_namecountry_ididbusiness_namecity_ididcountry_name
1London11 Hair Studio11UK
1London12Kosmetik Plus11UK
1London13Kosmetik Plus11UK
2Berlin24Natural Skin22Germany
2Berlin25Kosmetik Plus22Germany
2Berlin26Natural Skin22Germany
2Berlin27Kosmetik Plus22Germany
3Manchester18Kosmetik Plus31UK
3Manchester19 Hair Studio31UK
4New York33USA

Now there are two additional columns in the result. Even though there are NULL values, the columns id and country_name still have values in the last row. This shows New York has a matching row in the table linkedin_country, so data shows New York is in the USA, but there are no customers in that city and country.

Now we can get to the solution code of this interview question.

Answer:

WITH cities_customers AS (
    SELECT
          country.country_name AS country,
          city.city_name AS city,
          count(customer.id) AS total_customers
    FROM linkedin_country country
    INNER JOIN linkedin_city city
          ON city.country_id = country.id
    INNER JOIN linkedin_customers customer
          ON city.id = customer.city_id
   GROUP BY
          country.country_name,
          city.city_name
),

avg_customers AS (
    SELECT
          count(id)::float / count(DISTINCT city_id) AS avg_cus_per_city
    FROM linkedin_customers
)

    SELECT
          country,
          city,
          total_customers
    FROM cities_customers
    WHERE
          total_customers > (SELECT avg_cus_per_city FROM avg_customers);

Again, we will concentrate only on the join part, which can be found in the CTE. It joins table linkedin_country with the table linkedin_city. Then this table is joined with the table linkedin_customers. All 3 tables are joined using the INNER JOIN.

If you run only the first SELECT within the CTE, you’ll get this table:

All required columns and the first 5 rows of the solution are shown

countrycitytotal_customers
GermanyBerlin4

However, what if you replace the INNER JOINs with the LEFT JOINs? This new code looks like this:

SELECT
country.country_name AS country,
city.city_name AS city,
count(customer.id) AS total_customers
     FROM linkedin_country country
     LEFT JOIN linkedin_city city
        ON city.country_id = country.id
     LEFT JOIN linkedin_customers customer
        ON city.id = customer.city_id
     GROUP BY
        country.country_name,
        city.city_name;

How is this different? Your first table is again linkedin_country. When using LEFT JOIN to join it with the table linkedin_city, this will get you all the rows from the first table and only the matching rows from the second table. The second join is the LEFT JOIN between tables linkedin_city and linkedin_customers. The same logic applies here. This way, you will get all the rows from the table linkedin_country, only the matching rows from the table linkedin_city, and then again matching rows from the table linkedin_customers.

Here’s the result:

All required columns and the first 5 rows of the solution are shown

countrycitytotal_customers
GermanyBerlin4
UKLondon3
UKManchester2
USANew York0

You see, there’s one additional row compared to the previous output. You’ve got the row that says there is the USA in the table country, there’s New York in the table city, but there are no customers from this city. This is exactly what we got above when we got the NULL values. The COUNT() function counted the number of customers from New York and returned 0.

It is knowing what you want as output is that will make you decide whether to use INNER JOIN or the LEFT JOIN. However, when you use the LEFT JOIN, be aware that the order of the tables does matter, unlike with the INNER JOIN. You should always have in mind which table is your left table because that’s the table you’ll get all rows from.

If you use one LEFT JOIN, it’s generally advisable that the second join should also be the LEFT JOIN. Thinking of multiple joins as a chain of joins becomes even more important with this type of join.

Joining N Tables

Now that you’ve learned how to join 3 tables in SQL, maybe you want to know how to join more than 3 tables in SQL?

No need for us to explain that; you already know! Joining more than three tables works exactly the same way as joining three tables. Simply select the type of join you need, follow the steps of joining three tables, and there you have it: you can join as many tables and add them to the chain of joins. There’s no limit to how many.

Conclusion

You saw that joining 3 or more tables in SQL is not that hard, especially if you use the INNER JOIN. The prerequisite is to be very comfortable with joins in general. That means you know how to join 2 tables easily, but you also know all the theory behind joins and, most importantly, what each join will return if you use it in your query. Speaking of which, you should be much more careful when using the LEFT JOIN because how you use it and which order of tables you choose can have a significant impact on the output you get.

Make sure that you have strong foundational knowledge. To help you with that, we’ve also written about the different types of SQL JOINs that you must know. And here if you want to practice more SQL JOIN interview questions.

How to Join 3 or More Tables in SQL


Become a data expert. Subscribe to our newsletter.