SQL JOIN Interview Questions
This article covers the most common SQL JOIN interview questions, how to answer them, and what you need to know about JOINs to join the desired company.
SQL JOINs are one of those really interesting SQL topics. The one that is, in theory, fairly simple, and there’s really no difficult theory behind the concept. At the same time, they are a very complex topic that requires plenty of practice and experience, and knowledge of data to really understand them and use them properly.
There’s really nothing too complex behind the idea of JOINs. There are no broad topics and interview questions that go into so much detail. The reason is, there’s no so much variety in SQL JOINs. That’s not suggesting the JOIN questions can’t be tricky. But the thing is, exactly the same several theoretical concepts that you’ll be required to know are covered in both theoretical and practical job interview questions—no mystery and no tricks here. The only trick is the following: using JOINs is relatively simple, but knowing what you want as a result and what JOIN you should use is completely another thing.
Let’s start with the simple theoretical part covering the concepts you’ll have to know about JOINs. Exactly the same concepts will be tested in SQL JOIN interview questions. So once you know what JOINs are, the only way to become a JOIN master is to practice, and through it, realize all the nuances between different JOINs usage.
The Feather-Light Idea of What SQL JOINs Are
There’s nothing else to be asked in the interview except that you know what the joins are and what every join type does. We’ll go through this quickly and give you the concepts you need to know. The concepts are the SQL JOIN interview questions at the same time.
What Are the JOINS?
JOIN is an SQL clause used to connect two or more tables and get the result that is a combination of data from all the tables. The joining is done via the common column these tables share, with this column serving as a bridge between the tables.
What JOINs Are There?
There are generally five types of (equi) JOIN statements:
- (INNER) JOIN
- LEFT (OUTER) JOIN
- RIGHT (OUTER) JOIN
- FULL (OUTER) JOIN
- CROSS JOIN
Check out our post "Different Types of SQL JOINs" where we discussed all these joins in detail.
A JOIN or INNER JOIN is the one that will return only the matching rows from both tables. Graphically explained, it looks like this:
LEFT (OUTER) JOIN
A LEFT JOIN or LEFT OUTER JOIN is a type of JOIN that will return all the rows from the left table and only the matching rows from the right table. The data that can’t be found in the right table will have the NULL values.
You can visualize it like this:
RIGHT (OUTER) JOIN
This is exactly the opposite of the LEFT JOIN. A RIGHT JOIN returns all rows from the right table and only the matching rows from the left table. Those rows from the right table that are not matched with data from the left table will again be shown as NULL values.
When you look at the graph below, you’ll realize what it means when we say it’s a mirror image of the LEFT JOIN:
FULL (OUTER) JOIN
The FULL OUTER JOIN will give you all data from the left table and all data from the right one.
Or graphically presented, it looks like this:
The CROSS JOIN results in the Cartesian product. It means this type of join will return all the combinations of data from all the tables joined.
To make it easier for you to understand what this means, here’s the visual representation of “all the data combinations” meaning:
What is a Natural Join?
It’s a subtype of an INNER JOIN. Joining tables this way means you’re joining them based on the columns that have the same name in both tables and are of the same data type. It means you simply use the NATURAL JOIN statement, without explicitly specifying the columns in the ON clause.
What are Non-Equi Joins?
All the joins above are the equi joins. That’s because they connect the tables on columns where data from one table is the same as in another table by using the equals (=) sign. The non-equi joins are just all those JOIN types, except they use some other comparison operators, such as less than (<), greater than (>), less than or equal to (<=), or greater than or equal to (>=).
There are three main uses of non-equi joins:
- finding duplicates in tables
- joining tables on a range of values
- calculating running totals
The Heavyweight Ability to Use SQL JOINs
Generally, there’s nothing more to be asked about joins. So you might think that’s easier than you thought. But the going gets heavier once you are used to using join. That’s when you realize it’s not enough to know what joins are and what they do.
You need to have plenty of joined tables under your belt, along with getting completely different results from your query compared to the one you wanted. Then you’ll start to really understand what the JOINs are, how they can help you your SQL life, or make it harder if you’re not sure what you’re doing.
Exactly that’s what the interview will try to unveil: whether you really understand JOINs and have experience with them or you simply learned by heart what they are and what they should do.
Make sure that you either have this working experience or that you solved plenty of SQL interview questions that might not be explicitly focused on JOINs but are simply there, and you have to use them. For sure, you’ll have to use them whenever there is a question that involves more than one table to work with.
SQL JOIN Interview Questions
SQL JOIN Question
For starters, you could get a question like this one they ask at Facebook:
“Write a query to calculate the distribution of comments by the count of users that joined Facebook between 2018 and 2020, for the month of January 2020.
The output should contain a count of comments and the corresponding number of users that made that number of comments in Jan-2020. For example, you'll be counting how many users made 1 comment, 2 comments, 3 comments, 4 comments, etc in Jan-2020. Your left column in the output will be the number of comments while your right column in the output will be the number of users. Sort the output from the least number of comments to highest.
To add some complexity, there might be a bug where a user post is dated before the user join date. You'll want to remove these posts from the result.”
Link to the question: https://platform.stratascratch.com/coding/10297-comments-distribution
First, let’s try how JOIN works on a raw data. We’ll be joining tables on the columns id (table fb_users) and user_id (table fb_comments):
This is just a sample of data from the tables. In the table fb_users there are 24 rows (users). In the table fb_comments, there are 100 rows (users).
So if you simply JOIN these two tables you’ll get this set of data:
It’s the combination of data from one table with all the matching data from another table. The result will again have 100 rows. That’s because there are no comments by some users not shown in the table fb_users. Also, there are no users in the table fb_users that don’t have at least one comment in the table fb_comments. It means this join type will join all the comments with the corresponding users.
Knowing that, you can use it in your solution in the following way.
SELECT comment_cnt, count(t.id) AS user_cnt FROM (SELECT a.id, count(*) AS comment_cnt FROM fb_users a INNER JOIN fb_comments b ON a.id=b.user_id WHERE b.created_at BETWEEN '01-01-2020' AND '01-31-2020'::date AND a.joined_at BETWEEN '01-01-2018' AND '12-31-2020'::date AND created_at - joined_at >=0 GROUP BY a.id) t GROUP BY comment_cnt ORDER BY comment_cnt ASC
SQL LEFT JOIN Question
If the interviewer wants to see whether you understand the LEFT JOIN, you could get a question along these lines. This one’s from Amazon:
“Find the customer with the highest total order cost between 2019-02-01 to 2019-05-01. If customer had more than one order on a certain day, sum the order costs on daily basis. Output their first name, total cost of their items, and the date.
For simplicity, you can assume that every first name in the dataset is unique.”
Link to the question: https://platform.stratascratch.com/coding/9915-highest-cost-orders
Let’s use the LEFT JOIN on tables in question to see what happens. The joining will be on the columns id (table orders) and cust_id (table customers).
If we’ve used the LEFT JOIN on these tables, we would get all the orders from the table orders and only customers where there’s a customer with the same id in both cust_id from the first table and id from the second table. That means if there’s a customer that placed an order, but we didn’t yet enter it in the table customer, we will get the order, but all the customer data will be NULL. Here’s what the LEFT JOIN will return:
SELECT first_name, sum(total_order_cost) AS total_order_cost, order_date FROM orders o LEFT JOIN customers c ON o.cust_id = c.id WHERE order_date BETWEEN '2019-02-1' AND '2019-05-1' GROUP BY first_name, order_date HAVING sum(total_order_cost) = (SELECT max(total_order_cost) FROM (SELECT sum(total_order_cost) AS total_order_cost FROM orders WHERE order_date BETWEEN '2019-02-1' AND '2019-05-1' GROUP BY cust_id, order_date) b)
Since the RIGHT JOIN is the same as the LEFT JOIN looked from the other side and can be solved using the LEFT JOIN, you probably won’t find plenty of RIGHT JOIN questions. But if you want to practice it, simply try to solve the question where you need LEFT JOIN by writing it using the RIGHT JOIN and adapting the code accordingly.
SQL RIGHT JOIN Question
How about the RIGHT JOIN? They’re less common because whenever you have RIGHT JOIN you can reverse the order of the tables, put LEFT JOIN instead of RIGHT JOIN and that’s it. However, we want you to show how the RIGHT JOIN works, nevertheless. That’s why we used Amazon’s SQL LEFT JOIN interview question and adapted its solution to a RIGHT JOIN approach.
“Find the number of orders, the number of customers, and the total cost of orders for each city. Only include cities that have made at least 5 orders and count all customers in each city even if they did not place an order.
Output each calculation along with the corresponding city name.”
Link to the question: https://platform.stratascratch.com/coding/9908-customer-orders-and-details
We’ll use table customers and table orders. These two tables will be joined on the columns cust_id (table orders) and id (table customers). It’s the same two tables as in the previous example.
Now, what happens if we join these tables using the RIGHT JOIN? Again, exactly the same result as in the previous example:
Let’s see what is the official answer to this SQL JOIN interview question. Then we’ll adapt the code and use the RIGHT JOIN.
SELECT customers.city, count(DISTINCT orders.id) AS orders_per_city, count(DISTINCT customers.id) AS customers_per_city, sum(orders.total_order_cost) AS orders_cost_per_city FROM customers LEFT JOIN orders ON customers.id = orders.cust_id GROUP BY customers.city HAVING count(orders.id) >=5
SELECT customers.city, count(DISTINCT orders.id) AS orders_per_city, count(DISTINCT customers.id) AS customers_per_city, sum(orders.total_order_cost) AS orders_cost_per_city FROM orders RIGHT JOIN customers ON orders.cust_id = customers.id GROUP BY customers.city HAVING count(orders.id) >=5
You see, the only difference is there is the RIGHT JOIN instead of LEFT JOIN. And, of course, the order of tables in the JOIN clause is different. What was once a left table, now is a right table and vice versa.
SQL FULL (OUTER) JOIN Question
There’s an Amazon question that requires the FULL JOIN:
“Find employees from Arizona, California, and Hawaii while making sure to output all employees from each city. Output column headers should be Arizona, California, and Hawaii. Data for all cities must be ordered on the first name.
Assume unequal number of employees per city.”
Link to the question: https://platform.stratascratch.com/coding/9918-arizona-california-and-hawaii-employees
In its solution, it uses FULL OUTER JOIN to join three CTES.
The first CTE is Arizona:
The second is California:
The third CTE is Hawaii:
The result of FULL OUTER JOIN of these three tables is:
It’s almost exactly what returns the official answer to this SQL JOIN interview question, bare some aesthetic interventions.
WITH arizona AS (SELECT first_name AS aname, ROW_NUMBER() OVER ( ORDER BY first_name) AS arn FROM employee WHERE city='Arizona' ), california AS (SELECT first_name AS cname, ROW_NUMBER() OVER ( ORDER BY first_name) AS crn FROM employee WHERE city='California' ), hawaii AS (SELECT first_name AS hname, ROW_NUMBER() OVER ( ORDER BY first_name) AS hrn FROM employee WHERE city='Hawaii' ) SELECT aname AS arizona, cname AS california, hname AS hawaii FROM hawaii FULL OUTER JOIN california ON hrn=crn FULL OUTER JOIN arizona ON COALESCE(hrn, crn) =arn;
SQL CROSS JOIN Question
The CROSS JOINS are also not that often, but they can occur sometimes. For example, in this interview question by Deloitte:
“In a marathon, gun time is counted from the moment of the formal start of the race while net time is counted from the moment a runner crosses a starting line. Both variables are in seconds.
You are asked to check if the interval between the two times is different for male and female runners. First, calculate the average absolute difference between the gun time and net time. Group the results by available genders (male and female). Output the absolute difference between those two values.”
Link to the question: https://platform.stratascratch.com/coding/2064-difference-between-times
Before going to the answer, let’s see what happens when we CROSS JOIN tables. This SQL JOIN interview question actually wants you to CROSS JOIN CTEs, not the raw tables.
The first CTE in the solution is male_avg_diff:
The second CTE is female_avg_diff, which returns this data:
So if you CROSS JOIN these two CTEs, you’ll get this result:
With this, you’ve almost reached the code in the answer.
WITH male_avg_diff AS (SELECT AVG(ABS(gun_time - net_time)) AS abs_diff FROM marathon_male), female_avg_diff AS (SELECT AVG(ABS(gun_time - net_time)) AS abs_diff FROM marathon_female) SELECT ABS(m.abs_diff - f.abs_diff) diiference FROM male_avg_diff m CROSS JOIN female_avg_diff f
SQL Non-equi JOIN Question
The non-equi joins are not so often as equi joins, but you for sure could be asked to write a code that involves them. For example, when answering this question by Wine Magazine:
“Find all provinces which produced more wines in 'winemag_p1' than they did in 'winemag_p2'.
Output the province and the corresponding wine count.
Order records by the wine count in descending order.”
Link to the question: https://platform.stratascratch.com/coding/10038-find-all-provinces-which-produced-more-wines-in-winemag_p1-than-they-did-in-winemag_p2
SELECT tmp1.province, tmp1.cnt_1 FROM (SELECT province, count(*) AS cnt_1 FROM winemag_p1 WHERE province IS NOT NULL GROUP BY province) tmp1 INNER JOIN (SELECT province, count(*) AS cnt_2 FROM winemag_p2 WHERE province IS NOT NULL GROUP BY province) tmp2 ON tmp1.province = tmp2.province AND tmp1.cnt_1 > tmp2.cnt_2 ORDER BY tmp1.cnt_1 DESC
JOINs are only one of the common concepts asked at the SQL job interviews. We’ve already written about other concepts in our blog, so feel free to use this to understand the concepts asked at the interviews.
Joins as a concept in SQL are pretty straightforward. They shouldn’t be hard to understand. Even writing a code and using them is not difficult. Once you are familiar with the fact the join keyword is followed by ON, and then the common column is stated, you know how to write any join. You simply change the join type keyword, and that’s it. You just need to know what join to use. And exactly that is what makes joins such a difficult topic sometimes. All the complexity lies in the decision of what join to use.
To get better at such decisions, you’ll need to gain as much experience as possible. After that, it’ll be much easier for you to read a problem or a SQL JOIN interview question, understand data and decide correctly on which join type you should use.