Mastering the SQL Subqueries With Multiple Tables

Mastering the Most Common Subqueries in SQL


Mastering the tools to use subqueries involving multiple tables

The complexity of business processes and vast amounts of data in organizations often mean that a single table does not contain all the information needed to make informed decisions or answer critical questions. As a result, multiple tables are often required to gain a more complete understanding of the context and to enrich data analysis. So as a data analyst, it is essential to master SQL joins and subqueries.

Subqueries allow us to use the results of one query as the input for another query and are a powerful tool for filtering and manipulating data. But subqueries can be tricky to master, especially when dealing with multiple tables.

That's why we're here to help. In today’s tutorial, we'll teach you how to use IN, NOT IN, EXISTS, and NOT EXISTS, which are common subqueries that allow you to filter data based on specific criteria.

To help you understand everything we cover here, here’s also a Youtube video from our channel.

IN vs NOT IN

‘IN’ and ‘NOT IN’ are used to filter data based on specific values.

The WHERE clause with an equals operator is commonly used to filter data. However, when filtering by a longer list of values, using the OR operator can be tedious. In such cases, we can use the IN operator to filter data. The IN operator allows us to specify a list of values to match against a given column. It is a more concise and readable way of filtering data compared to using the OR operator repeatedly.

Let’s take an example question from Amazon:


Table: worker

Link to the question: https://platform.stratascratch.com/coding/9838-find-details-of-workers-excluding-those-with-the-first-name-vipul-or-satish

Table: worker
worker_idfirst_namelast_namesalaryjoining_datedepartment
1MonikaArora1000002014-02-20HR
2NiharikaVerma800002014-06-11Admin
3VishalSinghal3000002014-02-20HR
4AmitahSingh5000002014-02-20Admin
5VivekBhati5000002014-06-11Admin

For example, using this worker dataset from Amazon, we can find the workers of the departments ‘Admin’ and ‘HR’ by:

SELECT worker_id,
       first_name,
       last_name
FROM worker
WHERE department='Admin'
  OR department='HR'

But if given a larger dataset with more values to filter on, it is a lot more readable to use the IN operator.

SELECT worker_id, first_name, last_name FROM worker WHERE department IN ('Admin', 'HR')

Sounds simple enough? Try to solve this question. Aside from IN, you’ll also have to use the NOT IN operator!

Using IN/NOT IN with subqueries

IN and NOT IN are not only useful in filtering based on a specified list of values. By using IN and NOT IN, we can also filter the main table based on a set of values from another table or subquery without having to join the tables together!

Let’s try solving this problem.


Tables: winemag_p1, winemag_p2

Link to the question: https://platform.stratascratch.com/coding/10147-find-countries-that-are-in-winemag_p1-dataset-but-not-in-winemag_p2

Table: winemag_p1
idcountrydescriptiondesignationpointspriceprovinceregion_1region_2varietywinery
126576USRich and round, this offers plenty of concentrated blackberry notes enveloped in warm spices and supple oak. There's a hint of green tomato leaves throughout, but the lush fruit combined with sturdy grape tannins and high acidity would pair well with fatty short ribs or braised pork.Estate Club8732VirginiaVirginiaMerlotVeramar
127077ItalyThis luminous sparkler offers measured aromas of yellow fruit and honey and delivers a broad, wide approach in the mouth. The wine's texture is creamy and full and there is a spicy point of effervescence on the palate.Extra Dry8519VenetoProsecco di ValdobbiadeneProseccoVaraschin
143029USThere are some pretty features to this wine, namely the lovely tannins and rich fruit, but it's just too soft and sweet. Tastes like blackberry jam, with extra white sugar spinkled on top.Signature8345CaliforniaPaso RoblesCentral CoastCabernet SauvignonByington
98813USLovely nose, which runs from violets and cherries right into coffee and sweet chocolate. In the mouth the wine coats the palate with Napa-like intensity, only buttressed with more vivid acidity. It's both smooth and deep. The aromas are more detailed, interesting and really clean, loaded with dark flavors of soy, coffee, molasses and black cherry liqueur.Champoux & Loess Vineyards Reserve93100WashingtonWashingtonWashington OtherCabernet SauvignonBoudreaux Cellars
43172USTastes sweeter and fruitier than the 2009, which was dry, so if you like orange marmalade, pineapple tart and vanilla cookie sugariness in your Sauvignon Blancs, it's for you.Les Pionniers8640CaliforniaSonoma CountySonomaSauvignon BlancCapture

Table: winemag_p2
idcountrydescriptiondesignationpointspriceprovinceregion_1region_2taster_nametaster_twitter_handletitlevarietywinery
118040USA bit too soft and thus lacks structure. Shows a good array of wild berry and briary, brambly flavors. Dry and spicy, and ready now.The Convict Rocky Ridge Vineyard8638CaliforniaRockpileSonomaParadise Ridge 2006 The Convict Rocky Ridge Vineyard Zinfandel (Rockpile)ZinfandelParadise Ridge
59743ItalyAromas of toasted oak, vanilla and a confectionary note lead the nose while the palate offers mature Golden Delicious apple, butterscotch and walnut skin alongside bracing acidity.Mongris Riserva8830Northeastern ItalyCollioKerin O'Keefe@kerinokeefeMarco Felluga 2012 Mongris Riserva Pinot Grigio (Collio)Pinot GrigioMarco Felluga
117951USHeavy in alcohol and overripe, but entirely dry, and the black currant, chocolate and pepper flavors will play well against richly sauced barbecue.The Caboose8430CaliforniaAlexander ValleySonomaStarry Night 2007 The Caboose Zinfandel (Alexander Valley)ZinfandelStarry Night
10202FranceFragrant; dusty plum and rose aromas are followed on the palate by very fruity flavors of cherries and apple skins. The fine bead and rich mousse result in a creamy mouthfeel. It's all a bit obvious and simple, but undeniably good.Brut Rosa8783ChampagneChampagneJoe Czerwinski@JoeCzRuinart NV Brut Rosa (Champagne)Champagne BlendRuinart
69268GermanyLayers of plush yellow peach and apricot flavors highlight this easy, crowd-pleasing kabinett. Sunny lemon acidity lends refreshment to the midpalate, cutting through all the ripeness and leading to a brisk finish. Drink now through 2019.Kabinett8815RheinhessenAnna Lee C. IijimaWeinreich 2012 Kabinett Riesling (Rheinhessen)RieslingWeinreich

To filter out duplicate countries from the winemag_p1 dataset, we can use the NOT IN operator and a subquery that selects all the countries listed in winemag_p2. This is a more efficient alternative to using the OR operator to filter by each value. The final query becomes:

SELECT DISTINCT country
FROM winemag_p1
WHERE country NOT IN (SELECT DISTINCT country
                      FROM winemag_p2)

So, when it comes to using IN or NOT IN in SQL, it's all about filtering data based on a specific column and a list of allowed or excluded values. To make it happen, you need to follow a few simple steps.

  1. First, you'll want to select the columns you want to work with from a main table.
  2. Then, you can use a WHERE clause to filter where a specific column is either IN or NOT IN a list of values. This list of values can either be
    1. specified directly, in a tuple-like format; or
    2. indirectly, using a subquery that returns a list of eligible values for the column

Info box to explain the options:
SELECT col1, col2 FROM table WHERE col1 IN

  • (‘allowed_value1’, ‘allowed_value2’, ‘allowed_value3’)
  • (SELECT col1 FROM table2 GROUP BY col1 HAVING count(*)>2)

But here's the catch: if you decide to use a subquery, it needs to return a list of values for the specific column you're filtering on. For instance, if you're filtering based on col1, the subquery should only return col1 values.

EXISTS and NOT EXISTS

Let's discuss another SQL function that enables us to filter data based on the results from another table, which is called EXISTS and NOT EXISTS. But before we delve into some examples, let's first understand the difference between IN/NOT IN and EXISTS/NOT EXISTS, as they can be easily mixed up.

Infobox:

EXISTS and NOT EXISTS Subqueries in SQL

Let me explain it to you in simple terms - both IN/NOT IN and EXISTS/NOT EXISTS are used in the WHERE clause, but their functionalities are different. IN and NOT IN filter rows by checking if a particular value is present or not, while EXISTS and NOT EXISTS filter based on the existence of rows in a subquery. So, keep this difference in mind while working with these functions.

Let's move on and explore the EXISTS and NOT EXISTS functions with this question from Meta.


Tables: users_friends, users_pages

Link to the question: https://platform.stratascratch.com/coding/2081-recommendation-system

Table: users_friends
user_idfriend_id
12
14
15
21
23

Table: users_pages
user_idpage_id
121
125
225
223
224

Discovery is the aim of recommendation systems, so the pages recommended should be new. In other words, we should only return pages once (i.e., not duplicated), and these should be pages that the user does not yet follow.

First, we need to join the user_friends table to the user_pages table to identify the pages that their friends follow.

SELECT f.user_id,
       p.page_id
FROM users_friends f
JOIN users_pages p ON f.friend_id = p.user_id

But since friends may follow the same pages, we need to return only distinct page recommendations for each user. Easy peasy, we just add the DISTINCT keyword to the SELECT statement.

SELECT DISTINCT f.user_id,
                p.page_id
FROM users_friends f
JOIN users_pages p ON f.friend_id = p.user_id

But now comes the real magic - using the NOT EXISTS operator to filter out pages that the user is already following. We create a subquery that checks for matching user_id and page_id combinations in the user_pages table and join it to the outer query. If this subquery returns any row, the user-page combination is removed. Otherwise, it is retained.

SELECT *
FROM users_pages pg
WHERE pg.user_id = f.user_id
  AND pg.page_id = p.page_id)

And just like that, we have our solution! Our final query looks like this:

SELECT DISTINCT f.user_id,
                p.page_id
FROM users_friends f
JOIN users_pages p ON f.friend_id = p.user_id
WHERE NOT EXISTS
    (SELECT *
     FROM users_pages pg
     WHERE pg.user_id = f.user_id
       AND pg.page_id = p.page_id)

Conclusion

With our tutorial today, you’ll be better equipped to make more complex SQL queries using multiple tables through IN, NOT IN, EXISTS, and NOT EXISTS. In summary, the IN operator is great for when you have a specific list of values to match against, while EXISTS is more useful for complex subquery filtering.

Check out “SQL Interview Questions” to practice more such questions.

Mastering the Most Common Subqueries in SQL


Become a data expert. Subscribe to our newsletter.