Database Interview Questions

Database Interview Questions

Categories:

  • Author Avatar
    Written by:

    Nathan Rosidi

We’ll show you the most common SQL concepts you need to know to solve database interview questions.

Today, the database is almost a synonym with SQL. Even though those are two separate things, the popularity of SQL for working with databases made it unimaginable to know databases without knowing SQL or vice versa.

In this article, we’ll solely focus on SQL coding questions. This is the single most important technical skill for anyone working with databases. Of course, it’s not the only skill, so you might want to find out the other skills needed to be a data scientist.

All other database interview questions can contain a little bit of this or a little bit of that, depending on the job description. But by covering the SQL questions, we’ll cover one technical skill that is, almost without exception, required in every data science job you take up. Be it data analyst, data engineer, or data scientist; you’ll have to know SQL.

We’re going to show you topics that represent the SQL knowledge required for every data science position.

How will we approach this? On the StrataScratch platform, the questions are also categorized in the topic family. We’ll use this to give you the most popular topics as a common ground for every data science job position.

The Seven Common Topics Tested in SQL Database Interview Questions

The topics most tested are:

  1. Aggregate functions and the DISTINCT clause
  2. WHERE clause
  3. GROUP BY clause
  4. Ranking rows and LIMIT clause
  5. Subqueries and CTEs
  6. JOINs
  7. Data organizing and pattern matching

We’ll shortly explain every concept, and then we’ll go through some SQL database interview questions that test it. These concepts are, of course, interweaving, so usually, several of those concepts are tested in one question, sometimes even all of them.

Common Topics Tested in SQL Database Interview Questions

1. Aggregate Functions and the DISTINCT Clause

The aggregate functions, as the name suggests, aggregate data. How do they do that? They perform a calculation on a data set and return one row with a single value.

Maybe you don’t know they’re called that, but you probably used some of the most popular aggregate functions:

  • SUM()
  • COUNT()
  • MIN()
  • MAX()
  • AVG()

For example, this easy Twitch database interview question requires both aggregate functions and GROUP BY knowledge:

Last Updated: February 2021

EasyID 2011

Calculate the average session duration (in seconds) for each session type.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/2011-session-type-duration

Answer:

SELECT session_type,
             avg(session_end -session_start) AS duration
FROM twitch_sessions
GROUP BY session_type;
Missing data

To solve this problem, you need to calculate the session duration using the AVG() function on a difference between session start and end, i.e., the session duration. Since you have to show data on a session type level, you’ll also have to use GROUP BY on a session_type column.

The code will return this result:

All required columns and the first 5 rows of the solution are shown

session_typeduration
viewer1908
streamer1310.818182

Aggregation of data is one of the main ways to organize and clean data, which is one of the data scientist’s jobs. A great way to practice aggregate functions is the Mode Analytics SQL tutorial on aggregate functions. Of course, you can solve some of the StrataScratch interview questions too.

For example, the Unique Users Per Client Per Month question by Microsoft:

Last Updated: March 2021

EasyID 2024

Write a query that returns the number of unique users per client for each month. Assume all events occur within the same year, so only month needs to be be in the output as a number from 1 to 12.

Go to the Question

“Write a query that returns the number of unique users per client per month”

Or maybe a little more complex one by DoorDash:

Write a query that returns the average order cost per hour during hours 3 PM -6 PM (15-18) in San Jose. For calculating time period use 'Customer placed order datetime' field. Earnings value is 'Order total' field. Order output by hour.

One of the useful tools for data scientists is the DISTINCT clause, which selects only unique values. This is helpful when you have to report on events that can happen multiple times on various levels, e.g., a customer, order, day level. It is often used with aggregate functions, that’s why we’ll show it to you in this section.

To solve this Postmates database interview question, you’ll need the DISTINCT clause:

Last Updated: February 2021

EasyID 2013

How many customers placed an order and what is the average order amount?

Go to the Question

How many customers placed an order and what is the average order amount?

Link to the question: https://platform.stratascratch.com/coding/2013-customer-average-orders

Answer:

Missing data

In this query, you’ll use the COUNT() to count the number of customers. You need the DISTINCT keyword to count every customer only once. There’s also an AVG() function for calculating the average order amount.

Here’s the output:

All required columns and the first 5 rows of the solution are shown

countavg
5139.22

2. WHERE Clause

In SQL, the WHERE clause serves as a data filter. Once the filtering criteria are set up using the WHERE clause, the SQL statement will return data that fulfill these criteria.

This Forbes database interview question asks you to filter data in a very straightforward way:

Find companies in the financial sector based on Europe and Asia.

Link to the question: https://platform.stratascratch.com/coding/9678-find-finance-companies-based-in-europe-and-asia

Answer:

SELECT
    company
FROM forbes_global_2010_2014
WHERE 
    (continent = 'Asia' OR continent = 'Europe') AND
    (sector = 'Financials');

You’re working with only one table here. You have to set two criteria using the WHERE clause to get the correct result. The first one is the companies have to be from Europe or Asia. The second one is the company’s sector has to be financial.

Run the code, and you’ll get this output. We’re showing only several first rows here:

Output for Database Interview Question to Find finance companies

The WHERE clause can also be used with the SELECT statement too. One good example of how this is done is the Top Cool Votes database interview question by Yelp:

Last Updated: March 2020

MediumID 10060

Find the review_text that received the highest number of cool votes. Output the business name along with the review text with the highest number of cool votes.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/10060-top-cool-votes

Answer:

SELECT business_name,
             review_text
FROM yelp_reviews
WHERE cool =
    (SELECT max(cool)
     FROM yelp_reviews);

Missing data

This query simply selects all the business and review texts. However, we don’t need all this data, but only the business with the highest number of ‘cool’ votes. The criteria are stated in the WHERE clause, and the businesses with the highest number of ‘cool’ votes are selected with the help of the MAX() aggregate function in the SELECT statement.

And this is the output:

All required columns and the first 5 rows of the solution are shown

business_namereview_text
Roka AkorI hate to admit it, but it had been a long while since my last visit to Roka Akor. I deserve a hand slap. But last week, I had the perfect excuse to p
LunardisThis is the nicest grocery store in the city. I actually met my wife at this grocery store while shopping for avocados.

For practicing the WHERE clause, you can select from numerous StrataScratch questions that test that. It could be a rather simple one, like this one by Ring Central:

Last Updated: January 2021

EasyID 2018

Return a list of users with status free who didn’t make any calls in Apr 2020.

Go to the Question

Or it could be a hard one, as this one by Microsoft:

Last Updated: March 2021

HardID 2029

Select the most popular client_id based on the number of users who individually have at least 50% of their events from the following list: 'video call received', 'video call sent', 'voice call received', 'voice call sent'.

Go to the Question

3. GROUP BY Clause

The following important SQL interview topic for any aspiring data scientist for database interview questions is the GROUP BY clause. Because, aside from aggregating and filtering data, you’ll also have to group it. That’s exactly what the GROUP BY clause does: it groups all the rows with the same value in the column(s) you’re grouping by.

This clause is also often used with the aggregate functions. That’s why it’s important to know both.

To solve the following SQL database interview question from Airbnb, you’ll have to know the aggregate functions, the GROUP BY clause, and also the HAVING clause, which is also common with the GROUP BY:

Last Updated: January 2018

EasyID 9627

Find the average number of beds in each neighborhood that has at least 3 beds in total.

Output results along with the neighborhood name and sort the results based on the number of average beds in descending order.

Go to the Question

Link to this database interview question: https://platform.stratascratch.com/coding/9627-3-bed-minimum

Answer:

SELECT neighbourhood,
       avg(beds) AS n_beds_avg
FROM airbnb_search_details
WHERE neighbourhood IS NOT NULL
GROUP BY neighbourhood
HAVING sum(beds) >= 3
ORDER BY n_beds_avg DESC;

Missing data

This query selects the neighbourhood and calculates the average number of beds using the AVG() aggregate function. To get the average on a neighbourhood level, you need to group data by this column. This database interview question also requires that neighbourhoods shown should have at least three beds in total. To calculate this, you need the SUM() aggregate function put in the HAVING clause.

The HAVING clause is also used for filtering data along with the WHERE clause. The main difference is that WHERE is used before grouping data while HAVING is used after you’ve grouped the data.

Finally, the code above sorts data by the average number of beds in descending order. To do that, you need the ORDER BY clause. This nicely leads us to the fourth important SQL topic for database interview questions.

Before that, here’s the code output:

All required columns and the first 5 rows of the solution are shown

neighbourhoodn_beds_avg
Pacific Palisades6
Windsor Terrace5
Hayes Valley5
Redondo Beach5
Hell's Kitchen4
Rancho Palos Verdes4
Manhattan Beach4
Temple City4
The Rockaways4
Long Beach3.33
Astoria3.33
Westlake3.17
Lincoln Park3
West Los Angeles3
Cow Hollow3
Topanga3
Malibu2.5
2.2
East Village2
Alphabet City2
Loop2
East Harlem1.67
Bedford-Stuyvesant1.67
Mid-Wilshire1.67
Harlem1.58
Studio City1.5
Bernal Heights1.5
Williamsburg1.33
Dorchester1

4. Ranking Rows and LIMIT Clause

Ranking rows is usually done by one of the following window functions:

  • ROW_NUMBER()
  • RANK()
  • DENSE_RANK()

SQL window functions, by definition, are the SQL functions that use the set of data (or window) to perform the calculation. They can seem similar to the aggregate functions, but the main difference is the window functions don’t show the result in one row, but they leave the original data and return results in an additional column beside it.

The Ranking Most Active Guests database interview question by Airbnb is a good example:

MediumID 10159

Identify the most engaged guests by ranking them according to their overall messaging activity. The most active guest, meaning the one who has exchanged the most messages with hosts, should have the highest rank. If two or more guests have the same number of messages, they should have the same rank. Importantly, the ranking shouldn't skip any numbers, even if many guests share the same rank. Present your results in a clear format, showing the rank, guest identifier, and total number of messages for each guest, ordered from the most to least active.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/10159-ranking-most-active-guests

Answer:

SELECT 
    DENSE_RANK() OVER(ORDER BY sum(n_messages) DESC) as ranking, 
    id_guest, 
    sum(n_messages) as sum_n_messages
FROM airbnb_contacts
GROUP BY id_guest
ORDER BY sum_n_messages DESC;

Missing data

This query uses the DENSE_RANK() to rank guests by the sum of number of their messages. Along with rank, the output will show the guest ID and the total number of messages per guest. The result will be shown by the number of messages in descending order.

Here it is:

All required columns and the first 5 rows of the solution are shown

rankingid_guestsum_n_messages
1882f3764-05cc-436a-b23b-93fea22ea84720
162d09c95-c3d2-44e6-9081-a3485618227d20
2b8831610-31f2-4c58-8ada-63b3601ca47617
291c2a883-04e3-4bbb-a7bb-620531318ab117
3bdaf2e68-86dd-40d9-a5a1-9cc95ea25d9116
36133fb99-2391-4d4b-a077-bae40581f92516
4d328a4a0-df5f-4851-a864-a8845b45c1e015
494139517-e7e9-4afb-8b5d-bb6f306bf75115
49b145027-2723-4c5f-988c-b5118c89991215
5136c10f8-af53-4e5a-a5b3-d9c9c495b16613
56c541a87-d864-4de8-ae05-70ae57b5097613
65dacf820-a573-4ce2-9fd8-0d7ed979b82211
68375dde4-0de9-402c-8833-2127c150460111
7b2fda15a-89bb-4e6e-ae81-8b21598e248210
76a8cf97f-8a77-4b55-b79a-b560bca414a410
7bd8f3dd6-fecc-479a-a88f-1d0049600e9f10
8845e3c1d-6c06-4ef2-ade5-7eecefb56fbe9
829ef2346-5fda-45d7-b4ed-eb34967a62b39
85ed478e0-eae9-4537-826a-04ec779c1fd29
8125242de-91b4-43bd-a926-75ae4b7a93229
914f943bb-74e9-458b-be55-203dc72206888
9d418a1ab-b181-40a7-90fe-7216e40dc3548
9aa9cf5bf-5667-4212-8018-1cb8beee530e8
925cfc206-89aa-4e63-b2f4-3cbb8631d9fb8
97e309181-e61e-426b-baef-dd031d5660d38
95c4b64d5-4869-4a02-b36c-c451758559e08
967aece73-e112-4e9e-9e05-8a2a94b003b98
902f0a750-34da-4268-94e8-f1a371f0460e8
9ca6a9321-d7f9-42e9-aeb1-5d1832034da08
970180687-5888-4946-9578-83ab678cb9978
10b6a8b1f9-5b40-4edd-91f3-ee522b0f9e5b7
10673af82f-6a4d-4fd5-8485-4459670da9bc7
10f4602827-8672-4545-8887-43702f8c7ecc7
11eaa8118b-aba6-4c25-96cd-c72f01d6b0636
11fad1a097-a511-4f44-a603-6a271c1f159e6
118af13d02-bd2c-40fd-afd7-e4fe3a6a622e6
128d23aa41-c77c-4bbe-98bc-66abecfefbfe5
12bb490ede-8a70-4d61-a2e8-625855a393e25
12eda96881-7b23-413a-9f6f-b45dc9fe2a5b5
123e6c2466-74fe-44c0-a6f3-dda79755d30a5
12d61d88ae-7910-4832-8116-aab4603ab3b05
12176a2fb8-1de3-483a-92a3-34a826a91f5a5
1272f07b78-99e1-46ac-8452-19dd2053eca35
122889fccc-37ab-4a66-8d64-41b31314c7fc5
124ec2c8ec-9325-4b66-a370-820a56fbd1a35
12d528e24b-7c1f-446f-9bb0-a4ecb77c3acd5
1286b39b70-965b-479d-a0b0-719b195acea25
133e6fbc52-3e54-4a74-9ca1-7ce13048aab74
136cbb33d1-6ecc-4f74-8b6a-a43d07d484b64
13e3cdd90a-8097-4220-9722-2b914d761a844
13679d857b-08b8-4748-b703-86735aa422964
13f54989cf-459b-409d-be6a-9534a53cc4a94
13a09bf912-b21d-4859-b194-8512c30695f64
133c3ec192-acd7-4c91-8e82-18cff7d859ec4
130f4ea370-702c-4f08-a00d-98e67c2a35744
13a1a53da7-7a33-4693-8568-1d7f2cd32e314
13a03f86ca-5c44-4e66-8695-71c380cfc48b4
1370a2c590-f854-4ca6-adf5-03fc081cfddf4
1320418fda-ffb2-46a3-9581-863725497c054
131a66fe1c-fea6-4ec6-96c4-3ea3e0c7815e4
13b91e6323-bd26-4172-9070-1a4993fdc3974
13e0d259e6-1b67-4964-a6dc-e8a8c69d3e014
13a9f7ffb1-4970-492b-b448-4bd9d0fbeef34
13924f864f-db83-4945-9a65-cf42a657ca684
142acf0e64-9da0-47c9-b0f9-1eceab379f843
14f1877c99-430e-41e1-818a-713ff82f09da3
14598f8459-96f6-4400-8bce-1474757dd5893
14ebe81cf8-6037-43f2-81d2-fd386f5da74f3
149a45a950-b4f7-4f16-abe6-f9286abf26413
149e5e2865-f257-4d19-9f79-9388ae925ad73
141759c05e-f12a-4eaa-8059-3bcaca443c883
14c0075f4f-0cb0-489b-9f12-bf1dff5734883
14e8e41881-d3bf-4f1e-b55e-6052eae23ef63
14b8285e97-9f11-4aec-bb05-e21f612ee41f3
14ffc234be-ca13-4bcb-a472-1c136b8008913
15f5ba8f82-682e-4ec9-9c6e-cedc2cd628ca2
15996d1871-8986-481b-8bbf-1a1420c68c902
1515eb6f98-98ee-491e-8eeb-b5e2c61bfd8a2
154de8c8ea-728a-467a-aa69-4800e8880c952
15b3e35a8f-5503-4d78-adb8-7693097dfca72
1567c4e87c-8a8f-4da4-a965-b89c9d74666b2
15425aa1ed-82ab-4ecf-b62f-d61e1848706d2
154995debc-b753-4401-b556-a2bcbb897eba2
1518d343e3-0ef0-4bb7-8cd9-b83d3fbbb82f2
1537a63847-b09a-4f32-81a5-97cfb8e84c6d2
15be8a5e0c-e3fb-4517-a799-e1ed8a05b6e62
15e764ddb5-126e-4474-8e4e-4bdc330b51d12
156cb23070-821b-467d-b4d0-af04645a07032
15eb3500c2-b0b9-4753-8ab3-3fad4d5777f12
150187b377-ad1c-4da4-8a17-568b2c9098f42
15e079182c-7628-4f49-8301-29ff8b4938ec2
150a8e121b-c09c-4de1-abcc-81bce87de29e2
15126ed661-fa20-4041-ac16-ec118bbcce3b2
157424d151-b449-40e1-9e8f-c258289c581c2
16b11cd744-101d-409b-9a55-7e151f2e79d51
16ea445eea-3fac-4edd-b1d6-569f57f6abe41

LIMIT is also used for ranking rows. Its primary function is to limit the number of rows shown in the output. Combine it with ORDER BY, and you have a great tool for showing top N values, which is often in reporting.

When speaking of using LIMIT, this database interview question from Yelp is a good showcase of its use:

MediumID 10053

Find the top 5 businesses with the most check-ins. Output the business id along with the number of check-ins.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/10053-most-checkins

Answer:

SELECT 
    business_id,
    sum(checkins) AS n_checkins
FROM yelp_checkin
GROUP BY 
    business_id
ORDER BY
    n_checkins DESC
LIMIT 5;

Missing data

Again, you have to know aggregate functions here. The SUM() function, more specifically, to sum the number of check-ins. Since you want your data on a business level, you’ll have to use GROUP BY. Then you order data according to the number of check-ins in descending order. To get the top five, you just need to limit the output to the first five rows.

And there it is:

All required columns and the first 5 rows of the solution are shown

business_idn_checkins
4k3RlMAMd46DZ_JyZU0lMg137
TkEMlu88OZn9TKZyeY9CJg39
4p6Wce7Ed707QS2-yQkvZw31
Ehy00JWQixgoXzisVKhvag22
ujgpePdD8Q-fP1mPFnw0Qw17

5. Subqueries and CTEs

For more complex queries, it’s often necessary to know subqueries or CTEs, ideally both.

A subquery is a query within a larger query. It can usually be found in another SELECT statement or in the WHERE clause.

Let’s see how subqueries work by answering the database interview question from Spotify. Aside from subqueries, you also need to know aggregate functions, the WHERE, GROUP BY, and ORDER BY clauses.

MediumID 9743

Find the number of unique songs of each artist which were ranked among the top 10 over the years. Order the result based on the number of top 10 ranked songs in descending order.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/9743-top-10-songs

Answer:

SELECT 
    artist, 
    count(distinct song_name) AS top10_songs_count
FROM
    (SELECT 
        artist,
        song_name
     FROM billboard_top_100_year_end
     WHERE
        year_rank <= 10
    ) temporary
GROUP BY
    artist
ORDER BY
    top10_songs_count DESC;

Missing data

This query selects artists and counts the number of distinct songs. It does that from a subquery. This subquery searches for songs that were among the top 10 over the years. After that, the result is grouped on an artist level and sorted by the number of distinct songs in descending order.

If you like Elvis Presley, you’ll be happy to see him on top of our result:

All required columns and the first 5 rows of the solution are shown

artisttop10_songs_count
Elvis Presley9
Mariah Carey7
Usher6
Whitney Houston5
Beatles5
Elton John5
Toni Braxton4
Boyz II Men4
TLC4
Madonna4
Rod Stewart4
Bee Gees4
Katy Perry4
Paul McCartney4
Michael Jackson4
Beyonce4
Rihanna3
Paula Abdul3
Lady Gaga3
Diana Ross3
Ludacris3
Gwen Stefani3
Lionel Richie3
Ace Of Base3
The Black Eyed Peas3
Nelly3
Kelly Clarkson3
En Vogue3
Andy Gibb3
Dean Martin3
John Oates3
Maroon 53
Alicia Keys3
Temptations3
matchbox twenty3
Daryl Hall3
Kanye West3
Taio Cruz2
Color Me Badd2
Akon2
Captain and Tennille2
Bette Midler2
Perry Como2
Bobby Vinton2
Phil Collins2
Timbaland2
John Cougar2
Rolling Stones2
Ashanti2
Mac Davis2
Macklemore2
Puff Daddy2
T-Pain2
Mama's and The Papa's2
Train2
Association2
Wham!2
Justin Timberlake2
Righteous Brothers2
Wings2
R. Kelly2
Celine Dion2
Roberta Flack2
Everly Brothers2
50 Cent2
Barbra Streisand2
John Denver2
Monica2
Roy Orbison2
Ryan Lewis2
Monkees2
Beach Boys2
Savage Garden2
Mr. Mister2
Sean Paul2
Fergie2
Fleetwoods2
Nelly Furtado2
Flo Rida2
Chicago2
Simon and Garfunkel2
Bobby Darin2
Nicki Minaj2
Jackson 52
Bruno Mars2
Four Tops2
Olivia Newton-John2
Sly and The Family Stone2
Snoop Dogg2
Chris Brown2
Frankie Valli2
Janet2
OutKast2
Pat Boone2
Janet Jackson2
Christina Aguilera2
Chubby Checker2
Jay-Z2
Supremes2
Donna Summer2
Survivor2
Ciara2
Bryan Adams2
Fifth Dimension1
Florida Georgia Line1
Foreigner1
Foundations1
Four Seasons1
Frankie Avalon1
Frank Sinatra1
Freda Payne1
Freddy Fender1
fun.1
Gale Garnett1
Gary Wright1
George Harrison1
George Michael1
Gilbert O'Sullivan1
Glen Campbell1
Gloria Gaynor1
Gnarls Barkley1
Gogi Grant1
GoonRock1
Gotye1
Grand Funk Railroad1
Green Day1
Gregory Abbott1
Guess Who1
Guns N' Roses1
Guy Mitchell1
Hayley Williams1
Heart1
Herb Alpert1
Herman's Hermits1
Hi-Five1
Highwaymen1
Hoobastank1
Hot1
Hugo Montenegro1
Human League1
Imagine Dragons1
Impressions1
INXS1
Irene Cara1
James Blunt1
James Ingram1
Jamie Foxx1
Janelle Monae1
Ja Rule1
Jason Mraz1
Jennifer Lopez1
Jewel1
J. Frank Wilson and The Cavaliers1
J. Geils Band1
Jim Croce1
Jimmy Dorsey1
Jimmy Gilmer and The Fireballs1
Jimmy Jones1
Jim Reeves1
Jive Five1
Joan Jett and The Blackhearts1
Joe Dowell1
Joe Tex1
John Lennon1
Johnnie Taylor1
Johnny Horton1
Johnny Preston1
Johnny Rivers1
Jon Bon Jovi1
Jon Secada1
Jordin Sparks Duet With Chris Brown1
Kay Starr1
K-Ci and JoJo1
Ke$ha1
Keith Sweat1
Kelly Rowland1
Kenny Loggins1
Kenny Nolan1
Kenny Rogers1
Kid Rock1
Kiki Dee1
Kimbra1
Kim Carnes1
Klymaxx1
Knack1
Kool and The Gang1
Krayzie Bone1
Kris Kristofferson1
Kris Kross1
Lady Antebellum1
Lauren Bennett1
LeAnn Rimes1
Lenny Kravitz1
Leona Lewis1
Les Baxter1
Lifehouse1
Lil Jon1
Lil Wayne1
Linkin Park1
Lipps, Inc.1
Lisa Loeb1
Little Eva1
Little Stevie Wonder1
Lloyd Price1
LMFAO1
Lonestar1
Los Del Rio1
Louis Armstrong1
Love Unlimited Orchestra1
Lulu1
Manhattans1
Mario1
Mario Winans1
Mark Dinning1
Mark Morrison1
Marvin Gaye1
Mary MacGregor1
Mary Wells1
Melanie1
Men At Work1
MFSB1
Michael McDonald1
Michael Sembello1
Milli Vanilli1
Missy Elliott1
Montell Jordan1
Mr. Acker Bilk1
Nancy Sinatra1
Natasha Bedingfield1
Nayer1
Neil Sedaka1
Nelson Riddle1
Next1
Ne-Yo1
Nickelback1
Nilsson1
Nine Stories1
Olivia1
One Direction1
OneRepublic1
Osmonds1
Otis Redding1
Patsy Cline1
Patti Austin1
Patti Labelle1
Paula Cole1
Paul and Paula1
Paul Anka1
Paul Mauriat1
Paul McCoy1
Paul Simon1
P. Diddy1
Peaches and Herb1
Percy Faith1
Perez Prado1
Petey Pablo1
Petula Clark1
Pharrell1
Pink Floyd1
Pitbull1
Plain White T's1
Platters1
Player1
Poison1
Police1
Prince1
Puddle Of Mudd1
Queen1
Raiders1
Rare Earth1
Rascals1
Ray Charles1
Ray Dalton1
Ray Parker Jr.1
Ray Stevens1
Real McCoy1
Redbone1
Red Hot Chili Peppers1
REO Speedwagon1
Rick Astley1
Rick Springfield1
Ricky Martin1
Rita Coolidge1
Robbie Nevil1
Robert John1
Robert Palmer1
Robin Thicke1
Roxette1
Sammy Davis Jr.1
Sam The Sham and The Pharaohs1
Sara Bareilles1
Seal1
Sensations1
Sgt. Barry Sadler1
Shai1
Shakira1
Shania Twain1
Shelley Fabares1
Sheryl Crow1
Shirelles1
Silk1
Sinead O'Connor1
Sir Mix-a-Lot1
Sixpence None The Richer1
Skeeter Davis1
Sleepy Brown1
Slim Thug1
Snow1
Sonny James1
Spice Girls1
Starship1
Static Major1
Steve Miller Band1
Steve Winwood1
Stevie Wonder1
Sting1
String-a-longs1
Sugar Ray1
Surface1
SWV1
Tab Hunter1
Tag Team1
Taylor Swift1
Tears For Fears1
Terror Squad1
Terry Jacks1
The All-American Rejects1
The Calling1
Thelma Houston1
The Miracles1
The Pussycat Dolls1
The Wanted1
Three Dog Night1
T.I.1
Tiffany1
Timmy T1
Tina Turner1
Tom Jones1
Tommy Edwards1
Tommy James and The Shondells1
Tommy Roe1
Tony Orlando and Dawn1
Tony Rich Project1
Tracy Chapman1
Turtles1
UB401
Vanessa Carlton1
Vanessa Williams1
Van Halen1
Village People1
Village Stompers1
Walter Murphy and The Big Apple Band1
Wanz1
Wayne Newton1
We Five1
Whitesnake1
Wilbert Harrison1
Wild Cherry1
will.i.am1
Will To Power1
Wilson Phillips1
Wiz Khalifa1
Wreckx-N-Effect1
Wyclef Jean1
Yes1
Young Jeezy1
Young Rascals1
3 Doors Down1
Yung Joc1
Aaliyah1
Adele1
Afrojack1
a-ha1
Alan O'Day1
All-4-One1
Amy Grant1
? and The Mysterians1
Anita Baker1
Anita Ward1
Archie Bell and The Drells1
Archies1
A Taste Of Honey1
Baauer1
Bangles1
Belinda Carlisle1
Bell Biv Devoe1
Bill Withers1
Billy Idol1
Billy Joel1
Billy Preston1
Billy Vaughn1
B.J. Thomas1
Blondie1
B.o.B1
Bobby Brown1
Bobby Gentry1
Bobby Goldsboro1
Bobby Lewis1
Bob Seger1
Bone Thugs-N-Harmony1
Bon Jovi1
Bonnie Tyler1
Box Tops1
Brandy and Monica1
Breathe1
Brenda Lee1
Britney Spears1
Browns1
Bruce Hornsby and The Range1
Busta Rhymes1
Calvin Harris1
Carly Rae Jepsen1
Carly Simon1
Carole King1
Carpenters1
Carrie Underwood1
Cascades1
C+C Music Factory1
Cee Lo Green1
Chaka Khan1
Chamillionaire1
Champs1
Cher1
Chic1
Chiffons1
Chingy1
Colby O'Donis1
Commodores1
Coolio1
Cream1
Culture Club1
Daniel Powter1
David Bowie1
David Rose1
David Seville1
Dawn1
Debby Boone1
Deborah Cox1
Dee Clark1
Dee Dee Sharp1
Del Shannon1
Destiny's Child1
Diamonds1
Dido1
Dionne and Friends1
Dire Straits1
Dolly Parton1
Domenico Modugno1
Don McLean1
Donna Lewis1
Donny Osmond1
Doors1
Doris Day1
Eagles1
Earth, Wind and Fire1
Eddie Murphy1
Eddie Rabbitt1
Edwin Starr1
Ellie Goulding1
EMF1
Eminem1
Emotions1
Enya1
Eric Clapton1
Eurythmics1
Evanescence1
Eve1
Exile1
Extreme1
Faith Evans1
Fat Joe1

CTE stands for Common Table Expression, and it has a similar function to subquery. They are used to write more complex queries and to translate calculation logic that has several steps to a database language.

This database interview question by Yelp is a good start for showing you how CTE works:

MediumID 10148

Find the top 5 cities with the highest number of 5-star businesses.

The output should include the city name and the total count of 5-star businesses in that city, considering both open and closed businesses. If two or more cities have the same number of 5-star businesses, assign them the same rank, and skip the next rank accordingly. For example, if two cities tie for 1st place, the following city should be ranked 3rd.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/10148-find-the-top-10-cities-with-the-most-5-star-businesses

Answer:

WITH cte_5_stars AS
  (SELECT city,
          count(*) AS count_of_5_stars,
          rank() over(
                      ORDER BY count(*) DESC) AS rnk
   FROM yelp_business
   WHERE stars = 5
   GROUP BY 1)
SELECT city,
       count_of_5_stars
FROM cte_5_stars
WHERE rnk <= 5
ORDER BY count_of_5_stars DESC;

Missing data

Like any CTE, this one is initiated by the keyword WITH. The SELECT statement select cities with five stars, then it counts them and ranks them. The next SELECT statement references the CTE to select only those cities where their rank is five or lower, i.e., outputs the top five cities.

These are the top cities:

All required columns and the first 5 rows of the solution are shown

citycount_of_5_stars
Phoenix5
Toronto4
Las Vegas4
Edinburgh2
Gilbert2
Urbana2
Scottsdale2
Madison2

6. JOINs

For anyone who wants to use more than one table in a database, which is probably most users, knowing JOINs is essential. It’s due to the database logic, where the database is normalized, and data is separated in multiple tables.

One of the data scientists’ main jobs is to work with raw data and transform it into formats suitable for others. To do that, they need to combine multiple tables. Without JOINs, they wouldn’t be able to do that. If you’re not sure about the difference between various data science jobs, this guide through 14 different data science job positions will make everything clear.

This Amazon database interview question tests JOINs nicely. Along with that, you’ll also have to show knowledge of aggregate functions, WHERE, GROUP BY, and HAVING clauses, as well as subqueries.

Last Updated: May 2019

MediumID 9915

Find the customers with the highest daily total order cost between 2019-02-01 and 2019-05-01. If a customer had more than one order on a certain day, sum the order costs on a daily basis. Output each customer's first name, total cost of their items, and the date. If multiple customers tie for the highest daily total on the same date, return all of them.

For simplicity, you can assume that every first name in the dataset is unique.

Go to the Question

Link to this database interview question: https://platform.stratascratch.com/coding/9915-highest-cost-orders

Answer:

SELECT first_name,
       sum(total_order_cost) AS total_order_cost,
       order_date
FROM orders o
LEFT JOIN customers c ON o.cust_id = c.id
WHERE order_date BETWEEN '2019-02-1' AND '2019-05-1'
GROUP BY first_name,
         order_date
HAVING sum(total_order_cost) =
  (SELECT max(total_order_cost)
   FROM
     (SELECT sum(total_order_cost) AS total_order_cost
      FROM orders
      WHERE order_date BETWEEN '2019-02-1' AND '2019-05-1'
      GROUP BY cust_id,
               order_date) b);

Missing data

The query uses the SUM() aggregate function to calculate the total order cost. To get all data you need, you have to LEFT JOIN two tables. Data is filtered on order date using the WHERE clause. Next, data is grouped by the customer’s first name and order date. You need to output the customer with the highest daily total order. To do that, you need the HAVING clause to get data where the sum of the order costs per customer and per date is equal to the order maximum. This is where you need another aggregate functions, which is MAX().

Here’s the output:

All required columns and the first 5 rows of the solution are shown

first_nametotal_order_costorder_date
Jill2752019-04-19
Mark2752019-04-19

Check out our post "How to Join 3 or More Tables in SQL" where we talk about using this commonly required SQL concept.

7. Data Organizing and Pattern Matching

One of the data scientists' jobs is to organize, clean, and analyze data. During this analysis, they’ll usually try to find some patterns in the data. You might ask "isn’t that something a data engineer does?". Well, yes and no. To understand the difference between data scientists and data engineers, have a look at the blog post comparing these two careers.

One good example of covering organizing data is this database interview question by the City of San Francisco:

Last Updated: May 2018

MediumID 9726

Classify each business as either a restaurant, cafe, school, or other.

• A restaurant should have the word 'restaurant' in the business name. This includes common international or accented variants, such as “restaurante”, “restauranté”, etc. • A cafe should have either 'cafe', 'café', or 'coffee' in the business name. • A school should have the word 'school' in the business name. • All other businesses should be classified as 'other'. • Ensure each business name appears only once in the final output. If multiple records exist for the same business, retain only one unique instance.

The final output should include only the distinct business names and their corresponding classifications.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/9726-classify-business-type

Answer:

SELECT distinct business_name,
       CASE
           WHEN business_name ilike any(array['%school%']) THEN 'school'
           WHEN lower(business_name) like any
(array['%restaurant%']) THEN 'restaurant'
           WHEN lower(business_name) like any
(array['%cafe%', '%café%', '%coffee%']) THEN 'cafe'
           ELSE 'other'
       END AS business_type
FROM sf_restaurant_health_violations;

Missing data

This question requires you to clean data and organize it into business types. There are specific criteria needed for that, as stated in the CASE WHEN statement.

Your organized data will look like this:

All required columns and the first 5 rows of the solution are shown

business_namebusiness_type
Starbucksother
TAWAN'S THAI FOODother
Burger King 4525other
Tacos San Buenaother
BLOWFISH SUSHIother
Tai Hing Inc.other
ROYAL GROUND COFFEEcafe
China Fun Expressother
Home Plateother
Samiramis Importsother
Modern Thai Inc.other
WING HING RESTAURANTrestaurant
Salem Groceryother
DENMAN MIDDLE SCHOOLschool
Sam Rong Cafecafe
Hilton Financial District- Restaurant Seven Fiftyrestaurant
Batter Bakeryother
PRESIDIO THEATREother
Jay's Cheesesteakother
Extreme Pizzaother
Events Management @ Legion of Honorother
Contradaother
The Castro Republicother
GOLDEN PRODUCEother
Subway #36339other
Cathead's BBQother
The Lord Georgeother
LOS PANCHOSother
Golden Wokother
TSING TAO RESTAURANTrestaurant
Starbucks Coffee Cocafe
IRVING PIZZAother
Pho Expressother
Wines of California Wine Barother
Pollo Camperoother
Del Popolo LLCother
Gateway High/Kip Schoolsschool
Carbon Grillother
Rock Japanese Cuisineother
Howard & 6th Street Food Market Inc.other
Poke Kanaother
Castro Street Chevronother
Champa Gardenother
Harvest Urban Marketother
ABSINTHE PASTRYother
STARBUCKS COFFEE CO. #603cafe
Earthbarother
Miller's East Coast Deliother
Annie's Hot Dogs & Pretzelsother
Let's Be Frankother
T & L FOOD MARKETother
Westfield Food Court Sculleryother
Ramzi's Cafecafe
Andersen Bakeryother
MONGKOK DIM SUM & RESTAURANTrestaurant
Laguna Cafécafe
Old Siam Thai Restaurantrestaurant
PASITA'S BAKERYother
T & L Liquor Store Inc.other
Mixt Greensother
India Clay Oven Restaurant and Barrestaurant
Duboce Park Cafecafe
Roxanne Cafecafe
Dip, LLCother
Iza Ramenother
Rotee Expressother
Cafe Insalatacafe
Azalina'sother
Chez Julienother
A La Turcaother
Tropisueñoother
Belly Burgerother
Souvlaother
House of Xian Dumplingother
Pica Picaother
Hans Coffee Shopcafe
Seal Rock Inn Restaurantrestaurant
Cadillac Marketother
Crepe and Brioche, Inc.other
LA VICTORIA BAKERYother
Park Gyros Castroother
Strawother
Tenderloin Market & Deliother
Prospectother
Juice Crazeother
Bubble Cafecafe
JAVA ON OCEANother
Rico Panother
AK SUBSother
Wing Lum Cafecafe
PANCHO'Sother
Boos Voniother
Tacoliciousother
VIP Coffee & Cake Shopcafe
A Manoother
Creamother
7-Eleven, Store 2366-21389Fother
Heritageother
Rico Pan Bakeryother
Fair Trade Cafe LLCcafe
Dim Sum Bistroother
Live Oak Schoolschool
Peet's Coffee & Teacafe
Nabeother
The AA Bakery & Cafecafe
Koja Kitchen CA01other
City Superother
California Pizza Kitchen, Inc.other
Glaze Teriyakiother
John Chin Elementary Schoolschool
Soo Fong Restaurantrestaurant
Stanford Court Hotelother
MARTHA & BROS. COFFEE COcafe
Yerba Buena Tea Co (formerly Tea Smiths of SF)other
Brothers Restaurantrestaurant
MV Taurusother
Marina Meats Inc.other
JIM'S RESTAURANTrestaurant
Brendas Meat & Threeother
Bayshore Taqueriaother
Roadside Rosy'sother
Coffee Cultures SOMAcafe
North Point Marketother
MARTIN L. KING MIDDLE SCHOOLschool
Minna SF Group LLCother
Cabinother
L & G Vietnamese Sandwichother
Taco Bell Cantina #31685other
DONA TERE'S MARKETother
Pabuother
Jerseyother
Ninki Sushi Bar & Restaurantrestaurant
BALBOA HIGH SCHOOLschool
Escape From New York Pizzaother
David's Deli & Bistroother
Fresca Gardens, Incother
ITALIAN AMERICAN SOCIAL CLUBother
95117 Premium Commissary Roomother
MANIVANH THAI RESTAURANTrestaurant
SAKANA BUNE RESTAURANTrestaurant
Big Fish Little Fish Pokeother
Cecilia's Pizza & Restaurantrestaurant
Clay Oven Indian Cuisineother
Da Cafecafe
Sushi Honother
King of Thai Noodle Houseother
Little Vietnam Cafecafe
IL BORGOother
Jackson Fillmore Trattoriaother
NORTH BEACH PIZZAother
S & T Hong Kong Seafoodother
La Quinta Restaurantrestaurant
Maggie Cafecafe
Golden Natural Foodsother
Akira Japanese Restaurantrestaurant
Buckhorn Grillother
Taqueria Dos Charrosother
Wing Lee BBQ Restaurantrestaurant
Man Sung Companyother
Subway 30303other
Sutter Pub and Restaurantrestaurant
Pho Huynh Sangother
HAMANO SUSHIother
S. F. Gourmet Hot Dog Cartother
CLEMENT BBQ RESTAURANTrestaurant
Urban Puttother
AT&T - COMMISARY KITCHEN [145184]other
West Coast Wine & Cheeseother
Dolores Park Outpostother
NEW EMMY'S RESTAURANTrestaurant
SRI THAI CUISINEother
Restaurante Montecristorestaurant
Great Eastern Restaurantrestaurant
Yummy Sticksother
Hot Pot Islandother
King of Thai Noodles Cafecafe
AT&T Park - Coffee and Ice Cream (5A+5B)cafe
Jiang Ling Cuisine Restaurantrestaurant
SF BAGEL CO. (KATZ BAGELS)other
San Francisco Marriott Union Square - Main Kitchenother
SUBWAY #31419other
Roma Pizzeriaother
Lollipotother
Boss Supermarketother
Mizutani Sushi Barother
Jane the Bakeryother
YUMMA'S MED GRILLother
The Salvation Armyother
Morning Brew Cafecafe
Shareteaother
Old Blueother
Elephant Sushiother
CALIFORNIA PACIFIC MEDICAL CENTERother
Crepe Cafecafe
My Ivy Corp.other
Tanuki Restaurantrestaurant
Antonelli Brothers Meat, Fish, and Poultry Inc.other
Cafe Beancafe
Ha Nam Ninh Restaurantrestaurant
Project Juiceother
New Regent Cafecafe
SENIORE'S PIZZAother
Cafe Bakerycafe
MICADO RESTAURANTrestaurant
PIZZA HUT #758280other
Angel Cafe and Delicafe
24 Hour Fitness Club, #273other
SAFEWAY STORE #964other
General Nutrition #302other
Cafe Fiorecafe
Bellissimo Pizzaother
24th and Folsom Eateryother
Starbucks Coffeecafe
Rusty's Southern LLCother
Mi Yucatanother
CHA-AM RESTAURANTrestaurant
Allstars Cafe Inccafe
Veraci Pizzaother
Milkbomb Ice Creamother
Chowdersother
House of Bagelsother
New Luen Sing Fish Marketother
Dragon Beauxother
Panuchosother
SH Dream Incother
The Willowsother
Hook a Cookother
Thai Cottage Restaurantrestaurant
Blue Bottle Coffeecafe
Golden Kim Tar Restaurantrestaurant
Hong Kong Clay Pot City Restaurantrestaurant
Chinatown Restaurantrestaurant
Bebebar Juice & Sandwichother
Kuma Sushi + Sakeother
EL POLLO SUPREMOother
Francisco Middle Schoolschool
SOother
Tupeloother
TAQUERIA EL BUEN SABORother
Split Breadother
Pho Luen Fat Bakery & Restaurantrestaurant
Pectopah LLCother
PEKING WOK RESTAURANTrestaurant
Castagnola's Restaurantrestaurant
Red Jade Restaurantrestaurant
The Grove - Design Districtother
Expressions Snack Barother
Dragoneatsother
Cafe Broadwaycafe
Toy Boat Dessert Cafecafe
Kate O'Brien'sother
SEGAFREDOother
Hong Kee & Kimother
The Binderyother
Surisanother
J.B.'S PLACEother
Luke's Local Inc.other
Floresother
Southern Comfort Kitchenother
The Good Life Groceryother
Keep It, Inc.other
LA ALTENAother
Quicklyother
Bursaother
Washington Bakery & Restaurantrestaurant
In-N-Out Burgerother

Once the data is organized, you’ll need to find patterns in it. One good example for practicing this is the Meta/Facebook interview question:

Find how the number of `likes` are increasing by building a `like` score based on `like` propensities. A `like` propensity is defined as the probability of giving a like amongst all reactions, per friend (i.e., number of likes / number of all reactions).
Output the average propensity alongside the corresponding date and poster. Sort the result based on the liking score in descending order.
In `facebook_reactions` table `poster` is user who posted a content, `friend` is a user who saw the content and reacted. The `facebook_friends` table stores pairs of connected friends.”

Link to the question: https://platform.stratascratch.com/coding/9775-liking-score-rating

Answer:

WITH p AS
  (SELECT SUM(CASE
                  WHEN reaction = 'like' THEN 1
                  ELSE 0
              END)/COUNt(*)::decimal AS prop,
          friend
   FROM facebook_reactions
   GROUP BY 2)
SELECT date_day,
       poster,
       avg(prop)
FROM facebook_reactions f
JOIN p ON f.friend= p.friend
GROUP BY 1,
         2
ORDER BY 3 DESC;

The first concept here is the CTE which is initiated using the keyword WITH. It allocates the value 1 to every like, then sums the likes, and divides the sum by the number of all reactions. Data is then grouped by a friend. This is how we get the propensity defined in the question.

The following SELECT statement uses the CTE to calculate the average propensity using the AVG() functions. To do that, you need to JOIN two tables, group data by date and poster, and order it by propensity in descending order.

Your output should be this:

Output for Database Interview Question for Liking Score Rating

Conclusion

In total, we’ve covered seven SQL topics tested in database interview questions:

  1. Aggregate functions and the DISTINCT clause
  2. WHERE clause
  3. GROUP BY clause
  4. Ranking rows and LIMIT clause
  5. Subqueries and CTEs
  6. JOINs
  7. Data organizing and pattern matching

The SQL database interview questions heavily test all seven topics. You should always expect these topics because you’ll have to use them extensively if you want to work in data science.

Even though all the topics are a must, some topics are more important to specific jobs than others. Data scientists, data engineers, and data analysts will all have to know subqueries and CTEs, JOINs, and all other functions used for organizing data and finding patterns in it.

So don’t avoid these topics. But if you want to be a data engineer, you should put even more emphasis on JOINs. The same goes with data scientists and data analysts placing emphasis on all SQL concepts that are used in organizing data and finding patterns.
The best advice is to go through all these topics to make sure you have a firm grasp of the most tested SQL concepts in the job interviews. Some additional SQL questions that you must prepare are covered by the article "SQL interview questions".

Share