SQL Interview Questions for the Data Analyst Position

SQL Skills a Data Analyst Must Know

During the data analyst interview process, you’ll more than likely encounter technical problems where you will be expected to answer SQL interview questions. It’s crucial to prepare for your interviews by practicing similar data analyst interview questions in advance. For context, you may be asked questions related to:

Using WHERE vs HAVING()
Performing JOINS and UNIONS
Finding Unique Data Values
Aggregation Functions
Ordering Results
And Other General SQL Skills

As such, we’re going to cover three data analyst SQL interview questions which clearly display these concepts, so you can internalize and better implement them.

Three SQL Interview Questions a Data Analyst Should Be Able to Solve

Data Analyst SQL Interview Question #1: Filtering Data Using WHERE or HAVING()

The data analyst interview question we’re going to look at to understand why to use WHERE or HAVING() comes from a Zillow interview.

Cities With The Most Expensive Homes

Interview Question Date: December 2020

Zillow

MediumID 10315

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Write a query that identifies cities with higher than average home prices when compared to the national average. Output the city names.

Table: zillow_transactions

Link to this data analyst SQL interview question: https://platform.stratascratch.com/coding/10315-cities-with-the-most-expensive-homes

It asks to find all cities with higher average home prices than the national average home price. For this problem, we only have one table zillow_transactions to examine.

Table: zillow_transactions

id	state	city	street_address	mkt_price
1	NY	New York City	66 Trout Drive	449761
2	NY	New York City	Atwater	277527
3	NY	New York City	58 Gates Street	268394
4	NY	New York City	Norcross	279929
5	NY	New York City	337 Shore Ave.	151592

We’ll break this down into two steps - first let’s find the average home price by city. For this, we’ll use one of the SQL Aggregate Functions - "avg()", and GROUP BY cities.

SELECT city,
       avg(mkt_price)
FROM zillow_transactions
GROUP BY city

The next logical step is to limit this data to only the cities with average home prices larger than the national average. It’s at this point we must understand the distinction between the WHERE and HAVING() clause. WHERE applies before the calculation of city averages in our query above, so, if we compared our city averages to national averages in a WHERE clause, it’s removing individual home data before the city average calculation.

Whereas, if we use a HAVING() clause, we filter data after our city average calculation and can perform a proper comparison with the national average. Additionally, the HAVING() clause lets us include comparisons, subqueries, and other SQL code, so we can perform our national average calculation and comparison inside of it.

SELECT city
FROM zillow_transactions
GROUP BY city HAVING(avg(mkt_price) >
                       (SELECT avg(mkt_price)
                        FROM zillow_transactions))

All required columns and the first 5 rows of the solution are shown

city
Mountain View
San Francisco
Santa Clara

As a data analyst, you must understand the difference between the WHERE and HAVING() clauses since you’ll in many cases need to filter out data and want to avoid discarding relevant data.

Data Analyst SQL Interview Question #2: How to Aggregate Distinct Data Values

The question we’re going to look at to understand why we must use unique values comes from a Postmates interview.

Customer Average Orders

Interview Question Date: February 2021

Postmates

EasyID 2013

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

How many customers placed an order and what is the average order amount?

Table: postmates_orders

Link to the question: https://platform.stratascratch.com/coding/2013-customer-average-orders

This question is relatively easy as it asks us to find how many customers placed an order as well as the average order amount. We only have one table postmates_orders to examine.

Table: postmates_orders

id	customer_id	courier_id	seller_id	order_timestamp_utc	amount	city_id
1	102	224	79	2019-03-11 23:27:00	155.73	47
2	104	224	75	2019-04-11 04:24:00	216.6	44
3	100	239	79	2019-03-11 21:17:00	168.69	47
4	101	205	79	2019-03-11 02:34:00	210.84	43
5	103	218	71	2019-04-11 00:15:00	212.6	47

Looking at the example table, we can see how we have the same customer_id ‘102’ applying to more than one order. As a result, finding how many customers placed an order isn’t as simple as counting all the rows. Rather, we must use the DISTINCT clause in our selection to count customer_ids only once.

As for finding the average order values, we can do this easily with the average aggregation function. Unlike the previous question, we don’t have to GROUP BY any column here since we want the average for the whole table (a single value).

SELECT count(DISTINCT customer_id),
       avg(amount)
FROM postmates_orders

All required columns and the first 5 rows of the solution are shown

count	avg
5	139.224

It’s important to understand the proper use of DISTINCT. DISTINCT is looking for unique combinations of row data for the columns you specify. If DISTINCT is within the count function with only the customer_id following it, it’s going to disregard customer_ids which appear more than once before the aggregation. If it’s outside the count function after SELECT instead, it’s going to look for unique combinations of the count and average after the aggregation instead of before, so you end up with incorrect results.

Being able to understand how the DISTINCT clause works in conjunction with aggregation functions is critical for a data analyst since you’ll often have to run calculations on data featuring duplicates or multiple rows for a single field.

Data Analyst SQL Interview Question #3: How to Join and Aggregate Data then Order Results

The last data analyst SQL interview question we’ll look at shows how we can join and aggregate data then order results and comes from an Airbnb interview question.

Find the total number of available beds per hosts' nationality

Airbnb

MediumID 10187

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Find the total number of available beds per hosts' nationality. Output the nationality along with the corresponding total number of available beds. Sort records by the total available beds in descending order.

Tables: airbnb_apartments, airbnb_hosts

Link to this data analyst SQL interview question: https://platform.stratascratch.com/coding/10187-find-the-total-number-of-available-beds-per-hosts-nationality

This data analyst SQL interview question is asking us to return the total comments received by a user in the last 30 days while excluding users who didn’t receive comments. There are two tables to analyze here:

Table: airbnb_apartments

host_id	apartment_id	apartment_type	n_beds	n_bedrooms	country	city
0	A1	Room	1	1	USA	New York
0	A2	Room	1	1	USA	New Jersey
0	A3	Room	1	1	USA	New Jersey
1	A4	Apartment	2	1	USA	Houston
1	A5	Apartment	2	1	USA	Las Vegas

Table: airbnb_hosts

host_id	nationality	gender	age
0	USA	M	28
1	USA	F	29
2	China	F	31
3	China	M	24
4	Mali	M	30

Given we have to output the nationality along with the total number of available beds, we already know we’ll need to join the tables. In this case, since there is a match in the data for all the host_ids for the two tables, we can use an INNER JOIN. OUTER JOINs are typically used when there is some dissimilar data which we don’t have in this problem. If, for example, the airbnb_hosts table was missing some host_ids, then we would use an OUTER JOIN.

SELECT n_beds,
       nationality
FROM airbnb_hosts h
INNER JOIN airbnb_apartments a ON h.host_id = a.host_id

The INNER JOIN is extremely simple - only requiring us to provide aliases for the two tables and then join them ON their matching field host_id. The final step in this data analyst SQL interview question is to aggregate the number of beds and order them. We’ll use the SUM() aggregation function and GROUP BY nationality to sum by nationality. Finally, we’ll ORDER BY a descending count of the sum of beds.

SELECT nationality,
       SUM(n_beds) AS total_beds_available
FROM airbnb_hosts h
INNER JOIN airbnb_apartments a ON h.host_id = a.host_id
GROUP BY nationality
ORDER BY total_beds_available DESC

All required columns and the first 5 rows of the solution are shown

nationality	total_beds_available
Luxembourg	320
USA	248
Brazil	72
China	44
Mali	28

Knowing how to JOIN tables in SQL is essential for a data analyst as you might often be working with several different datasets you need to combine before analysis. Keep in mind there are several types of SQL JOINs and which one you must use depends on how well your tables match up! Also, for best presenting and ranking your results, you’ll need to understand how to use the ORDER BY clause.

To practice more such JOIN interview questions, check out our post "SQL JOIN Interview Questions".

Conclusion

In this article, we went over several examples which highlight some of the most important SQL concepts you’ll see in Data Analyst SQL Interview Questions. Ultimately, there are almost endless SQL skills you’d need to master the language, but a good grasp of the concepts explained in this article will prepare you for a large amount of SQL interview questions you might have to solve during an interview.

Keep in mind SQL isn’t the only concept you’ll need to know for the Data Analyst interview, but it is one of the most important! On the StrataScratch platform, you can practice answering more data analyst interview questions. There you can build out your solutions independently or by using one of our articles as a guide. Beyond this, you’ll have access to a large pool of other members who post their solutions for feedback and can critique yours.

SQL Interview Questions for the Data Analyst Position

Categories

SQL Skills a Data Analyst Must Know

Three SQL Interview Questions a Data Analyst Should Be Able to Solve

Data Analyst SQL Interview Question #1: Filtering Data Using WHERE or HAVING()

Cities With The Most Expensive Homes

Data Analyst SQL Interview Question #2: How to Aggregate Distinct Data Values

Customer Average Orders

Data Analyst SQL Interview Question #3: How to Join and Aggregate Data then Order Results

Find the total number of available beds per hosts' nationality

Conclusion

Latest Posts:

Mastering Python Class Methods: A Practical Guide

Python String Methods: Here is How to Master Them

What Does a Machine Learning Engineer Do?

Share

Follow

Categories