SQL Interview Questions for the Data Analyst Position

SQL Interview Questions for the Data Analyst Position
Categories


Let's go over several examples which highlight some of the most important SQL concepts you’ll see in the SQL interview questions for Data Analyst position.

The interview process for data analysts can be intimidating for those new to the position. The data analyst interviews typically cover a wide range of skills involving SQL, relational databases, statistics, probabilities, and practical problem-solving. While the skills which companies require differs from industry to industry and company to company, being proficient in SQL is typically mandatory. As such, a strong SQL ability will always position you well as a candidate for data analyst positions. One of the best ways to improve your SQL skills for interview questions is to leverage the interactive StrataScratch platform to practice real SQL interview questions and engage with a community of other data analysts.

In this article, we’ll cover some of the most important SQL concepts by working through several SQL interview questions for data analyst position.

SQL Skills a Data Analyst Must Know

During the data analyst interview process, you’ll more than likely encounter technical problems where you will be expected to answer SQL interview questions. It’s crucial to prepare for your interviews by practicing similar data analyst interview questions in advance. For context, you may be asked questions related to:

SQL Skills a Data Analyst Must Know
  • Using WHERE vs HAVING()
  • Performing JOINS and UNIONS
  • Finding Unique Data Values
  • Aggregation Functions
  • Ordering Results
  • And Other General SQL Skills

As such, we’re going to cover three data analyst SQL interview questions which clearly display these concepts, so you can internalize and better implement them.

Three SQL Interview Questions a Data Analyst Should Be Able to Solve

Three SQL Interview Questions a Data Analyst Should Be Able to Solve

Data Analyst SQL Interview Question #1: Filtering Data Using WHERE or HAVING()

The data analyst interview question we’re going to look at to understand why to use WHERE or HAVING() comes from a Zillow interview.


Table: zillow_transactions

Link to this data analyst SQL interview question: https://platform.stratascratch.com/coding/10315-cities-with-the-most-expensive-homes

It asks to find all cities with higher average home prices than the national average home price. For this problem, we only have one table zillow_transactions to examine.

Table: zillow_transactions
idstatecitystreet_addressmkt_price
1NYNew York City66 Trout Drive449761
2NYNew York CityAtwater277527
3NYNew York City58 Gates Street268394
4NYNew York CityNorcross279929
5NYNew York City337 Shore Ave.151592

We’ll break this down into two steps - first let’s find the average home price by city. For this, we’ll use one of the SQL Aggregate Functions - "avg()", and GROUP BY cities.

SELECT city,
       avg(mkt_price)
FROM zillow_transactions
GROUP BY city

The next logical step is to limit this data to only the cities with average home prices larger than the national average. It’s at this point we must understand the distinction between the WHERE and HAVING() clause. WHERE applies before the calculation of city averages in our query above, so, if we compared our city averages to national averages in a WHERE clause, it’s removing individual home data before the city average calculation.

Whereas, if we use a HAVING() clause, we filter data after our city average calculation and can perform a proper comparison with the national average. Additionally, the HAVING() clause lets us include comparisons, subqueries, and other SQL code, so we can perform our national average calculation and comparison inside of it.

SELECT city
FROM zillow_transactions
GROUP BY city HAVING(avg(mkt_price) >
                       (SELECT avg(mkt_price)
                        FROM zillow_transactions))

All required columns and the first 5 rows of the solution are shown

city
Mountain View
San Francisco
Santa Clara

As a data analyst, you must understand the difference between the WHERE and HAVING() clauses since you’ll in many cases need to filter out data and want to avoid discarding relevant data.

Data Analyst SQL Interview Question #2: How to Aggregate Distinct Data Values

The question we’re going to look at to understand why we must use unique values comes from a Postmates interview.


Table: postmates_orders

Link to the question: https://platform.stratascratch.com/coding/2013-customer-average-orders

This question is relatively easy as it asks us to find how many customers placed an order as well as the average order amount. We only have one table postmates_orders to examine.

Table: postmates_orders
idcustomer_idcourier_idseller_idorder_timestamp_utcamountcity_id
1102224792019-03-11 23:27:00155.7347
2104224752019-04-11 04:24:00216.644
3100239792019-03-11 21:17:00168.6947
4101205792019-03-11 02:34:00210.8443
5103218712019-04-11 00:15:00212.647

Looking at the example table, we can see how we have the same customer_id ‘102’ applying to more than one order. As a result, finding how many customers placed an order isn’t as simple as counting all the rows. Rather, we must use the DISTINCT clause in our selection to count customer_ids only once.

As for finding the average order values, we can do this easily with the average aggregation function. Unlike the previous question, we don’t have to GROUP BY any column here since we want the average for the whole table (a single value).

SELECT count(DISTINCT customer_id),
       avg(amount)
FROM postmates_orders

All required columns and the first 5 rows of the solution are shown

countavg
5139.224

It’s important to understand the proper use of DISTINCT. DISTINCT is looking for unique combinations of row data for the columns you specify. If DISTINCT is within the count function with only the customer_id following it, it’s going to disregard customer_ids which appear more than once before the aggregation. If it’s outside the count function after SELECT instead, it’s going to look for unique combinations of the count and average after the aggregation instead of before, so you end up with incorrect results.

Being able to understand how the DISTINCT clause works in conjunction with aggregation functions is critical for a data analyst since you’ll often have to run calculations on data featuring duplicates or multiple rows for a single field.

Data Analyst SQL Interview Question #3: How to Join and Aggregate Data then Order Results

The last data analyst SQL interview question we’ll look at shows how we can join and aggregate data then order results and comes from an Airbnb interview question.


Tables: airbnb_apartments, airbnb_hosts

Link to this data analyst SQL interview question: https://platform.stratascratch.com/coding/10187-find-the-total-number-of-available-beds-per-hosts-nationality

This data analyst SQL interview question is asking us to return the total comments received by a user in the last 30 days while excluding users who didn’t receive comments. There are two tables to analyze here:

Table: airbnb_apartments
host_idapartment_idapartment_typen_bedsn_bedroomscountrycity
0A1Room11USANew York
0A2Room11USANew Jersey
0A3Room11USANew Jersey
1A4Apartment21USAHouston
1A5Apartment21USALas Vegas
Table: airbnb_hosts
host_idnationalitygenderage
0USAM28
1USAF29
2ChinaF31
3ChinaM24
4MaliM30

Given we have to output the nationality along with the total number of available beds, we already know we’ll need to join the tables. In this case, since there is a match in the data for all the host_ids for the two tables, we can use an INNER JOIN. OUTER JOINs are typically used when there is some dissimilar data which we don’t have in this problem. If, for example, the airbnb_hosts table was missing some host_ids, then we would use an OUTER JOIN.

SELECT n_beds,
       nationality
FROM airbnb_hosts h
INNER JOIN airbnb_apartments a ON h.host_id = a.host_id

The INNER JOIN is extremely simple - only requiring us to provide aliases for the two tables and then join them ON their matching field host_id. The final step in this data analyst SQL interview question is to aggregate the number of beds and order them. We’ll use the SUM() aggregation function and GROUP BY nationality to sum by nationality. Finally, we’ll ORDER BY a descending count of the sum of beds.

SELECT nationality,
       SUM(n_beds) AS total_beds_available
FROM airbnb_hosts h
INNER JOIN airbnb_apartments a ON h.host_id = a.host_id
GROUP BY nationality
ORDER BY total_beds_available DESC

All required columns and the first 5 rows of the solution are shown

nationalitytotal_beds_available
Luxembourg320
USA248
Brazil72
China44
Mali28

Knowing how to JOIN tables in SQL is essential for a data analyst as you might often be working with several different datasets you need to combine before analysis. Keep in mind there are several types of SQL JOINs and which one you must use depends on how well your tables match up! Also, for best presenting and ranking your results, you’ll need to understand how to use the ORDER BY clause.

To practice more such JOIN interview questions, check out our post "SQL JOIN Interview Questions".

Conclusion

In this article, we went over several examples which highlight some of the most important SQL concepts you’ll see in Data Analyst SQL Interview Questions. Ultimately, there are almost endless SQL skills you’d need to master the language, but a good grasp of the concepts explained in this article will prepare you for a large amount of SQL interview questions you might have to solve during an interview.

Keep in mind SQL isn’t the only concept you’ll need to know for the Data Analyst interview, but it is one of the most important! On the StrataScratch platform, you can practice answering more data analyst interview questions. There you can build out your solutions independently or by using one of our articles as a guide. Beyond this, you’ll have access to a large pool of other members who post their solutions for feedback and can critique yours.

SQL Interview Questions for the Data Analyst Position
Categories


Become a data expert. Subscribe to our newsletter.