SQL Interview Questions for the Data Analyst Position

SQL Interview Questions for the Data Analyst Position
Categories


Let's go over several examples which highlight some of the most important SQL concepts you’ll see in the SQL interview questions for Data Analyst position.

The interview process for data analysts can be intimidating for those new to the position. The data analyst interviews typically cover a wide range of skills involving SQL, relational databases, statistics, probabilities, and practical problem-solving. While the skills which companies require differs from industry to industry and company to company, being proficient in SQL is typically mandatory. As such, a strong SQL ability will always position you well as a candidate for data analyst positions. One of the best ways to improve your SQL skills for interview questions is to leverage the interactive StrataScratch platform to practice real interview questions and engage with a community of other data analysts.

In this article, we’ll cover some of the most important SQL concepts by working through several SQL interview questions for data analyst position.

SQL Skills a Data Analyst Must Know

During the data analyst interview process, you’ll more than likely encounter technical problems where you will be expected to answer SQL interview questions. It’s crucial to prepare for your data analyst interviews by practicing similar questions in advance. For context, you may be asked questions related to:

SQL Skills a Data Analyst Must Know
  • Using WHERE vs HAVING()
  • Performing JOINS and UNIONS
  • Finding Unique Data Values
  • Aggregation Functions
  • Ordering Results
  • And Other General SQL Skills

As such, we’re going to cover three questions which clearly display these concepts, so you can internalize and better implement them.

Three SQL Interview Questions a Data Analyst Should Be Able to Solve

Three SQL Interview Questions a Data Analyst Should Be Able to Solve

Data Analyst SQL Interview Question #1: Filtering Data Using WHERE or HAVING()

Cities With The Most Expensive Homes

The data analyst interview question we’re going to look at to understand why to use WHERE or HAVING() comes from a Zillow interview.

SQL Interview Questions for Data Analyst from Zillow

Link to this data analyst SQL interview question: https://platform.stratascratch.com/coding/10315-cities-with-the-most-expensive-homes

It asks to find all cities with higher average home prices than the national average home price. For this problem, we only have one table zillow_transactions to examine.

zillow_transactions

idint
statevarchar
cityvarchar
street_addressvarchar
mkt_priceint

With some example data:

Dataset for Data Analyst SQL Interview Questions from Zillow

We’ll break this down into two steps - first let’s find the average home price by city. For this, we’ll use one of the SQL Aggregate Functions - "avg()", and GROUP BY cities.

SELECT city,
       avg(mkt_price)
FROM zillow_transactions
GROUP BY city

The next logical step is to limit this data to only the cities with average home prices larger than the national average. It’s at this point we must understand the distinction between the WHERE and HAVING() clause. WHERE applies before the calculation of city averages in our query above, so, if we compared our city averages to national averages in a WHERE clause, it’s removing individual home data before the city average calculation.

Whereas, if we use a HAVING() clause, we filter data after our city average calculation and can perform a proper comparison with the national average. Additionally, the HAVING() clause lets us include comparisons, subqueries, and other SQL code, so we can perform our national average calculation and comparison inside of it.

SELECT city
FROM zillow_transactions
GROUP BY city HAVING(avg(mkt_price) >
                       (SELECT avg(mkt_price)
                        FROM zillow_transactions))
Output for Data Analyst SQL Interview Questions from Zillow

As a data analyst, you must understand the difference between the WHERE and HAVING() clauses since you’ll in many cases need to filter out data and want to avoid discarding relevant data.

Data Analyst SQL Interview Question #2: How to Aggregate Distinct Data Values

Customer Average Orders

The question we’re going to look at to understand why we must use unique values comes from a Postmates interview.

SQL Interview Questions for Data Analyst from Postmates

Link to the question: https://platform.stratascratch.com/coding-question?id=2013

This question is relatively easy as it asks us to find how many customers placed an order as well as the average order amount. We only have one table postmates_orders to examine.

postmates_orders

idint64
customer_idint64
courier_idint64
seller_idint64
order_timestamp_utcdatetime64[ns]
amountfloat64
city_idint64

With example data:

Dataset for Data Analyst SQL Interview Question from Postmates

Looking at the example table, we can see how we have the same customer_id ‘102’ applying to more than one order. As a result, finding how many customers placed an order isn’t as simple as counting all the rows. Rather, we must use the DISTINCT clause in our selection to count customer_ids only once.

As for finding the average order values, we can do this easily with the average aggregation function. Unlike the previous question, we don’t have to GROUP BY any column here since we want the average for the whole table (a single value).

SELECT count(DISTINCT customer_id),
       avg(amount)
FROM postmates_orders
Output for Data Analyst SQL Interview Question from Postmates

It’s important to understand the proper use of DISTINCT. DISTINCT is looking for unique combinations of row data for the columns you specify. If DISTINCT is within the count function with only the customer_id following it, it’s going to disregard customer_ids which appear more than once before the aggregation. If it’s outside the count function after SELECT instead, it’s going to look for unique combinations of the count and average after the aggregation instead of before, so you end up with incorrect results.

Being able to understand how the DISTINCT clause works in conjunction with aggregation functions is critical for a data analyst since you’ll often have to run calculations on data featuring duplicates or multiple rows for a single field.

Data Analyst SQL Interview Question #3: How to Join and Aggregate Data then Order Results

Find the total number of available beds per hosts' nationality

The last data analyst SQL interview question we’ll look at shows how we can join and aggregate data then order results and comes from an Airbnb interview question.

SQL Interview Questions for Data Analyst from Airbnb

Link to this data analyst SQL interview question: https://platform.stratascratch.com/coding/10187-find-the-total-number-of-available-beds-per-hosts-nationality

This question is asking us to return the total comments received by a user in the last 30 days while excluding users who didn’t receive comments. There are two tables to analyze here:

airbnb_apartments

host_idint
apartment_idvarchar
apartment_typevarchar
n_bedsint
n_bedroomsint
countryvarchar
cityvarchar

Dataset for Data Analyst SQL Interview Question from Airbnb

airbnb_hosts

host_idint
nationalityvarchar
gendervarchar
ageint

Dataset for Data Analyst SQL Interview Question from Airbnb

Given we have to output the nationality along with the total number of available beds, we already know we’ll need to join the tables. In this case, since there is a match in the data for all the host_ids for the two tables, we can use an INNER JOIN. OUTER JOINs are typically used when there is some dissimilar data which we don’t have in this problem. If, for example, the airbnb_hosts table was missing some host_ids, then we would use an OUTER JOIN.

SELECT n_beds,
       nationality
FROM airbnb_hosts h
INNER JOIN airbnb_apartments a ON h.host_id = a.host_id

The INNER JOIN is extremely simple - only requiring us to provide aliases for the two tables and then join them ON their matching field host_id. The final step in this data analyst SQL interview question is to aggregate the number of beds and order them. We’ll use the SUM() aggregation function and GROUP BY nationality to sum by nationality. Finally, we’ll ORDER BY a descending count of the sum of beds.

SELECT nationality,
       SUM(n_beds) AS total_beds_available
FROM airbnb_hosts h
INNER JOIN airbnb_apartments a ON h.host_id = a.host_id
GROUP BY nationality
ORDER BY total_beds_available DESC
Output for Data Analyst SQL Interview Question from Airbnb

Knowing how to JOIN tables in SQL is essential for a data analyst as you might often be working with several different datasets you need to combine before analysis. Keep in mind there are several types of JOINs and which one you must use depends on how well your tables match up! Also, for best presenting and ranking your results, you’ll need to understand how to use the ORDER BY clause.

To practice more such JOIN interview questions, check out our post "SQL JOIN Interview Questions".

Conclusion

In this article, we went over several examples which highlight some of the most important SQL concepts you’ll see in Data Analyst SQL Interview Questions. Ultimately, there are almost endless SQL skills you’d need to master the language, but a good grasp of the concepts explained in this article will prepare you for a large amount of SQL interview questions you might have to solve during an interview.

Keep in mind SQL isn’t the only concept you’ll need to know for the Data Analyst interview, but it is one of the most important! On the StrataScratch platform, you can practice answering more data analyst interview questions like this data analyst interview question from Amazon! There you can build out your solutions independently or by using one of our articles as a guide. Beyond this, you’ll have access to a large pool of other members who post their solutions for feedback and can critique yours.

SQL Interview Questions for the Data Analyst Position
Categories


Become a data expert. Subscribe to our newsletter.