A collection of real SQL interview questions from various industries for data scientists and business analysts.
Searching for “SQL interview questions” will give you over 10 pages worth of articles that lists questions that one might get asked on a technical interview. As someone that is preparing for a data scientist or analytics role, I’ll usually skim through the articles and memorize as much as possible. But after years of interviewing for data scientist positions and various other analytical roles, I’ve realized that I’ve never been asked these questions on an interview.
Questions like :
What is RDBMS? How is it different from DBMS? What is an Index? What is the difference between primary key and unique constraints?
are probably questions for developers and data engineers -- they’re not questions for data scientists or anyone interviewing for an analytics role.
Analytical interviews are testing for not only for your ability to manipulate data through SQL queries but more importantly interpreting the output as it relates to the business question, which often relates to developing a metric with the data and giving your interpretation of how that metric impacts the product or feature. The questions a data scientist would get in an interview isn’t “What is a primary key?”, it’s “Calculate friends acceptances over time and tell me how the new feature impacted the platform”.
So to help my fellow data scientists and anyone in an analytical role, I’ve collected my version of SQL interview questions tailored specifically for analytics. I’ve collected over 500 SQL interview questions for data scientist and general analytics roles from real companies across many different industries -- most are real interview questions but some are datasets from the companies itself. Rather than list all 500 questions and have this article go on forever, I’ll show a few questions and provide a link to the remaining 500 SQL interview questions and problems.
Industries I’ve collected SQL interview questions from
I categorized my questions based on the industry because different industries and markets have different types of data and structures. The finance industry deals with bank transactions while technology companies like Google and Facebook deal with user engagement data. The data itself is different and the questions an interviewer would ask would be different. So it makes sense to separate the questions by industry and focus on the industry you are interviewing for or want a career in.
Here’s a list of the industries I’ve collected SQL questions and datasets from:
SQL interview questions and solutions
Below are the SQL interview questions and solutions by industry. I list one question and solution and then added a link at the bottom to access the rest of the questions and solutions.
The purpose is to show you the types of questions being asked and the level of SQL skills you would need to answer the questions. In most cases, the solutions require an advanced knowledge of SQL. If you want to work through the questions, I would suggest doing so on the Strata Scratch platform, where all the questions are stored, because you’ll have access to the data and get an idea of the data structure needed to work through the SQL problem.
If you’re impatient and just want access to all the questions, here’s the list of all SQL questions.
Technology companies like Airbnb, Google, Facebook have very detailed user data. I’ve almost always have gotten asked to write queries that explain user behavior. Here’s a question and solution, followed by a link to the rest of the SQL interview questions from the tech industry.
Find the search for each city which has the highest number of amenities. Estimate the number of amenities as the number of characters in the `amenities` column.
2. Online Education
Online education or EdTech is actually curated content from LeetCode and HackerRank. These two platforms provide a lot of valuable information for data scientists as they prepare for their interviews.
Suppose that you have two tables( Customers and Orders). Write a SQL query to find all customers who never order anything.
I’ve sourced through a few public healthcare datasets to provide SQL questions that have been asked at various healthcare and healthtech companies.
What is the variance of scores which have grade A? The formula is avg((X_i - mean_x) ^ 2). What does this tell you about the normality assumption of scores for grade A?
4. Sports Analytics
If you’re interested in working at ESPN or the Athletic, you might find this section valuable. These SQL practice problems often relate to athlete metrics.
Use the classification from previous question and count the number of athletes which participated in European cities using a subquery.
Business datasets and interview questions here relate to macro-level trends of companies. Depending on what company you are interviewing at, the data can relate to financial transactions or company performance.
Which is the most popular sector from the list?
6. Travel & Leisure
Interview questions in Travel & Leisure often relate to aggregations and rankings of their datasets. Unless the company itself owns their platform where they can collect user engagement data, you might be writing SQL queries related to ranking the best hotels or wines given a dataset of a few thousand lines.
Find all provinces which produced more wines in `winemag_p1` than they did in `winemag_p2`.
The rest of the SQL interview questions from Travel & Leisure can be found here.
7. Bonus #1: Open Source Public Data
If anyone is a fan of Kaggle competition, I’ve written SQL practice questions from datasets available for free, mainly through Kaggle or from Google Cloud. The datasets may be public but the SQL questions do come from real interviews from various companies.
Find all people who earned more money via bonuses then by their base pay. From these people find which first name yields the highest minimal total pay with benefits.
8. Bonus #2: General SQL Practice
If you need a refresher on SQL concepts to prepare for your interview, here’s a few questions that will ramp you up.
How many orders of all orders are shippable? An order is defined as shippable if the customer's address is known.
So there’s 8 questions and solutions listed in the article with a link to the remaining 500 SQL interview questions and solutions. They link to a platform that stores the dataset and allows you to execute your SQL query, which I think is a valuable exercise as you prepare for an interview or practice to improve your SQL skills.
It’s important to note that data scientists and analytical roles are about how you manipulate the data and interpret the output. So my advice is to get really good at writing SQL queries so that manipulating data becomes 2nd nature. Interpreting the output is more of a function of experience and good communication skills with the interviewer so you want to make sure all your brain power is dedicated to interpreting rather than stumbling to write a SQL query.