A Few In-Depth Explanations To SQL Interview Questions From Real Companies
This blog post follows our “SQL Interview Questions From Real Companies” video which can be found below
In this post we’ll go through 4 SQL questions you’re bound to encounter during a technical interview. While these problems are on the easy side, it’s still important that you bring along the interviewer. You want to show your interviewer your thought process. It’s okay if you don’t have enough time to solve the problem. Interviewers care more about how you solve problems in general than whether you can solve this specific problem. So during an interview, remember to take your time and describe each step to your interviewer.
We’ll use a three-step approach to problem-solving that you can use during your technical interviews. First, remember to build up a query step by step and explain each step to the interviewer. Your interviewer wants to see that you know what you’re doing and why you’re doing it. Second, you should be looking for edge cases throughout an interview. By asking your interviewer questions about edge cases you’ll show the interview your attention to detail. Finally, you should be able to explain to the interviewer the effect of every clause and expression.
We’ll be using Strata Scratch for our SQL exercises. Strata Scratch is a platform that helps you prepare for technical interviews. Every problem in this post is available to you on Strata Scratch.
Question 1: Find the drafts that contain the word optimism.
For our first interview question, we’re given a table called google_file_store and asked to find all the draft files containing the word optimism.
We’ll start every problem by looking at the table. I can pull keywords out of the question and use them to understand the table. Looking for ‘draft’, I see a few file names start with ‘draft’. All draft files must follow the format of the word ‘draft’ followed by a number. I also see that some of the contents contain the word ‘optimism’. Now is a good chance to ask the interviewer some questions. It’s always important that you ask the interviewer questions because helps you solve the problem and it helps the interviewer understand your thought process.
A good question would be, where in the content is the word ‘optimism’ located? Is it in the beginning, middle, or end of the string? It might seem obvious from looking at the table that the position of optimism doesn’t matter for this question, but asking questions can still benefit you. Currently, we’re making assumptions about the problem, and by asking questions we guarantee we’re solving the problem correctly and showing the interviewer our attention to detail. For this problem, the position and the case of ‘optimism’ do not matter.
Let’s start writing our query. Every interview problem starts with writing out the basic query. For these problems, we’re using the SELECT * statement. The SELECT * statement is used whenever you want to return all columns of information from a table. The next clause in the basic query is the FROM clause. This clause is used to choose which table we’re getting information from. We’ll add ‘FROM sql_interviews.google_file_store’. As expected, when you run this query, it returns all the information from the table.
To solve this problem, we need a way to filter the results. We do that using a WHERE clause. Understanding the WHERE clause is critical when going into a technical interview. Almost every question will involve understanding the context of the problem, and describing that context as a WHERE clause. The WHERE clause works by taking an expression, which is something that returns true or false. Each row is evaluated using the expression; if it’s true the row is returned, and it’s false it gets filtered out. We’ll need to write an expression that matches the two conditions of our problem so that we can filter out what we need. Each file has to be a draft and the contents must contain the word optimism.
Let’s deal with the first condition first. We need an expression that can do simple pattern matching. In this case, the ILIKE expression is perfect. This expression takes the name of a column and a pattern string, and only returns rows which match the pattern. Pattern strings have two special characters. The % character represents 0 or more of any character, and the _ character which represents exactly 1 of any character. Using those 2 characters we can use pattern matching to match many strings. We need to write a pattern that can match any string starting with ‘draft’. The pattern is ‘draft’ followed by 0 or more of any character so we’ll use ‘draft%’. This pattern string will match any string starting with 'draft'.
Now that we have a pattern string we can write our expression. After our WHERE clause, we’ll as a tab, and write “filename ILIKE ‘draft%’”. Now when we run the query it only returns drafts.
Now we can deal with the second condition. The second condition is that the contents contain the word ‘optimism’. We’ll add an AND expression. AND expressions allow us to have two conditions. They will check the expression before and after, and only return true if both expressions are true. Now we can add an expression for the second condition. This condition requires more pattern matching so we can use ILIKE again. The only difference is that optimism is located in the middle of the contents so we need a different pattern string. There can be zero or more characters before and after optimism so we use ‘%optimism%’ as our pattern string. We write ‘contents ILIKE ‘%optimism%’ after the AND expression.
Now that we’re filtering based on all conditions, we’ve solved the problem. I can run the query and I get the expected results.
Side note on the ILIKE expression. ILIKE has a sister expression called the LIKE expression. Both expressions work the same with one exception, the LIKE expression is case sensitive and the ILIKE expression is not. For this problem, the case of ‘draft’ and ‘optimism’ do not matter so we used ILIKE.
Question 2: Print all workers who are also managers
For our second question, we’re given two tables, worker and title, and asked to write a query that lists all of the managers.
As always, we’ll look at the table first. You’ll see that the worker table has all the information about each worker, but doesn’t list their job title. The title table lists the job title of each worker, but it only has a reference to the worker. For this problem, we need information from both tables so we will need a JOIN.
We’ll write our basic query first. SELECT * FROM sql_interviews.worker table, because we don’t know which columns we want to return yet.
Now I need to combine this table with the title table. To combine tables we use the JOIN clause. The JOIN clause is another clause that you need to know. Your interviewer will want to see that you have a solid understanding of how JOIN clauses work. JOIN clauses work by creating a table containing every possible pair from both tables and filtering that table with an ON clause. An ON clause is like a WHERE clause for JOINs. So to combine these tables we’ll add ‘JOIN sql_interviews.title’.
Now that we’re working with two tables we want to name each table with an AS clause so we can directly reference them. It’s common when writing queries to name a table after the first character in its original name. I’ll be naming the worker title ‘w’ and the title table ‘t’ so I can directly reference later in the query.
Now when we run the query we have a table that has the information about the worker and their title. We have to filter this table with an ON clause. If we could run this query without an ON clause, we would get a table that had every possible pair of worker and job title combined. Obviously, this isn’t what we want; we want every worker to be paired with their job title. We can get this table by adding our ON clause. ON clauses work the same as WHERE clauses so all we need is the correct expression. In this case, the worker_id of the worker should equal the worker_ref_id of the job title. By adding ‘ON w.worker_id = t.worker_ref_id’ we get the table we want. Now we have a table where each row has the information of a worker and their job title.
Finally, we need to filter the table such that it only contains the managers. We have two choices for filtering out the managers. We can expand our ON clause. The ON clause works the same as a WHERE clause so we can add an AND expression followed by our condition. Then we can add our second condition, “t.worker_title = ‘Manager’” and the resulting table will only have managers. That works, but we can also add a WHERE clause. The new table created by a JOIN clause works just like the original table. That means we can filter it with a WHERE clause. Just add the WHERE clause with our expression. I prefer adding a WHERE clause because it makes your ON clause simple and easy to understand.
To finish the problem I’ll choose which columns to SELECT. For this problem, I only want each manager’s first name and job title. Running this query returns all of the managers, and solves the problem.
Question 3: List employees with the same salary
For our third interview question, we’re given the worker table and we’re asked to write a query that lists all the workers with the same salaries.
To solve this problem we need to use a self-join.
As always, we first look at the table. Looking at the table we see that each row has all the information we need for the problem. It lists their worker id, name, and salary. We need to find some way to select all the pairs of workers who have the same salary.
I’ll start by writing a basic query. We’re going to SELECT * FROM sql_interviews.worker because I don’t know which columns I need. Now we’ll compare this table to itself.
Comparing a table to itself doesn’t require a new clause. If we look back to problem 2, this isn’t any different from comparing the worker table to the title table. We want every pair of workers that have the same salary. To get that we can JOIN this table with itself. To start the join we’ll add ‘JOIN sql_interviews.worker’ to our query. Now that we have two tables we need to name them. I choose to name my tables w1 for worker 1 and w2 for worker 2. Next, we’ll add our ON clause. The conditions of this problem are that the salary of the workers should be equal and we can describe that using an expression. I’ll add ‘ON w1.salary = w2.salary’ to the query. Running this query will give us a table containing every pair of workers with the same salary.
There is one issue. If we run this query, it returns more rows than expected. If you’re paying careful attention you’ll see the problem. Obviously, every worker shares the same salary as themselves. That means there is an additional row in the table for every time a worker is compared with themselves. We need to expand our ON clause so each pair must have different workers. I’ll add an AND expression followed by the new condition ‘w1.worker_id != w2.worker_id’. Now when we run this query we get a table that only contains pairs of different workers with the same salary.
Finally, we can finish our SELECT statement. For this problem, we’re interested in who has the same salary. We should return the columns for each worker’s name and the column for their salary. Running this query should return Amitah and Vivek, and Vivek and Amitah, which is correct. We’ve solved the problem.
Question 4: Find the first 5 entries of joined contacts and searches
For our fourth interview question, we’re given the tables airbnb_contacts and airbnb_searches and told to merge the tables on an appropriate key and display the first 5 results.
This problem can be more challenging because we aren’t told what to match on. It’ll require quickly forming an understanding of the tables and making a judgment based on that.
It should be noted that we’ve changed schemas from sql_interviews to datasets for this problem.
Let’s look at the tables. Immediately we see a problem, while these tables do have shared columns, the ds_checkin column, and ds_checkout column, those columns are not uniquely identifying. If a column isn’t uniquely identifying, then we can’t join on it alone. We need to understand the table better to solve this problem.
In a situation like this, it’s best to ask the interviewer questions so we don’t make incorrect assumptions. You’ll never be penalized for asking questions about the problem, so don’t worry. First, we can ask what each row represents in airbnb_contacts. The interviewer will say, each row in this table represents a contact between a guest and a host about a listing. It’s important to note that this table has the unique user id of the guest and the host. Then, we can ask what each row represents in airbnb_searches. They’ll say, each row represents a search performed by a user trying to find a listing to stay in. Given that information, we can start to form a solution. In this case, we know the users that perform the searches found in airbnb_searches are also the guests in airbnb_contacts. We can ask the interviewer if id_user column represents the same user as the id_guest column. They do, and that’s something we can JOIN on.
We’re going to write our basic query. Like before we’ll SELECT * FROM datasets.searches. Because we know we’re going to need two tables we should name our table now. I choose to name it ‘s’ for search.
This problem requires information from two tables like in previous. We’ll be using the JOIN clause again. This join is just like the JOINs we used in question two and three. Based on the questions we asked the interviewer, we know that id_user from the searches table represents the same user as id_guest in the contacts table. That’s what we’ll be joining on. We’ll JOIN with the datasets.airbnb_contacts table, naming it with an AS clause, and add ‘ON id_user = id_guest’. When we run this query it returns a table where each row represents a search that leads to contact. This is technically a solution, but we can do better.
To improve our query we need to ask our interviewer what questions they’re trying to answer with this data.
One way to improve our query is to choose the correct type of JOIN. There are two types of JOINs in SQL. We’ve been using an INNER JOIN which only returns matching pairs. We can also use an OUTER JOIN if we want information on rows that have no match. If the interviewer is only interested in matching searches where users contacted their host, then an INNER JOIN is appropriate for this query. If instead, the interviewer wanted to know the ratio of searches that resulted in contact, then an OUTER JOIN would be more appropriate. We would want to return all searches that don’t result in contact so we would use a LEFT JOIN in this case. That is the case so we’ll replace our JOIN clause with a LEFT JOIN clause.
Another way to improve our query is to make our ON clause more specific. The ds_checkin column and ds_checkout column are common across the two tables. It makes sense that adding them to the ON clause will create a more accurate representation of a user’s search intention. Some users will have multiple searches and contacts, and if we only want to return searches that lead to a contact, then each search should be for the same day as the contact. If we don’t check for this condition, then one contact in the table could result from multiple searches. We’ll improve the ON clause by adding the conditions that ds_checkin and ds_checkout are equal across the tables. Improving your ON clause in this way will show the interviewer that you have good table comprehension.
During an interview, it should be your goal to continuously improve your solutions. By asking the interviewer questions you can solve a problem exactly how they want. Even if your initial assumptions are correct and you don’t change your query, you’re still showing your interviewer that you’re through.
We only want 5 results for this problem, so we add a LIMIT clause. LIMIT clauses work by taking a number, and only returning that number of rows. For this problem, we want all the information so SELECT * works. Running this query gives us our valid solution.
Watch Our Youtube Video On These Four Questions
The last three of these problems show how important understanding JOIN clauses are. It can be challenging understanding what’s going on. If you want to practice writing SQL, I recommend joining Strata Scratch. You’ll have access to over 450 questions taken directly from real companies and you can use them to prepare yourself for an interview.
If you’ve made it this far, and you still haven’t seen the related video, then I highly recommend watching it now. The visuals will make understanding this material easier.