Google SQL Interview Questions

Published: April 22, 2022

Categories:

Written by:
Irakli Tchigladze

Learn how to better formulate your approach to all Google SQL interview questions.

Google is a technology giant that makes digital products we use every day. On the surface, these tools are easy to use, but under the hood, they rely on processing massive amounts of data. Performing efficient data analysis is at the heart of Google’s business and vital for its success.

As you might expect, it’s challenging to interview for a data scientist job at Google. The process is designed to test your knowledge of SQL, as well as your ability to solve difficult problems on the spot.

In this article, we’ll walk you through one of the SQL interview questions asked during data science interviews at Google.

Basic to Intermediate Concepts Tested in Google SQL Interviews

When choosing candidates, Google interviewers pay attention to the quality of their solution. They will prioritize candidates whose SQL code is efficient and easy to understand.

To write good SQL code, you need to understand the building blocks: statements, clauses and keywords in SQL.

By learning SQL concepts, you minimize the chance of getting caught off-guard during an interview. Refresh your knowledge of these concepts to maximize your chances of getting a job.

WHERE/IN

You’re unlikely to find an SQL developer who doesn’t know what the WHERE statement is and how it's used. However, it’s recommended to have more than basic knowledge of the WHERE statement.

A thorough understanding of this concept can help you stand out from the crowd. For example, when solving a question, it may be useful to understand the order of execution in SQL and how it relates to the WHERE statement.

The WHERE statement is useless without a condition, so it’s recommended to also know how to set up a condition.This includes the knowledge of comparison and logical operators to chain multiple conditions together if needed. It’s essential to understand how to compare numbers, texts, dates and other types of values.

It’s also useful to know how to use the IN operator to replace complex logical operators and improve the readability of the code.

A candidate should be prepared to use WHERE/IN statements with number values, text, date and more. They should also know all the associated syntax rules, such as to put text values in single quotes and separate them by a comma.

In some cases, you’re going to need to use the opposite - NOT IN statement.

Aggregate functions

A deep knowledge of aggregate functions will help you find easy solutions to many different questions in SQL.

Beginner data scientists should know five essential aggregate functions: COUNT(), SUM(), AVG(), MIN(), and MAX(). This includes the knowledge of input values, associated syntax and expected output of each function.

In the question described below, we are going to use the COUNT() aggregate function to get the number of users for each business. We will also have to use the GROUP BY statement to summarize the result of the aggregate function.

Practice makes perfect, so practice writing solutions with aggregate functions for other questions here → The Ultimate Guide to SQL Aggregate Functions.

After some practice, you’ll know which aggregate functions to apply while reading the question.

HAVING

When dealing with aggregate functions, the HAVING clause is essential to write SQL queries that are efficient and easy to understand. Companies like Google pay attention to these qualities of your code.

Before going into an interview, you should know what HAVING clause is and its role for writing efficient SQL queries. Obviously the knowledge of the syntax is also essential.

Like the WHERE statement, HAVING clause also works with a condition, so you should have a good working knowledge of how to write conditions with or without logical operators (AND, OR). The post “Best Practices to Write SQL Queries: How To Structure Your Code” can help you understand the difference between HAVING and WHERE.

You should also practice how to use HAVING with COUNT(), MAX(), MIN() and other aggregate functions.

There are alternatives to using the HAVING clause. However, for most questions, HAVING is more readable and easier to write.

DISTINCT

This is a simple statement that can often make the difference between the right and wrong. It can also be useful for removing duplicate rows to familiarize yourself with the data.

DISTINCT is useful for finding solutions to SQL interview questions where you only want to keep unique values in specified columns. For instance, you could use it with a user_id value to make sure that each user is only counted once. This could be useful for counting unique users.

Candidates can also benefit from knowing advanced aspects of the DISTINCT statement. For instance, the knowledge of how it works when you specify two columns, and the syntax for using DISTINCT this way.

DISTINCT is commonly used with aggregate functions, such as COUNT(). It’s useful to know how the two features work together, as well as associated syntax rules.

Google SQL Interview Question Walkthrough

Finding User Purchases

The question is marked as ‘Medium’ difficulty on StrataScratch. To solve it, you need to filter users, keep the ones that meet the condition, and get the number of qualified users for each company. Sounds simple enough.

English, German, French, Spanish Speakers

Google

MediumID 9668

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Find company IDs with more than 2 unique users who speak any of the following languages: English, German, French, or Spanish.

Table: playbook_users

Link to this Google SQL interview question: https://platform.stratascratch.com/coding/9668-english-german-french-spanish-speakers

With a good understanding of concepts mentioned above, this question is fairly easy to solve. There are multiple ways to get the answer. All of them are valid, but as a candidate, you should aim for efficiency and readability of your SQL code.

The main challenge is to get the number of qualified users for each company. Then you can easily output the company IDs where that number is more than ten.

NOTE: In the video, Frederik solves the SQL question for 10 users. At the moment, the question on the StrataScratch platform specifies two users. The only difference between solutions is the minimum limit of language speakers. Otherwise, the solution described here is identical with that of Frederik.

Available Dataset

playbook_users

user_id	int
created_at	datetime
company_id	int
language	varchar
activated_at	datetime
state	varchar

Data Assumptions

When trying to wrap your head around a SQL problem, first take a look at available data. It will give you some clarity about the question and help you formulate your approach.

This Google SQL interview question has only one table, so there’s not much to analyze. Still, let’s take a look at each column and see which ones are the most important for finding an answer:

user_id - We’ll use values from this column to get the number of unique users who speak one of the 4 languages.
created_at - We don’t need to keep track of when the records were created, so we can safely ignore this column.
company_id - The question is asking us to output the company IDs if they have 2 or more users who speak one of the four languages. We’ll have to filter the records and output values in this column.
language - We need to check the value of this column to determine whether the user speaks one of the four languages.
activated_at - We don’t need to keep track of when each user was activated, so we can safely ignore this column.
state - The question does not mention anything about user state.

Taking a look at data can also help you better understand the question itself. For example, this question could be asking us to find companies with at least 10 users who speak all four languages.

If we take a look at the data, it becomes clear that the language column includes only one language. So we can safely assume that the question is asking us to find users who speak one of the four languages.

By looking at actual data, you’ll see rows that you need to filter out - users who speak languages other than English, German, French and Spanish.

Missing data

Solution Logic

Before you start to think of an answer, read the question multiple times to wrap your head around it. If something still remains unclear, don’t hesitate to ask. It’s better than to go down the wrong road with your answer.

When formulating an approach, it’s a good idea to create a guideline for yourself. Write down broad steps that need to be taken to answer the question. It’s going to help you stay on track, and could be an opportunity to show off your analytic thinking during a data science interview.

For this specific question, you have three important steps:

filter the records to only keep the users who speak one of the four languages
get the number of qualified users for each company
return the company ID where the number of users is more than 2.

To find the answer to this Google SQL interview question, we need to filter the table multiple times. It’s important to do so in accordance with SQL order of executions. To filter the users based on the value in the language column, we use the WHERE statement, which comes before HAVING or COUNT in the order of execution.

WHERE is followed by aggregate functions and GROUP BY statements, which means we perform the COUNT() function on the list of filtered users. The HAVING clause is used to filter the result of aggregate functions. In the order of executions it comes after them. Considering all this, our query works as it should.

To filter the records, you have to set the condition. You can use the combination of WHERE/IN statements to return the records where the language column matches one of the four languages in the list.

There are other ways to set the condition to filter the records. For example, you can use the equality operator and OR logical connector to chain multiple conditions together.

When coming up with a solution, you should aim for the readability and conciseness of your SQL code. For this reason, using WHERE/IN statements is a better decision.

Once we have the filtered list of users, the next step is to count the number of unique users for each company. We’re going to need the COUNT() aggregate function to find the number of users and the GROUP BY statement to summarize them for each company.

We need to put a DISTINCT statement within the aggregate function to make sure we count only unique users.

Lastly, we need to filter the results of an aggregate function. If you’ve studied SQL, you’ll know that HAVING clause serves that purpose perfectly. It needs to be accompanied by a condition to output the companies that have more than 10 users.

Solution Approach

Step 1: Filter the records

The table has rows where the language column contains values other than the four listed languages. First, we need to filter the records so the users who speak these four languages are the only ones left.

SELECT *
FROM playbook_users
WHERE language IN ('english',
                   'german',
                   'french',
                   'spanish')

After running this query, we’ll only have records where the language column is equal to one of the items from the list.

Step 2: Get the number of filtered users for each company ID

We can use the COUNT() aggregate function to get the overall number of language speakers. The GROUP BY statement is necessary to summarize the results and get the number of users for each company. Note that we need to use the DISTINCT statement to make sure we are counting unique users.

SELECT company_id,
       COUNT(DISTINCT user_id)
FROM playbook_users
WHERE language IN ('english',
                   'german',
                   'french',
                   'spanish')
GROUP BY company_id

We’re getting closer to the final answer. The output table contains two columns: company_id, and the number of users, who speak one of the four languages at each company. We don’t need any other fields from the row.

Step 3: Output the companies with more than 2 speakers

Lastly, we use a HAVING statement to filter the results of an aggregate function. We just have to specify the condition. To set up a condition, we move the COUNT from the SELECT statement, set the condition to return the records where the result of COUNT is more than 2.

We write the SELECT statement so that when the condition is satisfied, SQL will return the company IDs.

SELECT company_id
FROM playbook_users
WHERE language IN ('english',
                   'german',
                   'french',
                   'spanish')
GROUP BY company_id
HAVING COUNT(DISTINCT user_id) > 2

If you check the solution, it’s accepted. If you run the code, it outputs the IDs of companies that have more than the specified number of speakers.

Final Words

Mastering Google SQL interview questions can help you become a better data scientist. Here’s our previous post “Google Data Scientist Interview Questions” to check out more questions asked during Google interviews. It can be an opportunity to apply your SQL knowledge to actual problems.

Also, check out our “Ultimate Guide to Become a Data Scientist at Google” and “Google Data Scientist Salary” to find out all the details about becoming a data scientist and earning your salary as a data scientist at Google.