Facebook (Meta) SQL Interview Questions
We’ll closely examine one of the interesting Facebook (Meta) SQL interview questions and find a flexible approach for solving this Data Science question.
This Facebook (Meta) SQL interview question for the data scientist position can be solved in several ways, but we’ll cover the most flexible solution. Keep reading to discover an approach which can cover varying datasets and avoids the pitfalls of other strategies!
Facebook (Meta) started out in the dorms of Harvard but quickly gained prominence and is today worth over US $1 trillion. Along with Google, Apple, Microsoft, and Amazon, Facebook (Meta) forms the Big Five FAANG companies in US Information Technology. While it is mostly known for its eponymous app, Facebook (Meta) also has numerous other products and services including WhatsApp, Instagram, Facebook Messenger, Giphy, and Oculus among others.
Data Science Position at Facebook (Meta)
Given the large range of business functions, data scientist roles at Facebook (Meta) vary based on the unit and the position you are applying for. What you can be sure of is you will be working on data at an enormous scale while using cutting edge machine learning, visualization, and other related tools to help understand how people interact with each other and the world around them.
Concepts Tested in Facebook SQL Interview Questions
The main concepts tested in Facebook SQL interview questions include
- Common Table Expressions (CTEs)
- Window Functions
- Difference between rank() and dense_rank() functions
- Coalesce function
- Datetime functions
Facebook (Meta) SQL Interview Question for the Data Scientist Position
Highest Energy Consumption
The question we are going to examine in detail in this article has been asked during an interview at Facebook (Meta). It’s titled “Highest Energy Consumption”, and the key challenge is summing the energy consumption by date for a varying set of days across three datacenters and filtering for the highest sum of energy.
Interview Question Date: March 2020
Find the date with the highest total energy consumption from the Meta/Facebook data centers. Output the date along with the total energy consumption across all data centers.
Link to the question: https://platform.stratascratch.com/coding/10064-highest-energy-consumption
We’re being asked to find the date with the highest total energy consumption across three data centers. Furthermore, although not specified in the original question, if there is a tie between dates, you should include all dates with the highest energy consumption.
This Facebook SQL interview question may seem like a short and simple question, but, as we will see, the answer isn’t so straightforward and requires using a few more advanced SQL functions. While there exists multiple ways to solve this Facebook SQL interview question, we’re going to look at one of the most flexible solutions.
Framework to Solve this Facebook SQL Interview Question
To make the process of solving this Facebook SQL interview question easier, we will follow an easy framework that can be applied to any data science problem. It only has three steps, but it creates a logical pipeline for approaching problems concerning writing code for manipulating data. Here are the three steps:
- Understand your data:
- List your assumptions about the data columns, so that you know which columns you should use
- If you still don’t have a confident understanding of your data, view the first couple of rows of your data (single or multiple tables). Or ask for some example values, so you understand the actual data, not just the columns. It will not only help you identify the edge cases but also limit your solution to the bounds of your assumption.
- Formulate your approach:
- Now, start writing down the logical steps that you have to program/code.
- You also need to identify the main functions that you have to use/implement to perform the logic.
- Don't forget that interviewers will be watching you. They can intervene when needed. So, make sure you ask them to clarify any ambiguity. Your interviewers will also specify if you can use the ready-made functions, or if you should write the code from scratch.
- Code Execution:
- Build up your code in such a way that it doesn't look oversimplified or overcomplicated either.
- Build it in steps based on the steps we have outlined with the interviewer. That means the code is not going to be efficient. That’s fine. You can talk about optimization at the end of your interview.
- Here's the most important point. Try to ignore overcomplicating your code with multiple logical statements and rules in each block of code. You can define a block of code as a CTE or subquery. The reason you can do this is that it’s self-contained and separate from the rest of the code.
- Don't be quiet while laying down your code. Talk about your code as the interviewer always evaluates your problem-solving skills.
Understand your data
Let’s start by examining the data. At a Facebook data scientist interview, you usually won’t be given any actual records. Instead, you’ll see what tables or data frames are there and what are the columns and data types in these tables.
In the case of this Facebook SQL interview question, there are three tables representing 3 data centers on different continents with 2 columns each for the date and energy consumption. Each row corresponds to a single date and its respective energy consumption for the data center.
What’s important to realize is to solve this Facebook (Meta) data scientist interview question, we’ll need to collect all of our data into a single table. Reviewing the data shows us, not only are there gaps in the dates in the tables, but the tables also have a different set of dates. For example, the 5th and 6th are in the Asia table but not the EU table. As a result, we’ll need to consider a solution which aggregates all three tables without standardizing to any specific table. While using a JOIN might be the first reaction here, the variation in dates would result in missing data for any dates not present on the table we base our join on. To overcome this, we use a ‘UNION ALL’ instead.
The next step, according to the general framework for solving data science questions, is to outline a few general steps we’ll need to perform to answer this Facebook SQL interview question. These are very high-level but writing them down, in the beginning, will make the process of writing the code much easier for us.
Here are the general steps to follow:
- Start with a UNION ALL to combine the data into a single table featuring all dates and energy consumption values.
- To find the total energy consumption, we query over our new table and sum the energy consumption grouped by date.
- Use the rank window function to rank the total energy consumption on a given date in descending order.
- Query over our ranked table to limit our table to the highest ranked date.
To write the SQL code for this Facebook interview question, let’s follow the general steps that we’ve just defined and translate them into code. The key part of this approach is that we sum and rank total energy consumption across a disparate set of dates. You can think about it as combining all the data tables, summing total energy consumption for the dates in this new table, ranking the total energy consumption in descending order, and finally choosing the highest ranked data.
Looking at the first step, we can start by writing the code for joining the 3 tables. The goal is to create a table with all the data even in the case of duplicate dates, so we can start like this:
SELECT * FROM fb_eu_energy UNION ALL SElECT * FROM fb_asia_energy UNION ALL SELECT * FROM fb_na_energy
This code produces a single table combining the data from all three original tables. One thing to note is several rows have the same date, so we did not eliminate duplicates from the date column. If we had used only UNION instead of UNION ALL, we wouldn’t keep the duplicate dates and would be missing relevant energy consumption data for our sum calculation we do next.
Since we need to know the total energy consumption for any given date, the next step is to transform our code into a subquery to sum the energy consumption on each date. In order to account for duplicate dates stemming from different datacenters, we’ll need to group by date. We call this sum total_energy.
SELECT date, sum(consumption) AS total_energy FROM (SELECT * FROM fb_eu_energy UNION ALL SElECT * FROM fb_asia_energy UNION ALL SELECT * FROM fb_na_energy) fb_energy GROUP BY date
As you can see, the results are much simpler. We no longer have duplicate dates and now know total energy consumption for all given dates individually. What we need now is to find the date with the highest energy consumption. A quick look at the table shows two dates tied for the highest energy consumption. This prevents us from performing a simple limit query. While limit 2 would be correct with the sample data, it isn’t generic enough to handle other datasets. Instead, we’ll need to produce a more flexible solution.
For the third step, we’ll use the rank window function to rank the total energy consumption on a given date in descending order. Explaining window functions is out of the scope of this article, but feel free to refer to this post for a comprehensive overview: The Ultimate Guide to SQL Window Functions
Here’s what the code looks like with the rank window function:
SELECT date, sum(consumption) AS total_energy, RANK() OVER (ORDER BY sum(consumption) DESC) AS r FROM (SELECT * FROM fb_eu_energy UNION ALL SElECT * FROM fb_asia_energy UNION ALL SELECT * FROM fb_na_energy) fb_energy GROUP BY date
Now we get the dates with the highest total energy consumption at the top of our list with corresponding ranks. Note we have two dates ranked 1 due to a tie. We could use DENSE_RANK() to avoid repeated ranks and instead increment the rank for the tied data, but this makes our solution less flexible when we go to limit our data to the highest days of energy consumption.
To reduce our data down to the first rank, we’re going to put all our existing code into a subquery and filter out for rank 1. This can be achieved like this:
SELECT date, total_energy FROM (SELECT date, sum(consumption) AS total_energy, RANK() OVER (ORDER BY sum(consumption) DESC) AS r FROM (SELECT * FROM fb_eu_energy UNION ALL SElECT * FROM fb_asia_energy UNION ALL SELECT * FROM fb_na_energy) fb_energy GROUP BY date) fb_energy_ranked WHERE r = 1
This new code acts as a filter where we present only the dates and total energy consumption for our top rank. What’s convenient about this rank filter is we can change the rank value to a higher number if we’re asked to present more dates. Now, we have the entire solution, and it’s flexible enough to accommodate inconsistent date ranges, gaps in dates, and ties for the highest total energy consumption.
In this article, we have discovered a very flexible way for solving one of the Facebook SQL interview questions for the data scientist position. Remember the method mentioned here is not the only possibility, and there exist countless other ways, be they more or less efficient, for answering this Facebook SQL interview question! At the same time, window functions are a newer and less known but very powerful technology. Therefore, showing you understand them will look really good during your interview.
On our platform, you can practice answering more SQL interview questions by constructing solutions to them but always try to think of other ways to solve them, maybe you’ll come up with a more efficient and more elaborate approach.