Get data science interview questions delivered to your inbox
Please enter a valid email address.

SQL Scenario Based Interview Questions and Answers

SQL scenario based interview questions to learn how to wrangle date time fields for Facebook data science interviews.

Date Time functions are a favorite of the interviewers. Timestamped data is ubiquitous. Bank Transactions, Food Orders, Trips Data or User Weblogs – you see time series data everywhere. Given that nature of the data - each timestamp contains fields of interest like minute, day of the week, month, etc, one can create complex problems with simple datasets.

In this article, we look at the various date time types in SQL. We will then solve SQL scenario based interview questions from recent Facebook interviews and illustrate how to use the various datetime manipulation functions in SQL. We will also look at scenarios where you can use them and when not to.

SQL Datetime concepts and functions frequently tested in Facebook SQL Scenario Based Interview Questions

Some of the most tested SQL datetime function groups in Facebook SQL scenario based interview questions include.

  • Understanding the difference between different datetime variable types in SQL
  • Manipulation of interval, datetime and timestamp columns
  • Subsetting by date and time ranges
  • Extracting part of the datetime fields

The most common functions used for these data wrangling tasks are

  • TIMESTAMP to convert a text (or string) to datetime type
  • TO_CHAR() to convert datetime field to text
  • EXTRACT() to get a part of the timestamp and date time columns

In these examples we will also look at the other SQL concepts like

  • Window functions
  • Common Table Expressions (CTEs)
  • Joins on multiple tables
  • Subqueries
  • Data type conversion using CAST statement

The StrataScratch platform has many more such SQL interview questions that you can practice and ace your next interview. In this article, we look at these techniques in detail.

Date Time Variable Types

Most SQL implementations (SQLite being a notable exception) support different Date Time types. The common types available are:

date: contains only the date, not time of the day. For example: 2021-08-15

time: contains only the time of the day, not the date. By default, it is without a time zone identifier.

time with time zone: adds the additional time zone information for the time data type.

timestamp: Contains the full information date and time. By default, it is without a time zone identifier.

timestamp with time zone: adds the time zone information to for the timestamp data type.

interval: This is the difference between two date time type fields.

We need to be cognizant of the type of information contained in the field and the information requested in the output. Depending on the type of the variable, we might have access to different SQL date time functions specific to those date time fields. Example, you may not get the desired output from fields that have only dates if you want an hour by hour precision.

Let us use these datetime SQL functions on Facebook SQL scenario based interview questions. You can practice these and many such questions on the StrataScratch Platform. Let's warm up with an easy one.

SQL Scenario Based Interview Questions

SQL Scenario Based Interview Question #1: Number of Comments Per User in Past 30 days

Number of Comments Per User in Past 30 days

Return the total number of comments received for each user in the last 30 days. Assume today is 2020-02-10.

You can solve this SQL scenario based interview question here: https://platform.stratascratch.com/coding/2004-number-of-comments-per-user-in-past-30-days?python=

Below is the video if you want to check the video solution for this question:

Dataset

fb_comments_count

user_idint
created_atdatetime
number_of_commentsint

Data View

This is a relatively straightforward dataset.

user_id: is the identifier of the Facebook account.

created_at: is the comment date

number_of_comments: represents the number of comments by the user on that day

Logic

To solve this, we can do the following.

  1. The current date is given (2020-02-10). Therefore, we take the difference of created_at field with the current date.
  2. Take only the comments that fall in the 30-day window.
  3. Aggregate the number of comments by user_id.

Solution:

Let’s code our logic.

1. We start off by taking the difference from the created_at field with the reference date (2020-02-10). We can use the TIMESTAMP function to convert a string to a timestamp and directly perform mathematical calculations.

select *, timestamp '2020-02-10' - created_at as date_diff 
from fb_comments_count;

The date_diff field is an interval with the default precision (In this case seconds)

2. We can now subset this difference using the INTERVAL function to get the desired subset of comments. The INTERVAL function converts a text into an interval data type. This will help us compare the values as if they were numbers.

select *, timestamp '2020-02-10' - created_at as date_diff 
from fb_comments_count
where timestamp '2020-02-10' - created_at  
between interval '0 days' and interval '30 days';

3. Now we can aggregate the values by user_id and complete the query.

select user_id, sum(number_of_comments) as number_of_comments
from fb_comments_count
where timestamp '2020-02-10' - created_at  between interval '0 days' 
and interval '30 days'
group by user_id;

Let us crank things up a bit with the next one.

SQL Scenario Based Interview Question #2: Comments Distribution

Comments Distribution

Write a query to calculate the distribution of comments by the count of users that joined Facebook between 2018 and 2020, for the month of January 2020.

The output should contain a count of comments and the corresponding number of users that made that number of comments in Jan-2020. For example, you'll be counting how many users made 1 comment, 2 comments, 3 comments, 4 comments, etc in Jan-2020. Your left column in the output will be the number of comments while your right column in the output will be the number of users. Sort the output from the least number of comments to highest.

To add some complexity, there might be a bug where a user post is dated before the user join date. You'll want to remove these posts from the result.

You can solve this SQL scenario based interview question here: https://platform.stratascratch.com/coding/10297-comments-distribution?python=​

Dataset

This problem uses two datasets

fb_comments

user_idint
bodyvarchar
created_atdatetime

fb_users

idint
namevarchar
joined_atdatetime
city_idint
deviceint

Dataset View

fb_comments

fb_users

The relevant fields for our problem are

  • user_id and created_at fields from the fb_comments table. These will be used to identify the user and the timestamp of her comment.
  • id and joined_at fields from the fb_users table. These will be used to identify the joining date for a user.

Now that we have a handle of the data, let us try to build the logic to solve the SQL scenario based interview question.

Logic

  1. The two data sets must be merged. The join keys are user_id in the fb_comments table and id in the fb_users table.
  2. Subset the merged tables
    1. Keep only the users whose join dates are from 2018 to 2020
    2. Filter comments for Jan 2020
    3. Remove the users who posted before they joined
  3. Summarize comments by user_id and then the users by number of comments

Solution

This is one of the hard level SQL scenario based interview questions and slightly longer than the previous one. We will use CTEs to reuse the queries whenever possible.

1. Let us start off by merging the two tables. We will keep only the relevant fields. user_id and created_at fields from the fb_comments table and id and joined_at fields from the fb_users table. We use INNER JOIN to ensure we take users with both comments as well as joining dates available.

with user_comments as
(
select a.joined_at, b.user_id, b.created_at
from fb_users a inner join fb_comments b
on a.id = b.user_id
)
select * from user_comments;

2. We now subset the dataset to keep only the users and comment dates that satisfy the problem criteria

a) Keep only the users whose join dates are from 2018 to 2020 – To accomplish this we could use the TIMESTAMP function as earlier. Or we can use the EXTRACT function. The EXTRACT function is used to get only a part of a datetime field like hours, seconds, time zone, etc. Here we will take only the year since we want the users who joined between 2018 and 2020.

with user_comments as
(
select a.joined_at, b.user_id, b.created_at
from fb_users a inner join fb_comments b
on a.id = b.user_id
WHERE
EXTRACT(YEAR FROM a.joined_at) IN (2018, 2019, 2020)
)
select * from user_comments;

b) Filter comments for Jan 2020: To do this we can use the TIMESTAMP function as earlier or EXTRACT year and month separately. Alternatively, we can also use the TO_CHAR() function. The TO_CHAR() function converts a numeric value like integer or floating point values or a quasi-numeric field like datetime fields to a character expression. Here we extract the Month and Year from the comment date (created_at field) and subset comments for January 2020.

with user_comments as
(
select a.joined_at, b.user_id, b.created_at, to_char(b.created_at, 
'MONYYYY') as month_yr
from fb_users a inner join fb_comments b
on a.id = b.user_id
WHERE
EXTRACT(YEAR FROM a.joined_at) IN (2018, 2019, 2020)
AND to_char(b.created_at, 'MONYYYY') = 'JAN2020'
) 
select * from user_comments;


c) Remove the users who posted before they joined. We can add this by simply keeping only the records where the comment date is after the date that the user joined.

with user_comments as
(
select a.joined_at, b.user_id, b.created_at, to_char(b.created_at, 
'MONYYYY') as month_yr
from fb_users a inner join fb_comments b
on a.id = b.user_id
WHERE
EXTRACT(YEAR FROM a.joined_at) IN (2018, 2019, 2020)
AND to_char(b.created_at, 'MONYYYY') = 'JAN2020'
AND a.joined_at <= b.created_at
)
select * from user_comments;

3. Now that we have all the relevant records, let us summarize. We need to get the number of users for each number of comments. How many users made 1 comment, 2 comments, 3 comments, 4 comments, etc.

a) We first summarize the number of comments for each user

with user_comments as
(
select a.joined_at, b.user_id, b.created_at, to_char(b.created_at, 
'MONYYYY') as month_yr
from fb_users a inner join fb_comments b
on a.id = b.user_id
WHERE
EXTRACT(YEAR FROM a.joined_at) IN (2018, 2019, 2020)
AND to_char(b.created_at, 'MONYYYY') = 'JAN2020'
AND a.joined_at <= b.created_at
),
summ_comments as 
(
select user_id, count(*) as num_comments 
from user_comments
group by user_id
)
select * from summ_comments;

b) Now we summarize this output counting the number of users on the number of comments. We arrange this in the ascending order of the number of comments and we have the final output.

with user_comments as
(
select a.joined_at, b.user_id, b.created_at, to_char(b.created_at, 
'MONYYYY') as month_yr
from fb_users a inner join fb_comments b
on a.id = b.user_id
WHERE
EXTRACT(YEAR FROM a.joined_at) IN (2018, 2019, 2020)
AND to_char(b.created_at, 'MONYYYY') = 'JAN2020'
AND a.joined_at <= b.created_at
),
summ_comments as 
(
select user_id, count(*) as num_comments 
from user_comments
group by user_id
)
select num_comments, count(*) as num_users
from summ_comments
group by num_comments
order by num_comments

In this case we treated the datetime fields both as numeric values as well as a text string using SQL Datetime functions. To finish things off, let us try a difficult level problem involving datetime functions.

Find more advanced level Facebook data scientist interview questions here.

SQL Scenario Based Interview Question #3: Time Between Two Events

Time Between Two Events

Facebook's web logs capture every action from users starting from page loading to page scrolling. Find the user with the least amount of time between a page load and their first scroll down. Your output should include the user id, page load time, first scroll down time, and time between the two events in seconds.

You can solve this SQL scenario based interview question here: https://platform.stratascratch.com/coding/9784-time-between-two-events?python=

Dataset

facebook_web_log

user_idint
timestampdatetime
actionvarchar

Dataset View

The dataset is relatively straightforward. We have a transaction record for each user with the user action and the timestamp. We need all three fields for solving the problem. Let us build the logic first.

Logic

  1. We need the earliest timestamp for page load time and scroll down time.
  2. Take the difference of these two timestamps in seconds
  3. Output the user with the smallest difference.

Solution

This is one of the hard level SQL scenario based interview questions. While the dataset is simple, the query has multiple layers to it. Let us implement this in parts. We will again use CTEs so that we can reuse the queries.

1. Get the earliest page load time and scroll down time for each user. We can remove the other actions as they are not relevant to the query.

with first_actions as (
select user_id, action, min(timestamp) as earliest_ts
from facebook_web_log
where action in ('page_load', 'scroll_down')
group by user_id, action
order by user_id, action
)
select * from first_actions;

2. We have two values per user. We can now do a sub query by action and merge by user_id or split the CTE into two CTEs and then merge. Enter the CASE statement.

a) Since we need to take the difference between the timestamps, instead of taking individual minima, we can use the CASE statement to conditionally create two timestamp variables

with first_actions as (
select user_id, action, 
(CASE 
    WHEN action = 'page_load' THEN timestamp 
    ELSE NULL 
    END) as page_load_ts,
(CASE 
    WHEN action = 'scroll_down' THEN timestamp 
    ELSE NULL END) as scroll_down_ts
from facebook_web_log
where action in ('page_load', 'scroll_down')
)
select * from first_actions;

b) We can now take the minimum of the two timestamp fields in the same query with the MIN() function and group by user_id. SQL will treat the quasi-numeric datetime fields as if they were numbers.

with first_actions as (
select user_id,
MIN(CASE 
    WHEN action = 'page_load' THEN timestamp 
    ELSE NULL 
    END) as page_load_ts,
MIN(CASE 
    WHEN action = 'scroll_down' THEN timestamp 
    ELSE NULL END) as scroll_down_ts
from facebook_web_log
where action in ('page_load', 'scroll_down')
group by user_id
order by user_id
)
select * from first_actions;

3. We can now take the difference between the two timestamps. Since we need the difference in seconds, we need to convert the difference to a time type output rather than an interval type. We can use the CAST function to do it.

with first_actions as (
select user_id,
MIN(CASE 
    WHEN action = 'page_load' THEN timestamp 
    ELSE NULL 
    END) as page_load_ts,
MIN(CASE 
    WHEN action = 'scroll_down' THEN timestamp 
    ELSE NULL END) as scroll_down_ts
from facebook_web_log
where action in ('page_load', 'scroll_down')
group by user_id
order by user_id
)
select * from first_actions;

4. We can now take the lowest value by sorting and limiting the output

with first_actions as (
select user_id,
MIN(CASE 
    WHEN action = 'page_load' THEN timestamp 
    ELSE NULL 
    END) as page_load_ts,
MIN(CASE 
    WHEN action = 'scroll_down' THEN timestamp 
    ELSE NULL END) as scroll_down_ts
from facebook_web_log
where action in ('page_load', 'scroll_down')
group by user_id
order by user_id
)
select *, CAST(scroll_down_ts - page_load_ts AS time) as time_diff
from first_actions
order by time_diff
limit 1;

Conclusion

While solving these SQL scenario based interview questions we looked at the different ways in which you can manipulate the datetime fields in SQL. We saw the difference between the different types and when to use each type. We also converted a datetime field to a text and vice versa using the TO_CHAR() and TIMESTAMP functions. We also worked on problems that used only a part of the timestamp.

Datetime manipulation is a favorite of interviewers because timestamped data is omnipresent. Using specific datetime functions can help you save a lot of time that would be otherwise wasted in multiple subqueries. You too can master SQL Date time functions like these and many other SQL functions on StrataScratch. We have a community of over 20,000 like-minded data science enthusiasts. Join today and give yourself the best chance of acing Data Science Interviews at FAANG and other top companies.