SQL Scenario Based Interview Questions and Answers

SQL Datetime concepts and functions frequently tested in Facebook SQL Scenario Based Interview Questions

Concepts tested in Facebook SQL Scenario Based Interview Questions

Some of the most tested SQL datetime function groups in Facebook SQL scenario based interview questions include.

Understanding the difference between different datetime variable types in SQL
Manipulation of interval, datetime and timestamp columns
Subsetting by date and time ranges
Extracting part of the datetime fields

The most common functions used for these data wrangling tasks are

TIMESTAMP to convert a text (or string) to datetime type
TO_CHAR() to convert datetime field to text
EXTRACT() to get a part of the timestamp and date time columns

In these examples we will also look at the other SQL concepts like

Window functions
Common Table Expressions (CTEs)
Joins on multiple tables
Subqueries
Data type conversion using CAST statement

The StrataScratch platform has many more such SQL interview questions that you can practice and ace your next interview. In this article, we look at these techniques in detail.

Date Time Variable Types

Most SQL implementations (SQLite being a notable exception) support different Date Time types. The common types available are:

Date Time Variable Types in SQL scenario based interview questions

date: contains only the date, not time of the day. For example: 2021-08-15
time: contains only the time of the day, not the date. By default, it is without a time zone identifier.
time with time zone: adds the additional time zone information for the time data type.
timestamp: Contains the full information date and time. By default, it is without a time zone identifier.
timestamp with time zone: adds the time zone information to for the timestamp data type.
interval: This is the difference between two date time type fields.

We need to be cognizant of the type of information contained in the field and the information requested in the output. Depending on the type of the variable, we might have access to different SQL date time functions specific to those date time fields. Example, you may not get the desired output from fields that have only dates if you want an hour by hour precision.

Let us use these datetime SQL functions on Facebook SQL scenario based interview questions. You can practice these and many such questions on the StrataScratch Platform. Let's warm up with an easy one.

SQL Scenario Based Interview Questions

SQL Scenario Based Interview Question #1: Number of Comments Per User in Past 30 days

Number of Comments Per User in 30 days before 2020-02-10

Interview Question Date: January 2021

Meta

EasyID 2004

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Return the total number of comments received for each user in the 30 or less days before 2020-02-10. Don't output users who haven't received any comment in the defined time period.

Table: fb_comments_count

You can solve this SQL scenario based interview question here: https://platform.stratascratch.com/coding/2004-number-of-comments-per-user-in-past-30-days

Below is the video if you want to check the video solution for this question:

Data View

Table: fb_comments_count

user_id	created_at	number_of_comments
18	2019-12-29	1
25	2019-12-21	1
78	2020-01-04	1
37	2020-02-01	1
41	2019-12-23	1

This is a relatively straightforward dataset.

user_id: is the identifier of the Facebook account.
created_at: is the comment date
number_of_comments: represents the number of comments by the user on that day

Logic

To solve this, we can do the following.

The current date is given (2020-02-10). Therefore, we take the difference of created_at field with the current date.
Take only the comments that fall in the 30-day window.
Aggregate the number of comments by user_id.

Solution:

Let’s code our logic.

1. We start off by taking the difference from the created_at field with the reference date (2020-02-10). We can use the TIMESTAMP function to convert a string to a timestamp and directly perform mathematical calculations.

select *, timestamp '2020-02-10' - created_at as date_diff 
from fb_comments_count;

The date_diff field is an interval with the default precision (In this case seconds)

2. We can now subset this difference using the INTERVAL function to get the desired subset of comments. The INTERVAL function converts a text into an interval data type. This will help us compare the values as if they were numbers.

select *, timestamp '2020-02-10' - created_at as date_diff 
from fb_comments_count
where timestamp '2020-02-10' - created_at  
between interval '0 days' and interval '30 days';

3. Now we can aggregate the values by user_id and complete the query.

select user_id, sum(number_of_comments) as number_of_comments
from fb_comments_count
where timestamp '2020-02-10' - created_at  between interval '0 days' 
and interval '30 days'
group by user_id;

Output

All required columns and the first 5 rows of the solution are shown

user_id	number_of_comments
27	1
95	2
78	1
58	2
8	4

Let us crank things up a bit with the next one.

SQL Scenario Based Interview Question #2: Comments Distribution

Comments Distribution

Interview Question Date: November 2020

Meta

HardID 10297

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Write a query to calculate the distribution of comments by the count of users that joined Meta/Facebook between 2018 and 2020, for the month of January 2020.

The output should contain a count of comments and the corresponding number of users that made that number of comments in Jan-2020. For example, you'll be counting how many users made 1 comment, 2 comments, 3 comments, 4 comments, etc in Jan-2020. Your left column in the output will be the number of comments while your right column in the output will be the number of users. Sort the output from the least number of comments to highest.

To add some complexity, there might be a bug where an user post is dated before the user join date. You'll want to remove these posts from the result.

Tables: fb_users, fb_comments

You can solve this SQL scenario based interview question here: https://platform.stratascratch.com/coding/10297-comments-distribution

Dataset View

This problem uses two datasets

fb_comments

Table: fb_comments

user_id	body	created_at
89	Wrong set challenge guess college as position.	2020-01-16
33	Interest always door health military bag. Store smile factor player goal detail TV loss.	2019-12-31
34	Physical along born key leader various. Forward box soldier join.	2020-01-08
46	Kid must energy south behind hold. Research common long state get at issue. Weight technology live plant. His size approach loss.	2019-12-29
25	Or matter will turn only woman fact.	2019-12-21

fb_users

Table: fb_users

id	name	joined_at	city_id	device
4	Ashley Sparks	2020-06-30	63	2185
8	Zachary Tucker	2018-02-18	78	3900
9	Caitlin Carpenter	2020-07-23	60	8592
18	Wanda Ramirez	2018-09-28	55	7904
21	Tonya Johnson	2019-12-02	62	4816

The relevant fields for our problem are

user_id and created_at fields from the fb_comments table. These will be used to identify the user and the timestamp of her comment.
id and joined_at fields from the fb_users table. These will be used to identify the joining date for a user.

Now that we have a handle of the data, let us try to build the logic to solve the SQL scenario based interview question.

Logic

The two data sets must be merged. The join keys are user_id in the fb_comments table and id in the fb_users table.
Subset the merged tables
1. Keep only the users whose join dates are from 2018 to 2020
2. Filter comments for Jan 2020
3. Remove the users who posted before they joined
Summarize comments by user_id and then the users by number of comments

Solution

This is one of the hard level SQL scenario based interview questions and slightly longer than the previous one. We will use SQL CTEs to reuse the queries whenever possible.

1. Let us start off by merging the two tables. We will keep only the relevant fields. user_id and created_at fields from the fb_comments table and id and joined_at fields from the fb_users table. We use INNER JOIN to ensure we take users with both comments as well as joining dates available.

with user_comments as
(
select a.joined_at, b.user_id, b.created_at
from fb_users a inner join fb_comments b
on a.id = b.user_id
)
select * from user_comments;

2. We now subset the dataset to keep only the users and comment dates that satisfy the problem criteria

a) Keep only the users whose join dates are from 2018 to 2020 – To accomplish this we could use the TIMESTAMP function as earlier. Or we can use the EXTRACT function. The EXTRACT function is used to get only a part of a datetime field like hours, seconds, time zone, etc. Here we will take only the year since we want the users who joined between 2018 and 2020.

with user_comments as
(
select a.joined_at, b.user_id, b.created_at
from fb_users a inner join fb_comments b
on a.id = b.user_id
WHERE
EXTRACT(YEAR FROM a.joined_at) IN (2018, 2019, 2020)
)
select * from user_comments;

b) Filter comments for Jan 2020: To do this we can use the TIMESTAMP function as earlier or EXTRACT year and month separately. Alternatively, we can also use the TO_CHAR() function. The TO_CHAR() function converts a numeric value like integer or floating point values or a quasi-numeric field like datetime fields to a character expression. Here we extract the Month and Year from the comment date (created_at field) and subset comments for January 2020.

with user_comments as
(
select a.joined_at, b.user_id, b.created_at, to_char(b.created_at, 
'MONYYYY') as month_yr
from fb_users a inner join fb_comments b
on a.id = b.user_id
WHERE
EXTRACT(YEAR FROM a.joined_at) IN (2018, 2019, 2020)
AND to_char(b.created_at, 'MONYYYY') = 'JAN2020'
) 
select * from user_comments;

c) Remove the users who posted before they joined. We can add this by simply keeping only the records where the comment date is after the date that the user joined.

with user_comments as
(
select a.joined_at, b.user_id, b.created_at, to_char(b.created_at, 
'MONYYYY') as month_yr
from fb_users a inner join fb_comments b
on a.id = b.user_id
WHERE
EXTRACT(YEAR FROM a.joined_at) IN (2018, 2019, 2020)
AND to_char(b.created_at, 'MONYYYY') = 'JAN2020'
AND a.joined_at <= b.created_at
)
select * from user_comments;

3. Now that we have all the relevant records, let us summarize. We need to get the number of users for each number of comments. How many users made 1 comment, 2 comments, 3 comments, 4 comments, etc.

a) We first summarize the number of comments for each user

with user_comments as
(
select a.joined_at, b.user_id, b.created_at, to_char(b.created_at, 
'MONYYYY') as month_yr
from fb_users a inner join fb_comments b
on a.id = b.user_id
WHERE
EXTRACT(YEAR FROM a.joined_at) IN (2018, 2019, 2020)
AND to_char(b.created_at, 'MONYYYY') = 'JAN2020'
AND a.joined_at <= b.created_at
),
summ_comments as 
(
select user_id, count(*) as num_comments 
from user_comments
group by user_id
)
select * from summ_comments;

b) Now we summarize this output counting the number of users on the number of comments. We arrange this in the ascending order of the number of comments and we have the final output.

with user_comments as
(
select a.joined_at, b.user_id, b.created_at, to_char(b.created_at, 
'MONYYYY') as month_yr
from fb_users a inner join fb_comments b
on a.id = b.user_id
WHERE
EXTRACT(YEAR FROM a.joined_at) IN (2018, 2019, 2020)
AND to_char(b.created_at, 'MONYYYY') = 'JAN2020'
AND a.joined_at <= b.created_at
),
summ_comments as 
(
select user_id, count(*) as num_comments 
from user_comments
group by user_id
)
select num_comments, count(*) as num_users
from summ_comments
group by num_comments
order by num_comments

Output

All required columns and the first 5 rows of the solution are shown

comment_cnt	user_cnt
1	4
2	6
3	1
4	1
6	1

In this case we treated the datetime fields both as numeric values as well as a text string using SQL Datetime functions. To finish things off, let us try a difficult level problem involving datetime functions.

Find more advanced level Facebook data scientist interview questions here.

SQL Scenario Based Interview Question #3: Time Between Two Events

Time Between Two Events

Interview Question Date: July 2018

Meta

HardID 9784

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Software Engineer

Meta/Facebook's web logs capture every action from users starting from page loading to page scrolling. Find the user with the least amount of time between a page load and their scroll down. Your output should include the user id, page load time, scroll down time, and time between the two events in seconds.

Table: facebook_web_log

You can solve this SQL scenario based interview question here: https://platform.stratascratch.com/coding/9784-time-between-two-events

Dataset View

Table: facebook_web_log

user_id	timestamp	action
0	2019-04-25 13:30:15	page_load
0	2019-04-25 13:30:18	page_load
0	2019-04-25 13:30:40	scroll_down
0	2019-04-25 13:30:45	scroll_up
0	2019-04-25 13:31:10	scroll_down

The dataset is relatively straightforward. We have a transaction record for each user with the user action and the timestamp. We need all three fields for solving the problem. Let us build the logic first.

Logic

We need the earliest timestamp for page load time and scroll down time.
Take the difference of these two timestamps in seconds
Output the user with the smallest difference.

Solution

This is one of the hard level SQL scenario based interview questions. While the dataset is simple, the query has multiple layers to it. Let us implement this in parts. We will again use CTEs so that we can reuse the queries.

1. Get the earliest page load time and scroll down time for each user. We can remove the other actions as they are not relevant to the query.

with first_actions as (
select user_id, action, min(timestamp) as earliest_ts
from facebook_web_log
where action in ('page_load', 'scroll_down')
group by user_id, action
order by user_id, action
)
select * from first_actions;

2. We have two values per user. We can now do a sub query by action and merge by user_id or split the CTE into two CTEs and then merge. Enter the CASE statement.

a) Since we need to take the difference between the timestamps, instead of taking individual minima, we can use the CASE statement to conditionally create two timestamp variables

with first_actions as (
select user_id, action, 
(CASE 
    WHEN action = 'page_load' THEN timestamp 
    ELSE NULL 
    END) as page_load_ts,
(CASE 
    WHEN action = 'scroll_down' THEN timestamp 
    ELSE NULL END) as scroll_down_ts
from facebook_web_log
where action in ('page_load', 'scroll_down')
)
select * from first_actions;

b) We can now take the minimum of the two timestamp fields in the same query with the MIN() function and group by user_id. SQL will treat the quasi-numeric datetime fields as if they were numbers.

with first_actions as (
select user_id,
MIN(CASE 
    WHEN action = 'page_load' THEN timestamp 
    ELSE NULL 
    END) as page_load_ts,
MIN(CASE 
    WHEN action = 'scroll_down' THEN timestamp 
    ELSE NULL END) as scroll_down_ts
from facebook_web_log
where action in ('page_load', 'scroll_down')
group by user_id
order by user_id
)
select * from first_actions;

All required columns and the first 5 rows of the solution are shown

user_id	page_load_ts	scroll_down_ts
0	2019-04-25 13:30:15	2019-04-25 13:30:40
1	2019-04-25 13:40:00	2019-04-25 13:40:10
2	2019-04-25 13:41:21	2019-04-25 13:41:30

3. We can now take the difference between the two timestamps. Since we need the difference in seconds, we need to convert the difference to a time type output rather than an interval type. We can use the CAST function to do it.

with first_actions as (
select user_id,
MIN(CASE 
    WHEN action = 'page_load' THEN timestamp 
    ELSE NULL 
    END) as page_load_ts,
MIN(CASE 
    WHEN action = 'scroll_down' THEN timestamp 
    ELSE NULL END) as scroll_down_ts
from facebook_web_log
where action in ('page_load', 'scroll_down')
group by user_id
order by user_id
)
select * from first_actions;

All required columns and the first 5 rows of the solution are shown

user_id	page_load_ts	scroll_down_ts
0	2019-04-25 13:30:15	2019-04-25 13:30:40
1	2019-04-25 13:40:00	2019-04-25 13:40:10
2	2019-04-25 13:41:21	2019-04-25 13:41:30

4. We can now take the lowest value by sorting and limiting the output

with first_actions as (
select user_id,
MIN(CASE 
    WHEN action = 'page_load' THEN timestamp 
    ELSE NULL 
    END) as page_load_ts,
MIN(CASE 
    WHEN action = 'scroll_down' THEN timestamp 
    ELSE NULL END) as scroll_down_ts
from facebook_web_log
where action in ('page_load', 'scroll_down')
group by user_id
order by user_id
)
select *, CAST(scroll_down_ts - page_load_ts AS time) as time_diff
from first_actions
order by time_diff
limit 1;

Output

All required columns and the first 5 rows of the solution are shown

user_id	load_time	scroll_time	duration
2	2019-04-25 13:41:21	2019-04-25 13:41:30	00:00:09

SQL Scenario Based Interview Question #4: Users by Average Session Time

Users By Average Session Time

Interview Question Date: July 2021

Meta

MediumID 10352

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Calculate each user's average session time. A session is defined as the time difference between a page_load and page_exit. For simplicity, assume a user has only 1 session per day and if there are multiple of the same events on that day, consider only the latest page_load and earliest page_exit, with an obvious restriction that load time event should happen before exit time event . Output the user_id and their average session time.

Table: facebook_web_log

Link to the question: https://platform.stratascratch.com/coding/10352-users-by-avg-session-time

Data View:

Table: facebook_web_log

user_id	timestamp	action
0	2019-04-25 13:30:15	page_load
0	2019-04-25 13:30:18	page_load
0	2019-04-25 13:30:40	scroll_down
0	2019-04-25 13:30:45	scroll_up
0	2019-04-25 13:31:10	scroll_down

All three fields are relevant for our problem.

Logic

Find the duration of user sessions for each day.
Group rows by user_id value. Each group will have records that describe the sessions of a specific user.
Find the average duration of users’ sessions.

Solution

1. Available data shows that a user can have multiple sessions on different days. So in order to calculate the average duration for each user, first, we need to measure the session duration for each day.

Each record in the facebook_web_log table represents one user action. To calculate the duration, we need to work with two actions - page_load and page_exit simultaneously. So we need two copies of the action data.

We can INNER JOIN the table with itself on a shared dimension - user_id values.

We use the WHERE clause with three conditions to filter rows in the combined table. From the first copy of the actions table, we need to keep only ‘page_load’ actions. From the second copy, we should keep only ‘page_exit’ actions. The third condition is logical - ‘page_exit’ should happen after ‘page_load’.

At this time, we need to find the length of the session for each individual day. We should group rows by unique pairs of user_id and date values. In the SELECT statement, we cast timestamp values to generate date values.

The question specifically tells us to find the duration between the latest page load and the earliest page exit. In the SELECT statement, we use min() and max() aggregate functions to measure the duration of a user session.

We use max() to find the latest ‘page_load’ and min() to find the earliest ‘page_exit’. We need to find the difference between these two values and save it as ‘session_duration’.

Finally, we use the WITH clause so that we can reference the resulting dataset as all_user_sessions.

with all_user_sessions as
  (select t1.user_id,
          t1.timestamp::date as date,
          min(t2.timestamp::timestamp) - max(t1.timestamp::timestamp) as session_duration
   from facebook_web_log t1
   join facebook_web_log t2 on t1.user_id = t2.user_id
   where t1.action = 'page_load'
     and t2.action = 'page_exit'
     and t2.timestamp > t1.timestamp
   group by 1,
            2)

2. Once we have information about user sessions on each day, we need to SELECT and create groups for each user_id. Records in each group will describe the sessions of a specific user.

We use the AVG() aggregate function to find the average duration of sessions for the user.

with all_user_sessions as
  (select t1.user_id,
          t1.timestamp::date as date,
          min(t2.timestamp::timestamp) - max(t1.timestamp::timestamp) as session_duration
   from facebook_web_log t1
   join facebook_web_log t2 on t1.user_id = t2.user_id
   where t1.action = 'page_load'
     and t2.action = 'page_exit'
     and t2.timestamp > t1.timestamp
   group by 1,
            2)
select user_id,
       avg(session_duration)
from all_user_sessions
group by user_id

The final output should look like this:

All required columns and the first 5 rows of the solution are shown

user_id	avg_session_duration
0	1883.5
1	35

SQL Scenario Based Interview Question #5: Dates of Inspection

Dates Of Inspection

Interview Question Date: April 2018

City of Los Angeles

HardID 9714

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Software Engineer

Find the latest inspection date for the most sanitary restaurant(s). Assume the most sanitary restaurant is the one with the highest number of points received in any inspection (not just the last one). Only businesses with 'restaurant' in the name should be considered in your analysis. Output the corresponding facility name, inspection score, latest inspection date, previous inspection date, and the difference between the latest and previous inspection dates. And order the records based on the latest inspection date in ascending order.

Table: los_angeles_restaurant_health_inspections

Link to the question: https://platform.stratascratch.com/coding/9714-dates-of-inspection

Data View

Table: los_angeles_restaurant_health_inspections

serial_number	activity_date	facility_name	score	grade	service_code	service_description	employee_id	facility_address	facility_city	facility_id	facility_state	facility_zip	owner_id	owner_name	pe_description	program_element_pe	program_name	program_status	record_id
DAQHRSETQ	2017-06-08	MARGARITAS CAFE	93	A	1	ROUTINE INSPECTION	EE0000006	5026 S CRENSHAW BLVD	LOS ANGELES	FA0023656	CA	90043	OW0004133	BAZAN, ASCENCION	RESTAURANT (61-150) SEATS HIGH RISK	1638	MARGARITAS CAFE	ACTIVE	PR0011718
DA2GQRJOS	2017-03-07	LAS MOLENDERAS	97	A	1	ROUTINE INSPECTION	EE0000997	2635 WHITTIER BLVD	LOS ANGELES	FA0160416	CA	90023	OW0125379	MARISOL FEREGRINO	RESTAURANT (0-30) SEATS HIGH RISK	1632	LAS MOLENDERAS	INACTIVE	PR0148504
DAMQTA46T	2016-03-22	SANDRA'S TAMALES	93	A	1	ROUTINE INSPECTION	EE0001049	5390 WHITTIER BLVD	LOS ANGELES	FA0171769	CA	90022-4032	OW0178828	SANDRA'S TAMALES INC.	RESTAURANT (0-30) SEATS MODERATE RISK	1631	SANDRA'S TAMALES	ACTIVE	PR0164225
DAXMBTIRZ	2018-02-12	CAFE GRATITUDE	97	A	1	ROUTINE INSPECTION	EE0000828	639 N LARCHMONT BLVD STE #102	LOS ANGELES	FA0058921	CA	90004	OW0005704	CAFE GRATITUDE LARCHMONT LLC	RESTAURANT (61-150) SEATS HIGH RISK	1638	CAFE GRATITUDE	ACTIVE	PR0019854
DAK8TBMS0	2015-09-10	THE WAFFLE	90	A	1	ROUTINE INSPECTION	EE0000709	6255 W SUNSET BLVD STE #105	LOS ANGELES	FA0051830	CA	90028	OW0035796	THE WAFFLE, LLC	RESTAURANT (61-150) SEATS HIGH RISK	1638	THE WAFFLE	ACTIVE	PR0010922

To answer this question, we need the following values:

facility_name: We find restaurants by looking at facility_name values. They must contain the text ‘restaurant’.

score: To answer this question, we should output restaurants that received the highest score. We should include specific score values as well.

activity_date: The question asks us to return the latest inspection date, the previous inspection date, and the difference between them. We need to work with activity_date values to find these values.

Logic

Identify the most sanitary restaurants
Calculate the previous inspection date and the difference between the latest and previous inspections
Output specified values and order records

Solution

1. First, let’s find the latest inspection date for all records. The simplest approach is to write a subquery with two values: facility_name and activity_date.

We create this subquery to find the latest inspection date. To do this, we group records by facility_name values and use the max() aggregate function to find each group's highest (latest) inspection date.

select facility_name,
       max(activity_date) as max_date
from los_angeles_restaurant_health_inspections
group by facility_name

We can add the latest date (the result of max() aggregate function) to the main table by performing an INNER JOIN. The main table and the subquery should be combined on a shared dimension - facility name.

SELECT *
from los_angeles_restaurant_health_inspections as la
inner join
  (select facility_name,
          max(activity_date) as max_date
   from los_angeles_restaurant_health_inspections
   group by facility_name) a on la.facility_name = a.facility_name

So far, we have all rows from the los_angeles_restaurant_health_inspections table. But many of the records do not represent restaurants.

According to the question description, a facility can be classified as a restaurant only if its facility_name value contains the text ‘restaurant’. We need an additional WHERE clause to remove rows that do not meet these criteria.

There is one more condition - we should only consider restaurants with the highest inspection scores. So our WHERE clause should have two conditions.

We use the ILIKE logical operator to set the first condition - facility_name should contain ‘restaurant’ text. AND logical operator chains this condition with another one - the score value should be equal to the highest score in the entire table.

This is necessary because the question specifies that we only want restaurants with the highest scores. To get the highest score, we write a short subquery where we use the max() aggregate function to return the highest score value in the entire table.

select *
   from los_angeles_restaurant_health_inspections as la
   inner join
     (select facility_name,
             max(activity_date) as max_date
      from los_angeles_restaurant_health_inspections
      group by facility_name) a on la.facility_name = a.facility_name
   where la.facility_name ilike '%restaurant%'
     and score =
       (select max(score)
        from los_angeles_restaurant_health_inspections)

So far, our query returns restaurants with the highest score:

All required columns and the first 5 rows of the solution are shown

serial_number	activity_date	facility_name	score	grade	service_code	service_description	employee_id	facility_address	facility_city	facility_id	facility_state	facility_zip	owner_id	owner_name	pe_description	program_element_pe	program_name	program_status	record_id	facility_name	max_date
DA0577007	2016-05-23	LOS ARCOS RESTAURANT	100	A	1	ROUTINE INSPECTION	EE0000950	5719 E WHITTIER BLVD	LOS ANGELES	FA0065926	CA	90022	OW0004107	BAUTISTA, ADOLFO	RESTAURANT (61-150) SEATS HIGH RISK	1638	LOS ARCOS RESTAURANT	ACTIVE	PR0016276	LOS ARCOS RESTAURANT	2016-11-02
DA1SSC6PA	2016-11-02	LOS ARCOS RESTAURANT	100	A	1	ROUTINE INSPECTION	EE0000950	5719 E WHITTIER BLVD	LOS ANGELES	FA0065926	CA	90022	OW0004107	BAUTISTA, ADOLFO	RESTAURANT (61-150) SEATS HIGH RISK	1638	LOS ARCOS RESTAURANT	ACTIVE	PR0016276	LOS ARCOS RESTAURANT	2016-11-02
DA7HKLCP1	2017-10-11	LOS POBLANOS RESTAURANTE	100	A	1	ROUTINE INSPECTION	EE0000800	1433 W JEFFERSON BLVD	LOS ANGELES	FA0035885	CA	90018	OW0037663	VICTORINO TEYUCA	RESTAURANT (0-30) SEATS HIGH RISK	1632	LOS POBLANOS RESTAURANTE	ACTIVE	PR0039400	LOS POBLANOS RESTAURANTE	2017-10-11
DA0769732	2017-02-01	CANCUN RESTAURANT	100	A	1	ROUTINE INSPECTION	EE0000950	5219 E BEVERLY BLVD	LOS ANGELES	FA0012815	CA	90022	OW0028703	MARGARITA R. PLATON	RESTAURANT (31-60) SEATS LOW RISK	1633	CANCUN RESTAURANT	ACTIVE	PR0022877	CANCUN RESTAURANT	2017-02-01
DAH1UVE0U	2016-12-01	B.L. RESTAURANT	100	A	1	ROUTINE INSPECTION	EE0000721	617 W 7TH ST	LOS ANGELES	FA0164787	CA	90017	OW0128102	BHFC OPERATING LLC	RESTAURANT (0-30) SEATS HIGH RISK	1632	B.L. RESTAURANT	ACTIVE	PR0153075	B.L. RESTAURANT	2016-12-01

As you can see, all facilities have the text ‘restaurant’ in their facility_name and score of 100.

This table has many columns. In the preview, you need to scroll all the way to the right to see the max_date column. Values in this column refer to the latest inspection date.

2. Next, we need to work with available data to create two values: the date of the previous inspection (previous from the latest) and the difference between the latest and previous inspections.

Each record in the table describes one inspection. It’s safe to assume that a restaurant can have multiple inspection records.

To facilitate finding the previous inspection, we should list all inspections for a specific restaurant and order them ascendingly.

Because dates are ordered in ascending order (earlier ones first, followed by later inspections), the previous inspection is simply the activity_date value of the previous row.

To follow this approach, we need to essentially ‘group’ inspection records for each restaurant and arrange them in ascending order by their activity_date values.

Window functions are the perfect SQL feature for this task. They can be used to generate a new value for each row in the partition, which is essentially a group of rows for every unique facility_name value. We can arrange inspections by their date in ascending order in each partition.

Finally, we need to create a new value for the current row by taking the activity_date row of the previous row. Fortunately, there is a LAG() window function in SQL to do just that. It creates a new column but offsets each value by one row.

Learn more about LAG() and window functions here.

The difference between the inspection date and the previous one is easier to find. Simply cast activity_date to a date type and subtract the previous date.

Finally, we have every value we need - facility name, inspection score, latest inspection date, previous inspection date, and the difference between high-scoring inspections.

We should use the WITH clause to save the query as a common table expression so we can SELECT these values in the final step.

with cte as
  (select la.facility_name,
          score,
          activity_date,
          max_date,
          lag(activity_date, 1) over(partition by la.facility_name
                                     order by activity_date) as prev_activity_date,
          activity_date :: date - lag(activity_date :: date, 1) over(partition by la.facility_name                                                
order by activity_date) as number_of_days_between_high_scoring_inspections
   from los_angeles_restaurant_health_inspections as la
   inner join
     (select facility_name,
             max(activity_date) as max_date
      from los_angeles_restaurant_health_inspections
      group by facility_name) a on la.facility_name = a.facility_name
   where la.facility_name ilike '%restaurant%'
     and score =
       (select max(score)
        from los_angeles_restaurant_health_inspections))

3. Finally, we can SELECT all the necessary values from the common table expression. The question tells us to return records for only the latest inspections.

We can use an additional WHERE clause with one condition - inspection date (activity_date column) should be the same as max_date (the latest date).

with cte as
  (select la.facility_name,
          score,
          activity_date,
          max_date,
          lag(activity_date, 1) over(partition by la.facility_name
                                     order by activity_date) as prev_activity_date,
          activity_date :: date - lag(activity_date :: date, 1) over(partition by la.facility_name
                                                                     order by activity_date) as number_of_days_between_high_scoring_inspections
   from los_angeles_restaurant_health_inspections as la
   inner join
     (select facility_name,
             max(activity_date) as max_date
      from los_angeles_restaurant_health_inspections
      group by facility_name) a on la.facility_name = a.facility_name
   where la.facility_name ilike '%restaurant%'
     and score =
       (select max(score)
        from los_angeles_restaurant_health_inspections))
select facility_name,
       score,
       activity_date,
       prev_activity_date,
       number_of_days_between_high_scoring_inspections
from cte
where activity_date = max_date

Here’s how the expected output should look like.

All required columns and the first 5 rows of the solution are shown

facility_name	score	activity_date
B.L. RESTAURANT	100	2016-12-01
CANCUN RESTAURANT	100	2017-02-01
DENNY'S RESTAURANT #7757	100	2017-05-26
D'LIDO RESTAURANT AND BAKERY	100	2016-02-03
EARLES RESTAURANT	100	2018-03-16

Conclusion

While solving these SQL scenario based interview questions we looked at the different ways in which you can manipulate the datetime fields in SQL. We saw the difference between the different types and when to use each type. We also converted a datetime field to a text and vice versa using the TO_CHAR() and TIMESTAMP functions. We also worked on problems that used only a part of the timestamp.

Datetime manipulation is a favorite of interviewers because timestamped data is omnipresent. Using specific datetime functions can help you save a lot of time that would be otherwise wasted in multiple subqueries. You too can master SQL Date time functions like these and many other SQL functions on StrataScratch. We have a community of over 20,000 like-minded data science enthusiasts. Join today and give yourself the best chance of acing Data Science Interviews at FAANG and other top companies.

SQL Scenario Based Interview Questions and Answers

Categories

SQL Datetime concepts and functions frequently tested in Facebook SQL Scenario Based Interview Questions

Date Time Variable Types

SQL Scenario Based Interview Questions

SQL Scenario Based Interview Question #1: Number of Comments Per User in Past 30 days

Number of Comments Per User in 30 days before 2020-02-10

SQL Scenario Based Interview Question #2: Comments Distribution

Comments Distribution

SQL Scenario Based Interview Question #3: Time Between Two Events

Time Between Two Events

SQL Scenario Based Interview Question #4: Users by Average Session Time

Users By Average Session Time

SQL Scenario Based Interview Question #5: Dates of Inspection

Dates Of Inspection

Conclusion

Latest Posts:

From Arrays to Lists: Converting Numpy Arrays with Python

Tips for Crafting an Impressive Data Analyst Portfolio

PySpark GroupBy Guide: Super Simple Way to Group Data

Share

Follow

Categories