Facebook Data Science Interview Question and Solution in SQL - Friend Acceptance Rate

Facebook data science interview


This data science interview question was asked by Facebook during a data science interview.

This article will cover both the solution to the question and a detailed explanation of the approach I take to answer this Facebook data science interview question. I walk through each step of my solution design, talk about business context and impact, and explain every line of code I write.

Question

Our suggestion is to try to solve the data science interview question first before reading the solution. You can try to solve this Facebook Data science interview question.

What is the overall friend acceptance rate by date?

Order by the latest friend request date to the earliest date.

The question asks the interviewee to calculate the friend acceptance rate over time, which means you need to calculate a rate and group it by date. Technically, this question leverages SQL concepts like LEFT JOIN, GROUP BY, SUBQUERIES, and data type conversions from INT to FLOAT or DECIMAL.


Table: fb_friend_requests
user_id_senderuser_id_receiverdateaction
ad4943sdz948ksx123d2020-01-04sent
ad4943sdz948ksx123d2020-01-06accepted
dfdfxf94839djjjd92832020-01-04sent
dfdfxf94839djjjd92832020-01-15accepted
ffdfff4234234lpjzjdi49492020-01-06sent

You're getting this question during a Facebook data science interview and you are given the data set schema where we have the sender, the receiver, the date and then the action. If we preview this table, we have again the sender, the receiver, the date and then the action sent.

Assumptions

Just starting off with this question, I am going to assume that every time a sender sends a friend request to the receiver that's always logged in this table as a sent action. We know that the denominator of the acceptance rate that we want to try to find there is sent. We also know that the numerator for that rate should be accepted in the action column.

Approach

Defining acceptance rate

What we're going to do first is define acceptance rate as the number of accepted divided by the number of sent requests. This becomes

acceptance rate = # accepted / # sents

Facebook data science interview questions

Splitting rows by accepted and sent records

The next thing I want to do is be able to split the number of sent records and then be able to split the number of accepted records and then group them by date. In order to do that I'm going to use two subqueries to split this one table and make it into two tables. What I like to do when I create my solution especially in SQL is actually work starting at from the tables and the JOIN clause and work myself out.

In order to do that I can write SELECT and then user_id_sender, user_id_receiver, date and then action. Basically, the columns of the table FROM fb_friend_requests, WHERE action equals ‘sent’. That's the first subquery and if I run this code, I get exactly what I'm expecting - all of the sent records.

SELECT
  user_id_sender,
  user_id_receiver,
  date,
  action
FROM fb_friend_requests
WHERE action='sents'

solving data science interview question

Then what I'm going to do is I am going to copy this subquery and create a new subquery. And now after I preview it, instead of sent, I have accepted.

SELECT
  user_id_sender,
  user_id_receiver,
  date,
  action
FROM fb_friend_requests
WHERE action='accepted'

Interview question from Facebook

Creating my subqueries and creating the outer SELECT

These are my subqueries. I will now work from the outer top SELECT and use my subqueries in the FROM clause.

SELECT
   *
FROM (
 SELECT
   user_id_sender,
   user_id_receiver,
   date,
   action
 FROM fb_friend_requests
 WHERE action='sents') a

SELECT
  user_id_sender,
  user_id_receiver,
  date,
  action
FROM fb_friend_requests
WHERE action='accepted'

Facebook data science interview question

Adding the JOIN

Now what I will try to figure out next is what sort of JOIN I want to implement. I could use an INNER JOIN, LEFT JOIN, RIGHT JOIN, or an OUTER JOIN. In this case, we know that we are creating a ratio or a percentage. So, we're going to have a denominator and a numerator and what I'm doing with these subqueries is I'm actually defining the numerator and denominator and then joining them together.

What I really want to do is implement a LEFT JOIN to preserve all of the friend requests that were not accepted yet. This will be more obvious once I finish this query and show you the underlying data.

SELECT
   *
FROM (
 SELECT
   user_id_sender,
   user_id_receiver,
   date,
   action
 FROM fb_friend_requests
 WHERE action='sents') a
LEFT JOIN (
 SELECT
   user_id_sender,
   user_id_receiver,
   date,
   action
 FROM fb_friend_requests
 WHERE action='accepted') b


Now on the keys, all I'm going to do is just line up the sender and the receivers for both tables. We're going to use two keys because we definitely want the sender and the receiver to be matched together since that's the primary key combination of these two users.

SELECT
   *
FROM (
 SELECT
   user_id_sender,
   user_id_receiver,
   date,
   action
 FROM fb_friend_requests
 WHERE action='sents') a
LEFT JOIN (
 SELECT
   user_id_sender,
   user_id_receiver,
   date,
   action
 FROM fb_friend_requests
 WHERE action='accepted') b on b.user_id_sender = a.user_id_sender
       and b.user_id_receiver = a.user_id_receiver

facebook data science interview

That looks pretty good so far. Here's the output of the code to show you what I mean by lining up accepted and sent requests.

data science interview question and solution

On the left, I have all of the friend requests and on the right side, I have all the accepted. You can see that there are some NULL records where we did a LEFT JOIN and the receiver did not accept the friend request so that's why you're seeing some blank rows.

Now what we want to do is count the number of accepted and then count up the number of sents and then group that by the date. In order to do that I will put the date column first and then define the number of accepted and the number of sent.

SELECT
   date,
   count(b.user_id_receiver)/count(a.user_id_sender)
FROM (
 SELECT
   user_id_sender,
   user_id_receiver,
   date,
   action
 FROM fb_friend_requests
 WHERE action='sents') a
LEFT JOIN (
 SELECT
   user_id_sender,
   user_id_receiver,
   date,
   action
 FROM fb_friend_requests
 WHERE action='accepted') b on b.user_id_sender = a.user_id_sender
       and b.user_id_receiver = a.user_id_receiver
facebook data science interview

Data type conversion to output a decimal

The next thing I want to do is turn this entire operation here into a decimal. So, we're going to actually turn that into a FLOAT.

SELECT
   date,
   count(b.user_id_receiver)/count(a.user_id_sender)::float
FROM (
 SELECT
   user_id_sender,
   user_id_receiver,
   date,
   action
 FROM fb_friend_requests
 WHERE action='sents') a
LEFT JOIN (
 SELECT
   user_id_sender,
   user_id_receiver,
   date,
   action
 FROM fb_friend_requests
 WHERE action='accepted') b on b.user_id_sender = a.user_id_sender
       and b.user_id_receiver = a.user_id_receiver

The reason why we have this casting of this count into a FLOAT is that we basically have integers with this count of receiver and an integer where we're counting the senders. So, an integer divided by an integer is going to remain an integer and it's just going to give us a 1 if we don't change this integer into a decimal or into a float. That's what we're doing with this casting operation.

Once we change the data type to a FLOAT, we can rename the column. Now we have the date and the acceptance rate and all we need now is just a GROUP BY. We'll GROUP BY the date. Lastly, we have the ORDER BY because this Facebook data science interview question is asking us to order by date from the latest date to the earliest date. What I'm going to do now is just type in date descending.

Solution

If I run this query, I get the date and the acceptance rate as an actual decimal or as an actual float.

SELECT
   a.date,
   count(b.user_id_receiver)/count(a.user_id_sender)::float
   as acceptance_rate
FROM (
 SELECT
   user_id_sender,
   user_id_receiver,
   date,
   action
 FROM fb_friend_requests
 WHERE action='sents') a
LEFT JOIN (
 SELECT
   user_id_sender,
   user_id_receiver,
   date,
   action
 FROM fb_friend_requests
 WHERE action='accepted') b on b.user_id_sender = a.user_id_sender
       and b.user_id_receiver = a.user_id_receiver
GROUP BY a.date
ORDER BY a.date DESC

Data science interview problems and solutions

Now I have acceptance rate by date . I took from the one table given and split it into a table where we captured friend requests being sent and friend requests being accepted. And then we joined those two together preserving the number of sent requests. That allowed us to basically form a rate calculation where we are counting the number of acceptances and the number of senders. That is how we can get the acceptance rate.

Check out "Facebook data science interview questions" or "data science interview questions" to find more questions to practice.

Facebook data science interview


Become a data expert. Subscribe to our newsletter.