Facebook Data Analyst Interview Question - Most Active Users on Messenger

This is one of the important Facebook data analyst interview questions that is asking to find out the most active users on Facebook Messenger.

Real Data Analyst Interview Question From Facebook

This data analyst interview question was asked by Facebook during an interview.

Question

Most Active Users On Messenger

Facebook Messenger stores the number of messages between users in a table named 'fb_messages'. In this table 'user1' is the sender, 'user2' is the receiver, and 'msg_count' is the number of messages exchanged between them. Find the top 10 most active users on Facebook Messenger by counting their total number of messages sent and received.

Our suggestion is to try to solve this Facebook data analyst interview question first before reading the solution. You can try to solve this facebook data analyst interview question here (https://platform.stratascratch.com/coding-question?id=10295&python=).

The question is actually on the bottom - "Find the top 10 most active users on Facebook Messenger by counting their total number of messages sent and received".

Some background information is also there in this Facebook data analyst interview question - "Facebook Messenger stores the number of messages between users in a table named 'fb_messages'. In this table 'user1' is a sender, 'user2' is the receiver, and 'msg_count' is the number of messages exchanged between them".

If we press the preview button, we'll get the underlying data.

The most important columns are 'user1', 'user2', and 'msg_count'.

Assumptions

The important thing to consider here is that both sending and receiving messages count as activity. In this data set, we have to count both 'user1' who sends a message, and 'user2' who receives a message. We have to count both of these users and the messages that they either received or sent because they're activities.

In order to do this, we need to create one column where we're basically listing both 'user1' and 'user2', and then we'll have a 'msg_count' as the second column. To accomplish this, we're going to use a UNION ALL to put both 'user1' and 'user2' in one column.

Approach

SELECT
  user1,
  msg_count
FROM fb_messages

UNION ALL

SELECT
  user2,
  msg_count
FROM fb_messages

So, this is how we do this.

There are two important things to consider here:

1. We are using a UNION ALL

The reason why we're using a UNION ALL is that we don't want to duplicate any of the rows. What I mean by that is there is a chance that 'user1' the sender of the message has the exact same message count as 'user2' who receives a message.

For example, I could have a username named 'userA' who got five messages and thus sent five messages. And this would come from the top table. I could also have the same user, 'userA' again, receiving five messages. And this record would come from the bottom table. If we use the UNION, this would collapse together. It would deduplicate and that's not what we want. We want to treat both sending and receiving as activities. We really have to keep these two records separate from each other, and not de-duplicated.

2. A UNION or a UNION ALL

If you see both the top columns and the bottom columns need to have the same names. Where 'msg_count' is fine, but 'user1' and 'user2' are not the same. So what we're going to do now is rename the 'user1' as 'username' and then rename the bottom 'user2' too as 'username'.

SELECT
  user1 as username,
  msg_count
FROM fb_messages

UNION ALL

SELECT
  user2 as username,
  msg_count
FROM fb_messages

Now, these column names are identical. So then you can use a union all.

One last important thing to consider is that we cannot rename the column as 'user', because 'user', for most databases, is a reserved word. So you might actually see an error if you try to run the code. So, instead of 'user', we're going to use 'username'.

If we run this code, it gets exactly what we want.

We have a 'username' and all 'user1' and 'user2' are in this column now named 'username'. And then we have the message count.

Now, for these users, we have to count all of their messages. They could be listed several times in this dataset and we need to sum up all of their messages.

To implement that, this is going to turn into a subquery.

Solution

SELECT
  username,
  sum(msg_count) as total_msg
FROM (
    SELECT
      user1 as username,
      msg_count
    FROM fb_messages

    UNION ALL

    SELECT
      user2 as username,
      msg_count
    FROM fb_messages) a
GROUP BY username

I just wrote the FROM, and then alias this subquery with any letter or word and I chose a. Then I wrote the SELECT clause. And in the SELECT clause, what we want is 'username'. Then we want a sum of the 'msg_count' and named this as 'total_msg'. Then we have the FROM and a GROUP BY username.

Now, let's just run this code.

And this is exactly what we want. The only problem is there are more than 10 users and we don't have any ordering. What we're trying to find is the top 10 most active users.

So, to get the most active users, we will use ORDER BY total_msg and then make it DESC. And to get the top 10, we'll do LIMIT 10 on it.

SELECT
  username,
  sum(msg_count) as total_msg
FROM (
    SELECT
      user1 as username,
      msg_count
    FROM fb_messages

    UNION ALL

    SELECT
      user2 as username,
      msg_count
    FROM fb_messages) a
GROUP BY username
ORDER BY total_msg DESC
LIMIT 10

Now, Let's run this code:

These are the top 10 most active users on Facebook messenger.