Airbnb Data Scientist Interview Questions

Airbnb Data Scientist Interview Questions


This article will teach you how to solve one of the hard Airbnb data scientist interview questions.

Airbnb is one of the most popular companies for data scientists. It’s in the lodging and the hospitality industry. While traditionally, you wouldn’t expect this to be an industry of high interest for data scientists, Airbnb is different. And this difference, of course, reflects in the fact they operate as an online marketplace for accommodation that connects hosts and guests. To do that, they depend on data and data scientists.

And you depend on impressing at the job interview. If you’re interested in working at Airbnb, that is. One of the things that makes it easier for you to do well in the interview is knowing what type of questions they ask at Airbnb.

The other thing is answering the concrete question, like the one we’ll talk about right now.

Airbnb Data Scientist Interview Question

Host Popularity Rental Prices

One of the hard questions from the Airbnb interview is this one:

Airbnb Data Scientist Interview Questions

Link to the question: https://platform.stratascratch.com/coding/9632-host-popularity-rental-prices

Note that the question on the platform has been slightly modified. Compared to the video, the main difference is the names of the popularity rating categories and the table. Because of that, the solution here will differ slightly, but the approach will be the same.

Data

You’ll have the table airbnb_host_searches at your disposal to solve this problem.

airbnb_host_searches

idint
pricefloat
property_typevarchar
room_typevarchar
amenitiesvarchar
accommodatesint
bathroomsint
bed_typevarchar
cancellation_policyvarchar
cleaning_feebool
cityvarchar
host_identity_verifiedvarchar
host_response_ratevarchar
host_sincedatetime
neighbourhoodvarchar
number_of_reviewsint
review_scores_ratingfloat
zipcodeint
bedroomsint
bedsint

Run the code in the widget to see the data.

It should look like this. We’ll be showing only the first three rows. Showing the whole table would take up too much space.

Data for Airbnb data scientist interview question

The Solution Approach

The approach to writing a solution code can be broken down into these steps:

  • Find the distinct hosts
  • Create popularity rating categories using the CASE WHEN statement
  • Find the minimum, average, and the maximum rental price for each category using the aggregate functions
  • GROUP BY category

The Solution Assumptions

Solution Assumptions for Airbnb Data Scientist Interview Question

When you’re at the data science interview, it’s always advisable to state your assumptions. That way, the interviewer can understand your point of view and evaluate your solution against it. If you do that, you’ll avoid the possibility that your solution gets marked down just because you didn’t explain the assumptions involved in your solution.

Also, by stating the assumption, you’ll get a clearer idea of how to solve the interview question. Our one and only assumption for this Airbnb data scientist interview question is:

  • all the values in the number_of_reviews column are integer and there’ll be no NULL values

The Airbnb Interview Question Solution

We should follow the steps outlined above.

The first thing to do here is to find the unique hosts. The tip from the question advises us to create our own host_id by concatenating several columns.

SELECT DISTINCT CONCAT(price, room_type, host_since, zipcode, number_of_reviews) AS host_id,
          number_of_reviews,
          price
   FROM airbnb_host_searches

We’re using the CONCAT() function to create the host ID and to write the query as a CTE. We’ll also need information about the number of reviews and the price for every host.


This query is written as a CTE with the name hosts. It will output the result with the following first few rows:

Output for Airbnb data scientist interview question

The next step is to create the rating categories. To do that, you need to use the CASE WHEN statement. The most practical way is to write this part of a query as a CTE.

SELECT CASE
              WHEN number_of_reviews = 0 THEN 'New'
              WHEN number_of_reviews BETWEEN 1 AND 5 THEN 'Rising'
              WHEN number_of_reviews BETWEEN 6 AND 15 THEN 'Trending Up'
              WHEN number_of_reviews BETWEEN 16 AND 40 THEN 'Popular'
              WHEN number_of_reviews > 40 THEN 'Hot'
          END AS host_popularity,
          price
   FROM hosts;

If you run the first and the second step together, the code will return the following result (showing you only the first few rows):

Output 2 for Airbnb data scientist interview question

Now, we have to calculate the minimum, average, and maximum prices and group by the host’s popularity. We’ll do that by writing the SELECT statement with the aggregate function and using the previous part of the code as a subquery in the following way:

SELECT host_popularity AS host_pop_rating,
       MIN(price) AS min_price,
       AVG(price) AS avg_price,
       MAX(price) AS max_price
FROM
  (SELECT CASE
              WHEN number_of_reviews = 0 THEN 'New'
              WHEN number_of_reviews BETWEEN 1 AND 5 THEN 'Rising'
              WHEN number_of_reviews BETWEEN 6 AND 15 THEN 'Trending Up'
              WHEN number_of_reviews BETWEEN 16 AND 40 THEN 'Popular'
              WHEN number_of_reviews > 40 THEN 'Hot'
          END AS host_popularity,
          price
   FROM hosts) a
GROUP BY host_pop_rating;

All three SQL aggregate functions are applied to the column price and the data is taken from the subquery. On the other hand, the subquery takes the data from the CTE.

If you put all this together, the final solution will be:

WITH hosts AS
  (SELECT DISTINCT CONCAT(price, room_type, host_since, zipcode, number_of_reviews) AS host_id,
          number_of_reviews,
          price
   FROM airbnb_host_searches)

SELECT host_popularity AS host_pop_rating,
       MIN(price) AS min_price,
       AVG(price) AS avg_price,
       MAX(price) AS max_price
FROM
  (SELECT CASE
              WHEN number_of_reviews = 0 THEN 'New'
              WHEN number_of_reviews BETWEEN 1 AND 5 THEN 'Rising'
              WHEN number_of_reviews BETWEEN 6 AND 15 THEN 'Trending Up'
              WHEN number_of_reviews BETWEEN 16 AND 40 THEN 'Popular'
              WHEN number_of_reviews > 40 THEN 'Hot'
          END AS host_popularity,
          price
   FROM hosts) a
GROUP BY host_pop_rating;

This query returns the correct answer.

 Final output for Airbnb data scientist interview question

Conclusion

By asking this question during the data science job interview, Airbnb tests several important SQL concepts. For example, finding the distinct values and manipulating data using the CONCAT() function.

The second important concept is the SQL aggregate functions. Also, this Airbnb data scientist interview question tested your skills in using the CASE WHEN statement and grouping data.

Also, to make all these concepts work in a quite neat query, you’d have to show proficiency in writing CTEs and subqueries.Along with your technical knowledge, how well you understand the question and approach the solution will also factor in writing a correct answer. Following best practices in structuring and writing the SQL code is always a good idea!

Airbnb Data Scientist Interview Questions


Become a data expert. Subscribe to our newsletter.