Airbnb Python Interview Question: Host Popularity Rental Prices



We’ll walk you through one of the Airbnb Python interview questions and show you how to do data aggregation and labeling in Python.

Airbnb is short for Air Bed and Breakfast. You probably know that, and chances are you’ve used it at least once.

It’s a platform connecting property owners who want to rent their places with guests searching for places to stay. The platform allows guests to rate hosts and vice versa, which is a relatively efficient way of rooting out terrible hosts and guests.

Of course, if the hosts and their place get better reviews, their business increases.

As confirmation, Airbnb’s superhosts welcome more guests. There are four conditions for becoming a superhost: an overall rating of 4.8 or higher, at least 10 stays, a cancellation rate below 1%, and a response rate of at least 90%.

This is one of the hard-level Python interview questions, and in it we will develop a slightly different host categorization. Let’s look at our question.

There’s also a YouTube video on our channel that covers what we’ll talk about here.

Interview Question: Find the minimum, average, maximum rental prices for each host’s popularity rating.


Table: airbnb_host_searches

In this question, Airbnb asks us to classify the host's popularity according to the number of reviews the host has. It also wants us to show the min, max, and average prices for each popularity category.

That means the output should contain 4 columns, which are:

  • host_popularity
  • minimum
  • maximum
  • average prices

Here are the conditions for each host popularity category:

  • 0 reviews: New
  • 1 to 5 reviews: Rising
  • 6 to 15 reviews: Trending Up
  • 16 to 40 reviews: Popular
  • more than 40 reviews: Hot

You can find all this and the question itself on the link here → https://platform.stratascratch.com/coding/9632-host-popularity-rental-prices.

Now, let’s start solving the question by exploring our data first.

1. Exploring the Dataset

To frame your problem, you should understand your data first. Here, Airbnb provides us with the airbnb_host_searches data frame.

Let’s see the columns and data types of these columns.

Table: airbnb_host_searches
id | price | property_type | room_type | amenities | accommodates | bathrooms | bed_type | cancellation_policy | cleaning_fee | city | host_identity_verified | host_response_rate | host_since | neighbourhood | number_of_reviews | review_scores_rating | zipcode | bedrooms | beds
8284881 | 621.46 | House | Entire home/apt | {TV,"Cable TV",Internet,"Wireless Internet","Air conditioning",Pool,Kitchen,"Free parking on premises",Gym,"Hot tub","Indoor fireplace",Heating,"Family/kid friendly",Washer,Dryer,"Smoke detector","Carbon monoxide detector","First aid kit","Safety card","Fire extinguisher",Essentials,Shampoo,"24-hour check-in",Hangers,"Hair dryer",Iron,"Laptop friendly workspace"} | 8 | 3 | Real Bed | strict | TRUE | LA | f | 100% | 2016-11-01 | Pacific Palisades | 1 |  | 90272 | 4 | 6
8284882 | 621.46 | House | Entire home/apt | {TV,"Cable TV",Internet,"Wireless Internet","Air conditioning",Pool,Kitchen,"Free parking on premises",Gym,"Hot tub","Indoor fireplace",Heating,"Family/kid friendly",Washer,Dryer,"Smoke detector","Carbon monoxide detector","First aid kit","Safety card","Fire extinguisher",Essentials,Shampoo,"24-hour check-in",Hangers,"Hair dryer",Iron,"Laptop friendly workspace"} | 8 | 3 | Real Bed | strict | TRUE | LA | f | 100% | 2016-11-01 | Pacific Palisades | 1 |  | 90272 | 4 | 6
9479348 | 598.9 | Apartment | Entire home/apt | {"Wireless Internet","Air conditioning",Kitchen,Heating,"Smoke detector","Carbon monoxide detector",Essentials,Shampoo,Hangers,Iron,"translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"} | 7 | 2 | Real Bed | strict | FALSE | NYC | f | 100% | 2017-07-03 | Hell's Kitchen | 1 | 60 | 10036 | 3 | 4
8596057 | 420.47 | House | Private room | {"Wireless Internet","Air conditioning",Pool,Kitchen,"Free parking on premises",Breakfast,"Family/kid friendly",Washer,Dryer,Essentials,Shampoo,Hangers,"Hair dryer","Self Check-In","Doorman Entry"} | 1 | 2 | Real Bed | flexible | FALSE | LA | f | 100% | 2016-04-20 |  | 0 |  | 91748 | 1 | 1
11525500 | 478.75 | Apartment | Entire home/apt | {"Wireless Internet","Air conditioning",Heating,Washer,Dryer,Essentials,"Laptop friendly workspace",Microwave,Refrigerator,Dishwasher,"Dishes and silverware","Cooking basics",Oven,Stove,"Host greets you"} | 2 | 1 | Real Bed | flexible | TRUE | NYC | f | 100% | 2017-10-07 | Williamsburg | 2 | 100 | 11206 | 1 | 1

We have host data containing id, price, property type, and more.

Now that we know which columns we have, let’s look at the first rows of our data using the head() method to collect more information.

airbnb_host_searches.head()

Let’s look at our output.


id | price | property_type | room_type | amenities | accommodates | bathrooms | bed_type | cancellation_policy | cleaning_fee | city | host_identity_verified | host_response_rate | host_since | neighbourhood | number_of_reviews | review_scores_rating | zipcode | bedrooms | beds
8284881 | 621.46 | House | Entire home/apt | {TV,"Cable TV",Internet,"Wireless Internet","Air conditioning",Pool,Kitchen,"Free parking on premises",Gym,"Hot tub","Indoor fireplace",Heating,"Family/kid friendly",Washer,Dryer,"Smoke detector","Carbon monoxide detector","First aid kit","Safety card","Fire extinguisher",Essentials,Shampoo,"24-hour check-in",Hangers,"Hair dryer",Iron,"Laptop friendly workspace"} | 8 | 3 | Real Bed | strict | TRUE | LA | f | 100% | 2016-11-01 00:00:00 | Pacific Palisades | 1 |  | 90272 | 4 | 6
8284882 | 621.46 | House | Entire home/apt | {TV,"Cable TV",Internet,"Wireless Internet","Air conditioning",Pool,Kitchen,"Free parking on premises",Gym,"Hot tub","Indoor fireplace",Heating,"Family/kid friendly",Washer,Dryer,"Smoke detector","Carbon monoxide detector","First aid kit","Safety card","Fire extinguisher",Essentials,Shampoo,"24-hour check-in",Hangers,"Hair dryer",Iron,"Laptop friendly workspace"} | 8 | 3 | Real Bed | strict | TRUE | LA | f | 100% | 2016-11-01 00:00:00 | Pacific Palisades | 1 |  | 90272 | 4 | 6
9479348 | 598.9 | Apartment | Entire home/apt | {"Wireless Internet","Air conditioning",Kitchen,Heating,"Smoke detector","Carbon monoxide detector",Essentials,Shampoo,Hangers,Iron,"translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"} | 7 | 2 | Real Bed | strict | FALSE | NYC | f | 100% | 2017-07-03 00:00:00 | Hell's Kitchen | 1 | 60 | 10036 | 3 | 4
8596057 | 420.47 | House | Private room | {"Wireless Internet","Air conditioning",Pool,Kitchen,"Free parking on premises",Breakfast,"Family/kid friendly",Washer,Dryer,Essentials,Shampoo,Hangers,"Hair dryer","Self Check-In","Doorman Entry"} | 1 | 2 | Real Bed | flexible | FALSE | LA | f | 100% | 2016-04-20 00:00:00 |  | 0 |  | 91748 | 1 | 1
11525500 | 478.75 | Apartment | Entire home/apt | {"Wireless Internet","Air conditioning",Heating,Washer,Dryer,Essentials,"Laptop friendly workspace",Microwave,Refrigerator,Dishwasher,"Dishes and silverware","Cooking basics",Oven,Stove,"Host greets you"} | 2 | 1 | Real Bed | flexible | TRUE | NYC | f | 100% | 2017-10-07 00:00:00 | Williamsburg | 2 | 100 | 11206 | 1 | 1

Here you can see that the price column holds decimal values with a varying number of decimal places. That makes the min, max, and average harder to read, so we will want a consistent display format.

Also, when we select only the price and the number of reviews to classify popularity, two different listings could share the same price and review count. Removing duplicates on those two columns alone would merge them, so we will need a workaround.

Now, let’s continue exploring the data; the better we know it, the easier the analysis will be.

airbnb_host_searches.info()

Here is our output.

[Image: output of airbnb_host_searches.info()]

It shows the number of rows, the number of columns, each column’s data type, and the non-null count for every column.
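The same facts that info() prints can also be pulled out as values, which is handy when you want to act on them in code. The sketch below uses a small made-up DataFrame (the name toy and its values are assumptions, not the real dataset) and only standard pandas attributes.

```python
import pandas as pd

# Hypothetical stand-in for airbnb_host_searches; the values are made up.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [621.46, 598.9, 420.47],
    "neighbourhood": ["Pacific Palisades", "Hell's Kitchen", None],
    "number_of_reviews": [1, 1, 0],
})

n_rows, n_cols = toy.shape      # number of rows and number of columns
dtypes = toy.dtypes             # data type of each column
non_null = toy.notna().sum()    # non-null count per column

print(n_rows, n_cols)
print(dtypes)
print(non_null)
```

A column whose non-null count is lower than the number of rows contains missing values, as neighbourhood does in this toy example.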

Let’s continue writing out the approach.

2. Writing Out the Approach


After exploring the data, we will split our problem into codable stages. This will help us frame this problem easily and turn this approach into a code.

Step 1: Import the libraries

Here, we will import the Python libraries. pandas allows us to manipulate the dataset. We will import NumPy, too, because we will work with arrays.

Step 2: Format to two decimals

Since we will calculate the min, max, and average, we want the results to be easy to read. So we will display floats with two decimal places.

Step 3: Rename the data frame

We will do a lot of operations in the following steps. We will rename the data frame to make the code syntax more readable and short.

Step 4: Drop the duplicates

The data we will work with contains searches by users, so the same host can appear more than once. That’s why we will remove the duplicates and count each host only once in the calculations.

Step 5: Conditional statements with a lambda function

There are many ways to classify the host popularity category. The first thing that comes to mind is to write an if-else block, which might take more effort and be harder to read. We will use a lambda function to make the code neater and shorter.

Step 6: Calculate the min, max, and average by grouping the columns.

As a final step, we will calculate the min, max, and average for each host popularity category and assign them as a column to our data frame.

With these steps, the coding becomes easier. This approach can, of course, vary according to your solution. With time, you’ll get used to splitting the question into a codable approach according to your coding abilities.

The general advice is to try and make your code shorter and more readable.

3. Coding the Question

Now, let’s turn this approach into coding.

Step 1: Import the libraries

Import pandas to manipulate the data and shape it into the desired output, and NumPy because we will work with arrays.

import pandas as pd
import numpy as np

Step 2: Format to two decimals

Our price column contains float values. The question tells us to find the min, max, and average prices for each host’s popularity, so we will display float values with two decimal places. Note that this setting changes only how numbers are displayed, not the stored values.

To do that, here is our code.

pd.options.display.float_format = "{:,.2f}".format
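It is worth checking what this option does and does not do. The sketch below, on made-up prices, suggests that display.float_format only affects how floats are rendered, while round() is what actually changes the values; the variable names are illustrative.

```python
import pandas as pd

# Display option only: floats are rendered with two decimals when printed.
pd.options.display.float_format = "{:,.2f}".format

prices = pd.Series([598.9, 472.8149])   # made-up values
shown = prices.to_string()              # rendering picks up the display option
stored = prices[1]                      # the stored value keeps all its digits
rounded = prices.round(2)               # round() returns changed values

print(shown)
print(stored)
```

If the rounded numbers themselves are needed downstream, for example when comparing against an expected result, calling round(2) on the final columns is the safer choice.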

Step 3: Rename the data frame

Now our format is ready. Since we will do many manipulations on our data, we will build a small pipeline, and that means typing the DataFrame name many times. To shorten the syntax, we will refer to the DataFrame as df. This assignment only adds a shorter name for the same object; it does not create a copy.

df = airbnb_host_searches
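One consequence of plain assignment is easy to miss: both names point to the same DataFrame, so a column added through df also appears on the original. The sketch below illustrates this with a made-up two-row frame; the column names flag and other are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the airbnb_host_searches DataFrame.
airbnb_host_searches = pd.DataFrame({"price": [621.46, 598.9]})

df = airbnb_host_searches              # second name for the same object
df["flag"] = True                      # also visible through the original name
same_object = df is airbnb_host_searches

safe = airbnb_host_searches.copy()     # independent copy
safe["other"] = 0                      # the original is not affected

print(same_object, list(airbnb_host_searches.columns))
```

On an interview platform this rarely matters, but in a notebook where the original data is reused, starting from .copy() avoids surprises.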

Step 4: Drop the duplicates

The data that we have contains searches by users.

What does that mean?

It means that we might have duplicates: the same listing can appear in several searches. To solve the interview question, we need only the price and the number of reviews columns.

But what if two different listings have the same price and number of reviews? If we simply remove the duplicates on those two columns, we might unintentionally remove some listings.

To prevent that, we will create a host_id column by concatenating the price, room type, host_since date, zip code, and number of reviews. The table has no host identifier column, so this combined key stands in for one and helps us avoid removing valid data accidentally.

To see the difference between the two approaches, let’s drop the duplicates with and without this column and use the info() method to see how many rows remain.

# Drop the duplicates after adding a new column "host_id"
df['host_id'] = df['price'].map(str) + df['room_type'].map(str) + df['host_since'].map(str) + df['zipcode'].map(str) + df['number_of_reviews'].map(str)

df1 = df[['host_id', 'number_of_reviews', 'price']].drop_duplicates()
df1.info()

# Drop the duplicates without creating a new column
df2 = df[['number_of_reviews', 'price']].drop_duplicates()
df2.info()

Here is our output.

[Image: info() output for df1 and df2]

Here you can see that dropping duplicates without the host_id column also removes distinct listings that happen to share the same price and number of reviews. In total, we would accidentally remove 9 rows (160 - 151 = 9).
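The effect is easy to reproduce on a tiny example. In the sketch below, all rows are made up: listings A and B are different places with the same price and review count, and the second A row is a repeated search hit. The listing column and the shortened two-column key are assumptions for illustration only.

```python
import pandas as pd

# Made-up rows: A and B differ, but share price and review count;
# the second A row is a repeat of the first.
toy = pd.DataFrame({
    "listing": ["A", "A", "B"],
    "zipcode": ["90272", "90272", "10036"],
    "number_of_reviews": [1, 1, 1],
    "price": [500.0, 500.0, 500.0],
})

# Without a key: A and B collapse into a single row.
without_key = toy[["number_of_reviews", "price"]].drop_duplicates()

# With a key built from identifying columns: only the true repeat is dropped.
toy["host_id"] = toy["price"].map(str) + toy["zipcode"].map(str)
with_key = toy[["host_id", "number_of_reviews", "price"]].drop_duplicates()

print(len(without_key), len(with_key))
```

The keyed version keeps one row for A and one for B, which is the behavior we want before computing price statistics.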

Our solution will involve creating the host_id column. Here is the code.

df['host_id'] = df['price'].map(str) + df['room_type'].map(str) + df['host_since'].map(str) + df['zipcode'].map(str) + df['number_of_reviews'].map(str)
df1 = df[['host_id', 'number_of_reviews', 'price']].drop_duplicates()


Here is the output.


host_id | number_of_reviews | price
621.46Entire home/apt2016-11-01 00:00:00902721 | 1 | 621.46
598.9Entire home/apt2017-07-03 00:00:00100361 | 1 | 598.9
420.47Private room2016-04-20 00:00:00917480 | 0 | 420.47
478.75Entire home/apt2017-10-07 00:00:00112062 | 2 | 478.75
662.01Entire home/apt2016-01-20 00:00:00941230 | 0 | 662.01

Step 5: Conditional statements with the lambda function

Now, we have different options for categorizing the host popularity. One common solution is to write an if-else block, but that makes the code longer and harder to read. Of course, there are many problems where an if-else block is the right tool.

Here, though, we can pass a lambda function to the apply() method, which runs our custom function on every value of a column.

By assigning the result with an index bracket, we will create a new column named host_popularity.

The criteria for the host popularity categories are as follows. If the host has no reviews, it will be classified as “New”. Between 1 and 5 reviews is “Rising”, and between 6 and 15 is “Trending Up”. If the host has between 16 and 40 reviews, they are “Popular”, and with more than 40 they are “Hot”.

So we will use a lambda function with chained conditional expressions that encode those conditions.

Here is our code.

df1['host_popularity'] = df1['number_of_reviews'].apply(lambda x:'New' if x<1 else 'Rising' if x<=5 else 'Trending Up' if x<=15 else 'Popular' if x<=40 else 'Hot')

Here is the output.


index | host_popularity
0 | Rising
2 | Rising
3 | New
4 | Rising
5 | New
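As an alternative to the chained lambda, the same thresholds could be expressed with pd.cut. This is only a sketch, not the solution used in this article: the bin edges below are chosen to mirror the conditions, the review counts are made-up boundary values, and it assumes non-negative counts with no missing values.

```python
import numpy as np
import pandas as pd

reviews = pd.Series([0, 1, 5, 6, 15, 16, 40, 41])  # boundary values

# Right-closed bins: (-1, 0], (0, 5], (5, 15], (15, 40], (40, inf]
bins = [-1, 0, 5, 15, 40, np.inf]
labels = ["New", "Rising", "Trending Up", "Popular", "Hot"]
by_cut = pd.cut(reviews, bins=bins, labels=labels).astype(str)

# The chained conditional expression, for comparison.
by_lambda = reviews.apply(
    lambda x: "New" if x < 1 else "Rising" if x <= 5
    else "Trending Up" if x <= 15 else "Popular" if x <= 40 else "Hot"
)

print(by_cut.tolist())
print(by_cut.tolist() == by_lambda.tolist())
```

Testing the boundary values (0, 5, 15, 40 and their neighbors) is a quick way to confirm that either version matches the category rules.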

Step 6: Calculate the min, max, and average by grouping the columns.

Okay, we have classified our hosts according to the number of reviews. Now, it is time to match these categories with the min, max, and average prices.

To do that, first, we have to calculate them. Before that, here is a useful tip for solving coding problems: when you see “each” in a coding question, that generally means you should use the groupby() method.

Let’s see our question once again. It says: “Find the minimum, average, and maximum rental prices for each host’s popularity rating.”

That means we will group by the host popularity first. Then we have to calculate the min, max, and average and assign them as columns of our DataFrame. One easy way to do this is to chain agg() after groupby(). With agg(), each keyword argument names an output column and specifies which input column to aggregate and with which function.

See the official pandas documentation for agg() for its usage and arguments.

The final step is to call reset_index(), which turns the group labels that groupby() places in the index back into a regular column.
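To make the pattern concrete, here is a small sketch of groupby(), named aggregation, and reset_index() on made-up data. The toy frame and its prices are assumptions; the aggregation functions are given by their string names, which pandas accepts in place of callables.

```python
import pandas as pd

# Made-up prices, already labelled with a popularity category.
toy = pd.DataFrame({
    "host_popularity": ["New", "New", "Hot", "Hot", "Hot"],
    "price": [300.0, 500.0, 400.0, 450.0, 650.0],
})

summary = (
    toy.groupby("host_popularity")
       .agg(
           min_price=("price", "min"),
           avg_price=("price", "mean"),
           max_price=("price", "max"),
       )
       .reset_index()   # turn the group labels back into a regular column
)

print(summary)
```

Each keyword in agg() becomes a column of the result, so the output has host_popularity, min_price, avg_price, and max_price, with one row per category.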

Let’s see our final code.

result = df1.groupby('host_popularity').agg(min_price=('price', 'min'), avg_price=('price', 'mean'), max_price=('price', 'max')).reset_index()

Here is the output.


host_popularity | min_price | avg_price | max_price
Hot | 340.12 | 464.233 | 633.51
New | 313.55 | 515.92 | 741.76
Popular | 270.81 | 472.815 | 667.83
Rising | 355.53 | 503.847 | 717.01
Trending Up | 361.09 | 476.277 | 685.65
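For reference, the six steps can be collected into one function. This is a sketch under stated assumptions rather than the platform's reference solution: the function names popularity and host_popularity_prices are hypothetical, the sample rows are made up, and a named function replaces the lambda only so the thresholds can be tested on their own.

```python
import pandas as pd


def popularity(n_reviews: int) -> str:
    """Map a review count to the category names from the question."""
    if n_reviews < 1:
        return "New"
    if n_reviews <= 5:
        return "Rising"
    if n_reviews <= 15:
        return "Trending Up"
    if n_reviews <= 40:
        return "Popular"
    return "Hot"


def host_popularity_prices(searches: pd.DataFrame) -> pd.DataFrame:
    """Min, average and max price per host popularity category."""
    df = searches.copy()  # work on a copy so the input is left untouched
    df["host_id"] = (
        df["price"].map(str) + df["room_type"].map(str)
        + df["host_since"].map(str) + df["zipcode"].map(str)
        + df["number_of_reviews"].map(str)
    )
    hosts = df[["host_id", "number_of_reviews", "price"]].drop_duplicates().copy()
    hosts["host_popularity"] = hosts["number_of_reviews"].apply(popularity)
    return (
        hosts.groupby("host_popularity")
        .agg(min_price=("price", "min"),
             avg_price=("price", "mean"),
             max_price=("price", "max"))
        .reset_index()
    )


# Made-up sample with one repeated search row (rows 0 and 1).
sample = pd.DataFrame({
    "price": [621.46, 621.46, 420.47, 700.0],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Entire home/apt"],
    "host_since": ["2016-11-01", "2016-11-01", "2016-04-20", "2015-01-01"],
    "zipcode": ["90272", "90272", "91748", "10036"],
    "number_of_reviews": [1, 1, 0, 2],
})
result = host_popularity_prices(sample)
print(result)
```

Because the function copies its input, the helper host_id column never leaks into the caller's DataFrame.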

Conclusion

To solve this Airbnb Python interview question, we set the display format of our DataFrame and removed the duplicates, creating a host_id column first so that we would not drop valid listings and have to debug the result later.

Then we classified the host's popularity according to reviews and calculated the average, minimum and maximum prices.

I hope you enjoyed this question. Along the way, you saw lambda functions, display formatting options, and a different perspective on duplicate removal.

Check out our post “Python coding interview questions” to see different coding challenges, enhance your coding ability, become familiar with different companies’ interview questions, and land a new job.
