Get data science interview questions delivered to your inbox
Please enter a valid email address.

Data Science Coding Interview Questions with 5 Technical Concepts

Understanding 5 coding concepts of data science coding interview questions that companies will test you on and preparing for them effectively

Data Science Coding Interview Questions with 5 Coding Concepts

A data science interview involves multiple rounds and one of such rounds involves coding interview questions. The purpose of these data science coding interview questions is to check if a candidate can program and knows the required coding languages such as SQL and Python. We’ll cover the coding questions you may receive during your data science technical interview round.

In this article, we'll cover the 5 coding concepts that companies will test you on in your data science interviews in 2021. As you guys know, we’re always tracking and collecting data science coding interview questions from various companies to better understand what technical concepts you need to know to succeed in interviews and what skills you need to have to succeed on the job. So these are coding questions that I’ve collected in the first few months of 2021. So they’re as real-time as can be.

The companies that we’re sourcing these data science coding interview questions from today are Facebook, Postmates, Credit Karma, Ring Central, and Twitch. We’ll introduce the coding concept first to understand and then show you a real interview question we collected from the company.

Technical Concepts of Data Science Coding Interview Questions

This guide of technical concepts contains almost all types of data science coding interview questions you should expect when interviewing for a data scientist position.

From this list of data science coding interview questions, an interviewee should be able to prepare for:

  • The complex and tough questions
  • What answers will positively resonate with an employer
  • The different concepts that companies will test you on
  • Developing the confidence to ace the interview

We’ve broken these data science coding interview questions into five different concepts:

  1. Concept #1: Categorizations, aggregations, ratios
  2. Concept #2: LEFT JOINs and Subqueries and CTEs
  3. Concept #3: Subqueries in the WHERE clause
  4. Concept #4: Window functions
  5. Concept #5: Date manipulations

1. Categorizations, Aggregations, Ratios

Let’s start simple -- categorizations, aggregations, ratios. Aren’t these 3 different concepts? Yes, but they’re almost always tested together in one coding question. And it’s probably the most commonly tested concept on data science coding interviews, especially in the beginning rounds.

By categorizations, we mean CASE statements in SQL. We’ll stick with SQL technical concepts because most people use SQL for data science coding interviews.

Aggregations like taking the sum or average are always tested and usually are applied after the categorization takes place.

But they can also take the form of a ratio.

Here’s an example by Facebook

(https://platform.stratascratch.com/coding-question?id=2005&python=):

Return the share of monthly active users in the United States (US). Active users are the ones with an "open" status in the table.

Data Science Coding Interview Questions

Here’s the solution to this coding interview question:

Facebook Data Science Coding Questions

If you read the above solution and it makes sense to you then congrats, you’re on the right track.

To solve this question, you need to understand how to properly write a case statement, properly aggregate the results of the case statement, and then take a ratio from the result of the aggregation itself. All in one query.

The tricky part isn’t even those 3 technical concepts. It’s organizing your code into logical steps -- as you can see there’s a subquery to split out the logic.

And there’s also a data type conversion from an integer to a float so that your output can be between 0 and 1.

These are the little nuances you need to know in addition to these 3 concepts. Lots of people forget these small things, especially if you’re out of practice or nervous.

2. LEFT JOINs and Subqueries and CTEs

The second concept of data science coding interview questions includes JOINs, subqueries, and CTEs. Again 3 different concepts? But they’re always tested together to solve one problem.

Let's have an example. This question is from Credit Karma

(https://platform.stratascratch.com/coding-question?id=2003&python=)

Write a query that joins this submissions table to the loans table and returns the total loan balance on each user’s most recent ‘Refinance’ submission.

Data Science Coding Questions

And here’s the SQL code solution for this coding interview question:

Credit Karma data scientist coding interview questions

It might look pretty complicated. It even includes other advanced concepts we’re not even covering. But you can see there are a JOIN statement and a subquery (which could be re-written as a CTE, if you wanted to do that).

What’s being tested is if you can blend two sources of data together and aggregate the data from one table, “balance” with a variable from another table “user_id”. Remember, a successful interviewee will talk through the entire logic and why they’re doing what they’re doing.

3. Subqueries in the WHERE Clause

While we’re talking about subqueries, they can be found in multiple places – not just at the JOIN statement like in the last question. Subqueries in the WHERE clause are also common concepts tested and actually, it’s a question that most people get wrong.

Here’s a question from Ring Central

(https://platform.stratascratch.com/coding-question?id=2017&python=)

How many paid users had any calls in Apr 2020?

Ring Central data science coding interview questions

And here’s the solution:

Ring Central data science coding interview question solution

Subqueries in the WHERE clause allow you to quickly grab information from one table and use it as a filtering condition in the main query like we are seeing here. So in this example, we’re grabbing the user_id of paid users from the user table and using that as a condition to find how many calls they placed in the calls table.

4. Window Functions

The 4th coding concept tested on data science coding interviews so far in 2021 is window functions. Window functions are heavily utilized in data science work. We’ve talked about them extensively in the past articles - Types of Window Functions and Window Functions Questions. One example of this is a question from Twitch.

(https://platform.stratascratch.com/coding-question?id=2012&python=)

Twitch data science coding interview questions

Here’s the solution to this coding question:

Twitch data science coding interview question solution

This one’s insane! And yes it did get asked in a data science interview. But don’t worry, in the upcoming weeks, StrataScratch will be stepping through the solution for each of the coding interview questions we just covered.

These questions take a lot of time to solve and they’re typically found in the later rounds of your data science interviews because there is so much code and it just takes time to talk through it.

5. Date Manipulations

The last technical concept of data science coding interview questions is date manipulations. This one is always asked on data science coding interviews because everyone in analytics from an analyst to a data scientist works with dates. It would be weird not to. Because you’re usually given a complete day, year-month-day-timestamp, but asked to aggregate by weeks, months, or years, you’ll need to know how to manipulate dates.

Here’s one example from Postmates

(https://platform.stratascratch.com/coding-question?id=2014&python=)

Which hour of the day averages the highest order volume?

Postmates data science coding interview questions

And here’s the solution to this question:

Postmates data science coding interview question solution

This one requires you to extract the hour from the DateTime in order to find the highest order volume. So it’s just a matter of knowing these DateTime functions, like date_part().

Conclusion

So these are the 5 technical concepts that data science companies will most likely to test you on in your interviews in 2021. Did you think these data science coding interview questions were difficult to answer? If so, we, at StrataScratch, will step through all the solutions in the upcoming weeks.

Remember, these are coding interview questions that were asked in the first few weeks/months of 2021, so they’re definitely requiring you to master these 5 technical concepts.