Data Science Coding Interview Questions with 5 Technical Concepts
Understanding 5 coding concepts of data science coding interview questions that companies will test you on and preparing for them effectively
A data science interview involves multiple rounds and one of such rounds involves coding interview questions. The purpose of these data science coding interview questions is to check if a candidate can program and knows the required coding languages such as SQL and Python. We’ll cover the coding questions you may receive during your data science technical interview round.
In this article, we'll cover the 5 coding concepts that companies will test you on in your data science interviews in 2021. As you guys know, we’re always tracking and collecting data science coding interview questions from various companies to better understand what technical concepts you need to know to succeed in interviews and what skills you need to have to succeed on the job. So these are coding questions that I’ve collected in the first few months of 2021. So they’re as real-time as can be.
The companies that we’re sourcing these data science coding interview questions from today are Facebook, Postmates, Credit Karma, Ring Central, and Twitch. We’ll introduce the coding concept first to understand and then show you a real interview question we collected from the company.
Technical Concepts of Data Science Coding Interview Questions
This guide of technical concepts contains almost all types of data science coding interview questions you should expect when interviewing for a data scientist position.
From this list of data science coding interview questions, an interviewee should be able to prepare for:
- The complex and tough questions
- What answers will positively resonate with an employer
- The different concepts that companies will test you on
- Developing the confidence to ace the interview
We’ve broken these data science coding interview questions into five different concepts:
- Concept #1: Categorizations, aggregations, ratios
- Concept #2: LEFT JOINs and Subqueries and CTEs
- Concept #3: Subqueries in the WHERE clause
- Concept #4: Window functions
- Concept #5: Date manipulations
1. Categorizations, Aggregations, Ratios
Let’s start simple -- categorizations, aggregations, ratios. Aren’t these 3 different concepts? Yes, but they’re almost always tested together in one coding question. And it’s probably the most commonly tested concept on data science coding interviews, especially in the beginning rounds.
By categorizations, we mean CASE statements in SQL. We’ll stick with SQL technical concepts because most people use SQL for data science coding interviews.
Aggregations like taking the sum or average are always tested and usually are applied after the categorization takes place.
But they can also take the form of a ratio.
Here’s an example by Facebook
Return the share of monthly active users in the United States (US). Active users are the ones with an "open" status in the table.
Here’s the solution to this coding interview question:
If you read the above solution and it makes sense to you then congrats, you’re on the right track.
To solve this question, you need to understand how to properly write a case statement, properly aggregate the results of the case statement, and then take a ratio from the result of the aggregation itself. All in one query.
The tricky part isn’t even those 3 technical concepts. It’s organizing your code into logical steps -- as you can see there’s a subquery to split out the logic.
And there’s also a data type conversion from an integer to a float so that your output can be between 0 and 1.
These are the little nuances you need to know in addition to these 3 concepts. Lots of people forget these small things, especially if you’re out of practice or nervous.
2. LEFT JOINs and Subqueries and CTEs
The second concept of data science coding interview questions includes JOINs, subqueries, and CTEs. Again 3 different concepts? But they’re always tested together to solve one problem.
Let's have an example. This question is from Credit Karma
Write a query that joins this submissions table to the loans table and returns the total loan balance on each user’s most recent ‘Refinance’ submission.
And here’s the SQL code solution for this coding interview question:
It might look pretty complicated. It even includes other advanced concepts we’re not even covering. But you can see there are a JOIN statement and a subquery (which could be re-written as a CTE, if you wanted to do that).
What’s being tested is if you can blend two sources of data together and aggregate the data from one table, “balance” with a variable from another table “user_id”. Remember, a successful interviewee will talk through the entire logic and why they’re doing what they’re doing.
3. Subqueries in the WHERE Clause
While we’re talking about subqueries, they can be found in multiple places – not just at the JOIN statement like in the last question. Subqueries in the WHERE clause are also common concepts tested and actually, it’s a question that most people get wrong.
Here’s a question from Ring Central
How many paid users had any calls in Apr 2020?
And here’s the solution:
Subqueries in the WHERE clause allow you to quickly grab information from one table and use it as a filtering condition in the main query like we are seeing here. So in this example, we’re grabbing the user_id of paid users from the user table and using that as a condition to find how many calls they placed in the calls table.
4. Window Functions
The 4th coding concept tested on data science coding interviews so far in 2021 is window functions. Window functions are heavily utilized in data science work. We’ve talked about them extensively in the past articles - Types of Window Functions and Window Functions Interview Questions. One example of this is a question from Twitch.
Here’s the solution to this coding question:
This one’s insane! And yes it did get asked in a data science interview. But don’t worry, in the upcoming weeks, StrataScratch will be stepping through the solution for each of the coding interview questions we just covered.
These questions take a lot of time to solve and they’re typically found in the later rounds of your data science interviews because there is so much code and it just takes time to talk through it.
5. Date Manipulations
The last technical concept of data science coding interview questions is date manipulations. This one is always asked on data science coding interviews because everyone in analytics from an analyst to a data scientist works with dates. It would be weird not to. Because you’re usually given a complete day, year-month-day-timestamp, but asked to aggregate by weeks, months, or years, you’ll need to know how to manipulate dates.
Here’s one example from Postmates
Which hour of the day averages the highest order volume?
And here’s the solution to this question:
This one requires you to extract the hour from the DateTime in order to find the highest order volume. So it’s just a matter of knowing these DateTime functions, like date_part().
So these are the 5 technical concepts that companies will most likely to test you on in your interviews in 2021. Did you think these data science coding interview questions were difficult to answer? If so, we, at StrataScratch, will step through all the solutions in the upcoming weeks.
Remember, these are coding interview questions that were asked in the first few weeks/months of 2021, so they’re definitely requiring you to master these 5 technical concepts.