Microsoft Data Science Interview Questions

Published: October 19, 2021

Written by:
Nathan Rosidi

We will analyze Microsoft data science interview questions and prepare you with a few of the concepts that frequently arise in interviews.

Microsoft is synonymous with tech. A tech industry behemoth for more than three decades, Microsoft is a leader in software, consumer electronics, and data. Microsoft’s Windows is the world’s most popular desktop operating system, and its Office suite is used in business, education, and personal projects around the globe. In February 2010, Microsoft launched Azure (then called Windows Azure, today referred to as Microsoft Azure) and in the years since has routinely optimized the SaaS (Software as a Service), PaaS (Platform as a Service), and IaaS (Infrastructure as a Service) elements of the Azure environment. For example, Azure has simplified popular machine learning tasks to a GUI ‘drag-and-drop’ interface where familiarity with general concepts is sufficient to train, test, and deploy machine learning models.

Microsoft hires data scientists for a plethora of teams including Xbox Gaming Data Services (GDS), Azure data science teams, and Customer Success Engineering. Data Scientists at Microsoft solve problems ranging from improving customer experience in their applications to developing recommendation models to suggest relevant content to product users. Microsoft attracts hundreds of thousands of applicants every year, and we at StrataScratch know it’s important for applicants to understand the current questions asked in Microsoft interviews for open data scientist positions.

This article walks through one coding question and one non-coding question from Microsoft data science interviews and guides you toward solutions.

Technical Concepts Tested in Microsoft Data Scientist and Data Analyst Interview Questions

To best prepare for a technical Microsoft data science interview, you should be familiar with the concepts that are tested most often. For Microsoft SQL interview questions, these include joins, sub-queries, groupings, and basic SQL calculations. Knowing how these concepts interact makes it more likely that you will answer interview questions successfully.

The table below shows commonly tested concepts in Microsoft data science interview questions:

Sub-queries	Min/Max/Average
Joins	Count
Grouping by / Ordering by	Distinct
Case/Else	Rank

The tested concepts manipulate data or calculate insights based on determined criteria, and sometimes you’ll be asked to do both in the same query. For example, you may be asked to calculate the share of users that subscribe to more than one Microsoft service, and to output certain data in the result. To answer these problems, you are required to understand technical concepts including sub-queries, joins, group by, count, distinct, etc.
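
As a rough illustration of how these concepts combine, the sketch below computes the share of users with more than one service using a sub-query, GROUP BY, COUNT, DISTINCT, and CASE. The table name subscriptions, its columns, and the rows are hypothetical and are not taken from any Microsoft dataset. The SQL is run through Python's built-in sqlite3 module only so the example is self-contained.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (user_id TEXT, service TEXT)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?)",
    [
        ("u1", "Office"), ("u1", "Azure"),   # two distinct services
        ("u2", "Office"),                    # one service
        ("u3", "Xbox"), ("u3", "Xbox"),      # duplicate row, still one service
        ("u4", "Azure"), ("u4", "Office"),   # two distinct services
    ],
)

# Inner query: distinct services per user.
# Outer query: fraction of users whose count exceeds one.
query = """
SELECT AVG(CASE WHEN n_services > 1 THEN 1.0 ELSE 0.0 END) AS multi_service_share
FROM (SELECT user_id,
             COUNT(DISTINCT service) AS n_services
      FROM subscriptions
      GROUP BY user_id) AS per_user
"""
share = conn.execute(query).fetchone()[0]
print(share)
```

Counting distinct services rather than rows is what keeps the duplicated row for u3 from inflating the result.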

Next, we will review one of the Microsoft data science interview questions and examine how we approach and implement SQL solutions.

Microsoft Data Science Interview Question

Exclusive Users Per Client

Link to the question: https://platform.stratascratch.com/coding/2025-users-exclusive-per-client

Write a query that returns the number of users who are exclusive to only one client. Output the client ID and the number of exclusive users.

Setting Yourself up for Success

First, restate the question back to the interviewer. This provides an opportunity to confirm that you understand the problem. For this Microsoft data science interview question, we’d say something like “To confirm, we want to return the client ID and the number of exclusive users, where an exclusive user is one who has used only one client. Is this correct?” This is also a great time to ask clarifying questions if anything about the dataset is unclear. For example, the word ‘client’ could refer to a particular end user account or to a method of access (e.g., mobile or desktop). In this case it refers to the latter. The interviewer’s response may also help point you in the right direction. Take advantage of this extra time and opportunity to improve your understanding of the question.

Next, let’s take a look at the schema. We will do this without previewing the table, as you often won’t have access to populated observations during the interview. Because of this, you will need to make assumptions about how the dataset behaves. Again, don’t be afraid to ask questions. For example, whether the dataset contains duplicates may considerably affect the result.

Table Schema

fact_events

id	int
time_id	datetime
user_id	varchar
customer_id	varchar
client_id	varchar
event_type	varchar
event_id	int

In the provided table schema:

The ‘id’ and ‘event_id’ columns are integers (int)
The ‘user_id’, ‘customer_id’, ‘client_id’, and ‘event_type’ columns are variable-length strings (varchar)
The ‘time_id’ column is a datetime

First, it is helpful to clarify what the ‘id’ column is supposed to be, as we already have IDs for time, user, client, and event. After looking at the sample data (this could also be asked of the interviewer), ‘id’ appears to be an identifier that guarantees that each observation is unique. Repeated ‘id’ values may indicate duplicate data.
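
If you do have access to the data, one way to test the uniqueness assumption is to look for ‘id’ values that occur more than once. The sketch below is a minimal example using Python's built-in sqlite3 module. The table is reduced to three columns and the rows are hypothetical, with one duplicate planted on purpose.

```python
import sqlite3

# Reduced, hypothetical version of fact_events; rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_events (id INTEGER, user_id TEXT, client_id TEXT)")
conn.executemany(
    "INSERT INTO fact_events VALUES (?, ?, ?)",
    [
        (1, "u1", "desktop"),
        (2, "u2", "mobile"),
        (2, "u2", "mobile"),   # planted duplicate of id 2
        (3, "u3", "mobile"),
    ],
)

# Any id returned here appears in more than one row.
duplicate_ids = conn.execute(
    """
    SELECT id, COUNT(*) AS n_rows
    FROM fact_events
    GROUP BY id
    HAVING COUNT(*) > 1
    ORDER BY id
    """
).fetchall()
print(duplicate_ids)
```

An empty result would support treating ‘id’ as a unique row identifier.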

Next, let’s consider the ‘time_id’ column. As you may expect, ‘time_id’ stores a timestamp in ‘yyyy-mm-dd’ format. There is no hour, minute, or second data, but that isn’t relevant to our problem as we aren’t asked to do anything involving the date. Part of knowing how to answer problems is knowing what to ignore, and here we can ignore this for now.

The ‘user_id’ column stores a user ID-key structured as 4 numerical values separated by a dash and followed by 5 letters. Values may repeat in this column.

The ‘customer_id’ column contains the names of customers stored as text. Values include ‘Sendit’, ‘Zoomit’, and ‘Connectix’.

‘client_id’ refers to whether the communication was made via mobile or desktop.

The ‘event_type’ column describes the type of communication, for example ‘message sent’, ‘file received’, and ‘voice call started’.

The last column is ‘event_id’ as an integer. These values range from 1 to 9.

Our interview problem only involves the ‘user_id’ and ‘client_id’ columns, so we will be working with those.

Logic

It may be helpful to write out your logic in order to solve interview problems.

First, we are interested in ‘client_id’ and ‘user_id’, so we will need to select those columns.
We only want to include ‘user_id’ values where the count of distinct ‘client_id’ values is equal to one.
We want to show the resulting user counts per ‘client_id’.
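
The steps above can also be sketched in plain Python before writing any SQL. This is only an illustration of the logic, not part of the original solution, and the (user_id, client_id) pairs are hypothetical.

```python
from collections import defaultdict

# Hypothetical (user_id, client_id) pairs; not taken from the real dataset.
events = [
    ("u1", "desktop"),
    ("u1", "desktop"),
    ("u2", "mobile"),
    ("u2", "desktop"),
    ("u3", "mobile"),
    ("u4", "mobile"),
]

# Collect the distinct clients each user has used.
clients_by_user = defaultdict(set)
for user_id, client_id in events:
    clients_by_user[user_id].add(client_id)

# Keep users with exactly one distinct client and count them per client.
exclusive_users = defaultdict(int)
for user_id, clients in clients_by_user.items():
    if len(clients) == 1:
        (only_client,) = clients  # unpack the single element of the set
        exclusive_users[only_client] += 1

result = dict(exclusive_users)
print(result)
```

Here u2 is excluded because it appears on both clients, while u1 counts once for desktop even though it has two events there.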

After you’ve written out your logic, ask your interviewer if they agree with your approach. They may highlight cases which you haven’t considered. Pay close attention to what they say because they’re offering you valuable information here. Once there is a consensus that your approach is correct, it’s time to code.

Approach and Solution

It’s important to stay calm during the interview process in order to think clearly. This is a fairly straightforward query, and we’ve sufficiently covered the logic of the solution. All that’s left to do is to program.

While there are many possible solutions to this Microsoft data science interview question, the solution below is easy to read and answers the problem concisely.

We begin by selecting the relevant fields, ‘client_id’ and ‘user_id’. As we’re only interested in the number of users, the number of distinct ‘user_id’ values is counted.

Next, let’s resolve the sub-query. We are interested in ‘user_id’ values where the count of clients used is equal to one. This is accomplished by selecting ‘user_id’ from the ‘fact_events’ table, grouping on ‘user_id’, and using the HAVING clause to keep only each ‘user_id’ whose count of distinct clients is one.

(SELECT user_id
 FROM fact_events
 GROUP BY user_id
 HAVING COUNT(DISTINCT client_id) = 1)

Last, we add this to our existing query. Because we want to see the number of exclusive users for each client, we group the result on ‘client_id’.

SELECT client_id,
       COUNT(DISTINCT user_id)
FROM fact_events
WHERE user_id IN
    (SELECT user_id
     FROM fact_events
     GROUP BY user_id
     HAVING COUNT(DISTINCT client_id) = 1)
GROUP BY client_id
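
As a quick check, the query above can be run against a small in-memory table. The sketch below uses Python's built-in sqlite3 module and follows the column names from the schema, but the rows are hypothetical. The ORDER BY clause and the exclusive_users alias are additions made here only to give a stable, labeled output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fact_events (
        id INTEGER, time_id TEXT, user_id TEXT, customer_id TEXT,
        client_id TEXT, event_type TEXT, event_id INTEGER
    )
    """
)

# Hypothetical rows: u1 uses only desktop, u2 uses both clients,
# u3 and u4 use only mobile.
rows = [
    (1, "2020-02-01", "u1", "Sendit", "desktop", "message sent", 1),
    (2, "2020-02-02", "u1", "Sendit", "desktop", "file received", 2),
    (3, "2020-02-02", "u2", "Zoomit", "mobile", "message sent", 1),
    (4, "2020-02-03", "u2", "Zoomit", "desktop", "voice call started", 3),
    (5, "2020-02-03", "u3", "Connectix", "mobile", "message sent", 1),
    (6, "2020-02-04", "u4", "Sendit", "mobile", "file received", 2),
]
conn.executemany("INSERT INTO fact_events VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT client_id,
       COUNT(DISTINCT user_id) AS exclusive_users
FROM fact_events
WHERE user_id IN
    (SELECT user_id
     FROM fact_events
     GROUP BY user_id
     HAVING COUNT(DISTINCT client_id) = 1)
GROUP BY client_id
ORDER BY client_id
"""
result = conn.execute(query).fetchall()
print(result)
```

With these rows, u2 is filtered out by the sub-query, leaving one exclusive user on desktop and two on mobile.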

Check out our previous article Microsoft Data Scientist Interview Questions to find more interview questions from Microsoft.

Microsoft Data Science Non-coding Interview Questions

In addition to data science coding interview questions like the one above, candidates may also be asked conceptual questions to test their knowledge of general analytical principles. For example, the question below from September 2021 asked candidates to describe a method to determine the reason behind a 4% decrease in Skype’s daily active users. While there are many possible answers to this data science interview question, interviewers are ultimately testing the applicant’s logical reasoning.

Link to the question: https://platform.stratascratch.com/technical/2321-skype-usage

Conclusions

This article covered recent Microsoft data science interview questions, but the same approach applies to all data science interview questions.

After receiving your question:

Reiterate the question to ensure that you fully understand what is being asked.
Discuss assumptions to clarify any uncertainties and to gather additional information about the potential solution.
Walk through your problem-solving logic prior to coding your solution.
Explain while you code. An interviewer may help you if you make a simple mistake while you explain!

The secret to better interview performance is practice and repetition. Through StrataScratch’s easy-to-use platform and historical interview questions from top employers including Microsoft, Amazon, and Spotify, candidates are able to prepare using real-world interview questions. New questions are routinely added, and can be solved in Python or SQL.
