Data Science Probability Interview Questions


Top companies require top data scientists, the ones who are good at solving probability problems. Solving such problems is what we’ll do here.

Probability is a part of statistics, but in job interviews it is often treated as a completely separate topic. That’s why, within the non-coding questions, we have a category dedicated solely to probability interview questions.

There are questions of different levels of complexity, but they all boil down to one basic concept: the probability definition.

What Is Probability?

Probability is defined as the ratio of the number of favorable outcomes (or cases) to the number of all possible outcomes, assuming every outcome is equally likely.

Mathematically, we can show it this way:

$$\text{Probability} = \frac{\text{Number of Favorable Cases}}{\text{Number of All Possible Cases}}$$
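To make the definition concrete, here is a minimal Python sketch. The helper name `probability` and the die example are illustrative assumptions, not part of the original questions; the sketch only holds when all outcomes are equally likely.

```python
from fractions import Fraction


def probability(favorable, possible):
    """Classical probability: favorable cases over all equally likely cases.

    Hypothetical helper for illustration; returns an exact fraction.
    """
    if possible <= 0 or not 0 <= favorable <= possible:
        raise ValueError("need possible > 0 and 0 <= favorable <= possible")
    return Fraction(favorable, possible)


# Illustrative example (an assumption, not from the article):
# rolling an even number on a fair six-sided die.
die_faces = [1, 2, 3, 4, 5, 6]
even_faces = [face for face in die_faces if face % 2 == 0]

p_even = probability(len(even_faces), len(die_faces))
print(p_even)  # 1/2
```

Using `Fraction` instead of floats keeps the result in the same reduced form you would write on a whiteboard.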

Let’s look at how you can apply this to something practical, namely solving probability interview questions.

Data Science Probability Interview Questions and Answers

To follow the solutions more easily, you might want to have a look at our guide explaining probability and other statistics concepts you need as a data scientist.

Data Science Probability Interview Question #1: Two Cards Same Suit

This question from Meta/Facebook asks for the probability that two cards drawn from a standard 52-card deck are of the same suit.

Link to the question: https://platform.stratascratch.com/technical/2003-two-cards-same-suite

We’ll solve this probability interview question with the assumption that the cards are not replaced.

Intuitive Solution

Having in mind the probability definition, you can solve this data science probability interview question intuitively. There are four suits in the card deck. In every suit, there are 13 cards. That means there are 52 cards in total.

Draw one card and look at its suit: the second card you draw has to be of the same suit. Since one card is already gone, 51 cards remain, so there are 51 possible cases. Of the 13 cards in the first card’s suit, 12 are still in the deck, so there are 12 favorable cases.

If you translate this to a formula, you get the following:

$$\frac{\text{Favorable Cases}}{\text{All Possible Cases}} = \frac{12}{51} = \frac{4}{17}$$

Mathematical Solution

The number of all possible cases expressed mathematically is:

$$\begin{aligned} \text{All Possible Cases} &= {}^{52}P_{2} \\ &= 52 \times 51 \\ &= 2{,}652 \end{aligned}$$

This is because the first card can be any of the 52 cards in the deck and the second can be any of the remaining 51, with the order of the draw taken into account.

To count the favorable cases, we first choose which of the four suits both cards belong to. Once the suit is fixed, we draw two cards, in order, from the 13 cards of that suit. In other words:

$$\begin{aligned} \text{Favorable Cases} &= 4 \times {}^{13}P_{2} \\ &= 4 \times 13 \times 12 \\ &= 624 \end{aligned}$$

Put this into the probability formula and you’ll get:

$$\begin{aligned} \text{Probability} &= \frac{624}{2{,}652} \\ &= \frac{4}{17} \end{aligned}$$
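The counts above can be checked by brute force. The following is a sketch that enumerates every ordered two-card draw without replacement; the deck representation (rank numbers 1 to 13 and suit names) is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import permutations, product

# Assumed deck layout: 13 ranks in each of 4 suits, 52 cards in total.
ranks = range(1, 14)
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))

# Every ordered draw of two distinct cards (no replacement).
ordered_draws = list(permutations(deck, 2))

# Keep only the draws where both cards share a suit.
same_suit_draws = [
    (first, second)
    for first, second in ordered_draws
    if first[1] == second[1]
]

p_same_suit = Fraction(len(same_suit_draws), len(ordered_draws))
print(len(ordered_draws), len(same_suit_draws), p_same_suit)  # 2652 624 4/17
```

Enumeration is feasible here because there are only 2,652 ordered pairs; for larger problems, the counting argument scales better.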

Data Science Probability Interview Question #2: Where Are the Birthday People?

This question is a little more difficult. It’s asked by Yammer and concerns a room of k people: what is the probability that at least two of them share a birthday?

Link to the question: https://platform.stratascratch.com/technical/2028-where-are-the-birthday-people

Here we’re going to assume that there are 365 days in a year and k ≤ 365.

Unlike the previous question, where repetition was not possible, here the birthdays might fall on the same date.

For each person in the room, there are 365 possible scenarios. Therefore:

$$\begin{aligned} \text{All Possible Cases} &= 365 \times 365 \times \dots \ (k\ \text{times}) \\ &= 365^k \end{aligned}$$

You would expect we’re going to find the favorable cases as a next step. We will, but not just yet. Since here it’s harder to find favorable cases than unfavorable ones, we’ll first find the unfavorable cases. From there on, it’ll be easy to find favorable cases.

Unfavorable cases are those in which no two of the k people share a birthday. Let’s assume there are 365 seats in the room, each labeled with a calendar date:

1st Jan | 2nd Jan | 3rd Jan | ... | 29th Dec | 30th Dec | 31st Dec

Each person has to sit in the seat labeled with their date of birth. The first person has 365 options. The second person can’t sit in the same seat as the first person, so 364 options remain, and so on.

Mathematically, the number of unfavorable cases equals:

$$\begin{aligned} \text{Unfavorable Cases} &= 365 \times 364 \times \dots \ (k\ \text{factors}) \\ &= {}^{365}P_k \end{aligned}$$

From this it follows:

$$\begin{aligned} \text{Favorable Cases} &= \text{All Possible Cases} - \text{Unfavorable Cases} \\ &= 365^k - {}^{365}P_k \end{aligned}$$

Probability is, therefore:

$$\text{Probability} = \frac{365^k - {}^{365}P_k}{365^k}$$
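The closed-form result can be evaluated for any k. Below is a minimal sketch, assuming Python 3.8+ (for `math.perm`) and the same 365-day year; the function name `shared_birthday_probability` is hypothetical.

```python
from fractions import Fraction
from math import perm

DAYS_IN_YEAR = 365  # same assumption as in the text: no leap years


def shared_birthday_probability(k):
    """Probability that at least two of k people share a birthday."""
    if k < 0:
        raise ValueError("k must be non-negative")
    all_cases = DAYS_IN_YEAR ** k
    # Ordered ways to give k people k distinct birthdays (365 P k).
    unfavorable_cases = perm(DAYS_IN_YEAR, k)
    return Fraction(all_cases - unfavorable_cases, all_cases)


print(shared_birthday_probability(2))  # 1/365
print(round(float(shared_birthday_probability(23)), 4))  # 0.5073
```

With k = 23 the probability already exceeds one half, which is the well-known birthday paradox.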

Data Science Probability Interview Question #3: Two Out of Three Tails

The last probability interview question we’ll talk about is asked by Jane Street. Four coins are flipped and at least two of them land on tails; the question is the probability that exactly three of them are tails.

Link to the question: https://platform.stratascratch.com/technical/2285-two-out-of-three-tails

This question too can be answered intuitively and mathematically.

Intuitive Solution

You can quite easily visualize the scenarios. If you flip four two-sided coins, the number of possible outcomes is:

$$\text{All Possible Cases} = 2^4 = 16$$

The distribution can be shown in the following way:

| Scenario | # Cases | Outcomes |
|---|---|---|
| 4 Heads | 1 | HHHH |
| 3 Heads, 1 Tail | 4 | HHHT, HHTH, HTHH, THHH |
| 2 Heads, 2 Tails | 6 | HHTT, HTHT, HTTH, THHT, THTH, TTHH |
| 1 Head, 3 Tails | 4 | TTTH, TTHT, THTT, HTTT |
| 4 Tails | 1 | TTTT |
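The distribution of outcomes can be reproduced with a few lines of Python. This is only a sketch: it enumerates all sequences of four fair-coin flips and groups them by the number of tails; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

# All sequences of four flips, e.g. "HHTH".
outcomes = ["".join(flips) for flips in product("HT", repeat=4)]

# Group the outcomes by how many tails they contain.
tails_distribution = Counter(outcome.count("T") for outcome in outcomes)

for tails in sorted(tails_distribution):
    matching = [o for o in outcomes if o.count("T") == tails]
    print(f"{4 - tails} heads, {tails} tails: "
          f"{tails_distribution[tails]} cases -> {', '.join(matching)}")
```

The counts per row (1, 4, 6, 4, 1) are the binomial coefficients for four flips.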

As you can see, there are indeed 16 possible outcomes. However, the question states that there are at least two tails. That rules out the first two scenarios, which account for 1 + 4 = 5 outcomes. This leaves us with:

$$\text{All Possible Cases} = 16 - 5 = 11$$

Of the remaining cases, the only favorable ones are where we get one head and three tails. Therefore:

$$\text{Favorable Cases} = 4$$

Finally, the probability is:

$$\text{Probability} = \frac{4}{11}$$
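The same result can be sketched mathematically with binomial coefficients. This derivation assumes, as in the intuitive solution, that the favorable event is exactly three tails and that the condition is at least two tails out of four flips; the conditional-probability notation is an addition for illustration.

```latex
\begin{aligned}
P(\text{3 tails} \mid \text{at least 2 tails})
  &= \frac{\binom{4}{3}}{\binom{4}{2} + \binom{4}{3} + \binom{4}{4}} \\
  &= \frac{4}{6 + 4 + 1} \\
  &= \frac{4}{11}
\end{aligned}
```

Each binomial coefficient counts the outcomes with the given number of tails, so the denominator is the 11 cases left after ruling out zero and one tail.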

Conclusion

The three questions we’ve covered give you a picture of what to expect from probability questions at data science interviews. The solutions we provided also outline how you should approach solving them.

This should be just a start for you to practice more of these questions. Solving many probability interview questions is the only way to increase, ahem, the probability of becoming a data scientist at a top company. These 30 probability and statistics interview questions will give you a good kick in that direction. You can always find more in the non-coding questions section on our platform.
