314 Resources For Data Science Fundamentals

Published: December 23, 2021

Written by:
Abhishek Ramesh

A list of the best resources for data science fundamentals.

This is a fundamentals guide to data science interviews. It covers common topics from the coding and non-coding questions asked during data science interviews!

The 7 categories covered are:

Python
SQL
Probability
Statistics
Modeling
Business Case
Product

Common topics are grouped under their relevant subject. For example, Regression falls under the Statistics category. Each topic links to YouTube videos or articles that explain it, and most topics also include interview questions from companies such as Google and Uber for practice.

While this guide lists the topics commonly asked during data science interviews, it is meant as a starting point. You should research the topics mentioned here further.

For coding questions, it is all about practice: writing your own code and understanding other people’s. Some people may have a more optimized solution than yours, which is why building a community is important. StrataScratch shows how other people have solved a question you attempted, so you can learn how to improve your own solution!
For non-coding questions, you need to understand each step of applying a concept. Understanding the mathematical derivation behind a concept is what lets you answer any question about it. Once you understand the math, you need to learn how to implement the concept in code, which comes back to practice and learning from others. While only a handful of interview questions ask about the math behind topics such as logistic regression, that knowledge pays off when you implement complex models in your future job as a data scientist!

Some topics are only asked by a handful of companies. To get a better idea of the topics a company may ask about, go through the job requirements. However, some companies do not write their own data science job descriptions. They may copy descriptions from Glassdoor or ask for someone who knows every programming language and statistical model out there. When you see a job description like this, going through the topics in this guide should serve you well.

Note that not all topics include a practice question. Even if a topic does not include one, you should understand the concept, since it may come up during your interview or in your future data science job!

If you do not understand a topic, check out the StrataScratch blogs! There are multiple guides written to help interviewees prepare, on topics ranging from SQL Window Functions to Collinearity. If a topic is not explained in the StrataScratch blogs, visit other websites to learn more about it! Take notes on these topics and go through more practice questions! It is recommended to keep a document of topics you don’t fully understand or easily forget.

For example, if you easily forget how the RANK() function works in SQL, take screenshots and notes that explain the concept.

Keep in mind that medium and hard questions in the StrataScratch database combine multiple concepts, since actual data science interviews require you to know a variety of functions and topics.

When a question asks you to explain a concept to a specific target audience, answer from that audience’s point of view. Avoid technical jargon that would confuse them, especially if the audience is non-technical. For example, one question asks how to explain regression to an 8-year-old. An 8-year-old would not understand MSE values or even a simple y=mx+b equation, so explain it in everyday language.

Coding Questions

Python

Everyone who knows how to code nowadays has heard of Python. There is a reason Python is extremely popular among Data Analysts and Data Scientists. It has a variety of libraries and the ability to process data quickly. NumPy and Pandas are some of the most commonly used libraries for data analysis.

Understanding NumPy / Pandas
1. NumPy
  1. NumPy: the absolute basics for beginners
  2. NumPy Reference & Cheat Sheet
2. Pandas
  1. User Guide
Comparison Operators / Logical Operators / Mathematical Functions
1. Operators and Expressions in Python
Flow Control Functions
1. More Control Flow Tools
2. Errors and Exceptions
Conditional Expressions
1. Filtering by columns/rows
  1. 10 Ways to Filter Pandas Dataframes
2. Unique values - nunique() / drop duplicates
3. Null values
  1. Working with missing data
4. Casting data types
  1. Change the data type of a column or a Pandas Series
  2. Type Conversion in Python
5. Rounding values
  1. How to Round Numbers in Python
6. Sorting
  1. Pandas Sort: Your Guide to Sorting Data in Python
Practice Questions
DataFrame Formatting
1. Grouping
  1. Learning
    1. Pandas GroupBy: Your Guide to Grouping Data in Python
    2. Largest Olympics
  2. Practice
    1. Reviews of Hotel Arena
    2. Year Over Year Churn
2. Merges
  1. Learning
    1. All the Pandas merge() you should know for combining datasets
  2. Practice
Ranking Methods
1. Learning
  1. pandas.DataFrame.rank
2. Practice
Dict Methods
1. Learning
  1. Python Dictionary(Dict): Update, Cmp, Len, Sort, Copy, Items, str Example
2. Practice
  1. Player with Longest Streak
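
The dict methods above can be sketched on a small example. This is only an illustration: the `results` data and the `longest_streak` helper are made up for this note and are not the schema or the reference solution of the practice question.

```python
# Hypothetical data: one string of outcomes per player ("W" = win, "L" = loss).
results = {"ann": "WWLWWW", "bob": "WLWL", "cy": "LLWW"}


def longest_streak(outcomes: str, target: str = "W") -> int:
    """Return the longest run of consecutive `target` characters."""
    best = current = 0
    for outcome in outcomes:
        current = current + 1 if outcome == target else 0
        best = max(best, current)
    return best


# A dict comprehension over .items() builds a new dict of streak lengths.
streaks = {player: longest_streak(seq) for player, seq in results.items()}

# max() with key=streaks.get returns the key holding the largest value.
top_player = max(streaks, key=streaks.get)

# .get() returns a default for a missing key instead of raising KeyError.
unknown = streaks.get("dee", 0)

print(streaks, top_player, unknown)
```
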
Array Operations
1. Learning
  1. Python Arrays
2. Practice
  1. Ranking Most Active Guests
String Functions
1. Learning
  1. pandas.Series.str.contains
  2. Python String Methods
2. Practice
List Methods
1. Learning
  1. Python List Functions & Methods
  2. Data Structures
Lambda function
1. Learning
  1. How to Use Python Lambda Functions
2. Practice
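
A short sketch of where lambda functions typically show up: as throwaway `key` or predicate functions passed to `sorted`, `filter`, and `map`. The `records` list is invented for illustration.

```python
# Hypothetical records: (name, score).
records = [("ann", 82), ("bob", 95), ("cy", 78)]

# Sort by the second field, highest score first.
by_score = sorted(records, key=lambda r: r[1], reverse=True)

# Keep only records with a score of at least 80.
passed = list(filter(lambda r: r[1] >= 80, records))

# Transform each sorted record into an upper-cased name.
names = list(map(lambda r: r[0].upper(), by_score))

print(by_score, passed, names)
```
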
Class methods/Set Methods
1. Python's Instance, Class, and Static Methods Demystified

SQL

Even though Python and Pandas can process tabular data, Pandas loads everything into memory, so it can be slow on large datasets and may not be able to handle data that does not fit in memory. In these cases, Structured Query Language (SQL) is highly preferred. SQL queries are often more concise, and they filter and restructure data directly in the database.

The following sections contain multiple links for learning and practicing specific concepts. A free one-stop SQL tutorial is https://mode.com/sql-tutorial. However, it is recommended to go through YouTube SQL tutorials as well, especially if it is your first time learning, so you can pick up tips and tricks for writing queries.

Functions

Where / Sorting / Having
1. Learning
2. Practice
Limit / Offset
1. Learning
  1. SQL: SELECT LIMIT Statement
2. Practice
  1. Find the most common grade earned by bakeries
Date Time Functions
1. Learning
2. Practice
  1. Customer Revenue In March
  2. Growth of Airbnb
Distinct Clause
1. Learning
  1. SQL SELECT DISTINCT Statement
2. Practice
  1. Number Of Unique Facilities And Inspections Per Municipality
  2. Customer Orders and Details
Aggregate Functions (Group By, Case)
1. Learning
  1. SQL GROUP BY Statement
  2. SQL Case Statements For Data Science Interviews in 2021
2. Practice
  1. Find the postal code which has the highest average inspection score
  2. Host Popularity Rental Prices
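
To try GROUP BY, CASE, and HAVING without setting up a database server, one option is Python's built-in `sqlite3` module. The sketch below uses a made-up `inspections` table; the table, columns, and values are assumptions for illustration and do not reproduce the practice datasets.

```python
import sqlite3

# In-memory database with a hypothetical inspections table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inspections (postal_code TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO inspections VALUES (?, ?)",
    [("94110", 90), ("94110", 70), ("94103", 95), ("94103", 85), ("94107", 60)],
)

query = """
SELECT postal_code,
       AVG(score) AS avg_score,
       SUM(CASE WHEN score >= 80 THEN 1 ELSE 0 END) AS high_scores
FROM inspections
GROUP BY postal_code          -- one output row per postal code
HAVING COUNT(*) >= 2          -- filter groups, not individual rows
ORDER BY avg_score DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

WHERE filters rows before grouping, while HAVING filters the groups afterwards, which is why the postal code with a single inspection drops out here.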
Combining (Joins / Unions)
1. Learning
2. Practice
Subquery Expressions
1. Learning
  1. SQL Server Subquery
2. Practice
  1. Highest Priced Wine In The US
  2. Inspection Scores For Businesses
Window Functions (Partition by, Rank, Ntile, Lag/Lead, Common Table Expression)
1. Learning
2. Practice
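
As a note-style reminder of how RANK() differs from DENSE_RANK(), here is a minimal sketch using Python's built-in `sqlite3` module. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the `sales` table is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("ann", "east", 300), ("bob", "east", 300),
     ("cy", "east", 100), ("dee", "west", 200)],
)

query = """
SELECT rep, region, amount,
       RANK()       OVER (PARTITION BY region ORDER BY amount DESC) AS rnk,
       DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS dense_rnk
FROM sales
ORDER BY region, amount DESC, rep
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

Tied rows share a rank in both functions. RANK() then skips the next position (1, 1, 3), while DENSE_RANK() does not (1, 1, 2).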
Pattern Matching / Text Searching
1. Learning
  1. LIKE and ILIKE for Pattern Matching in PostgreSQL
2. Practice
  1. Classify Business Type
  2. Counting Instances in Text
Array Functions
1. Learning
  1. Working with arrays
2. Practice

Non-Coding Questions

Probability

Understanding mathematical concepts is an important part of being a successful data scientist. Probability provides an important foundation for concepts such as Bayes’ Theorem and distributions.

Axioms of Probability
1. Learning
  1. Axioms of Probability
Permutations/Combinations
1. Learning
  1. How to Use Permutations and Combinations
  2. Permutations, Combinations & Probability (14 Word Problems)
2. Practice
  1. Pair by Drawing 2 Cards
  2. HHT Probability
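
A worked counting example, assuming a standard 52-card deck and taking "pair" to mean two cards of the same rank (the exact wording of the practice question may differ):

```python
from fractions import Fraction
from math import comb

# Favourable hands: pick 1 of 13 ranks, then 2 of that rank's 4 suits.
pair_hands = 13 * comb(4, 2)

# All unordered 2-card hands from 52 cards.
all_hands = comb(52, 2)

p_pair = Fraction(pair_hands, all_hands)
print(pair_hands, all_hands, p_pair)
```

Order does not matter when drawing a hand, so combinations (`math.comb`) are the right count; `math.perm` would apply if the order of the draws mattered.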
Multiplication Rule
1. Learning
  1. Multiplication & Addition Rule - Probability - Mutually Exclusive & Independent Events
Conditional Probability
1. Learning
  1. Conditional Probability With Venn Diagrams & Contingency Tables
2. Practice
  1. 3 Heads Probability
Independent Events
1. Learning
  1. Independent Events (Basics of Probability: Independence of Two Events)
2. Practice
  1. Even Heads
Bayes Theorem
1. Learning
  1. Bayes' Theorem and Cancer Screening
2. Practice
  1. Two Boys Odds
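
Bayes' theorem can be checked with exact fractions. The screening numbers below (1% prevalence, 90% sensitivity, 9% false positive rate) are assumed for illustration only and are not taken from the linked resources.

```python
from fractions import Fraction

# Assumed, illustrative inputs.
prior = Fraction(1, 100)               # P(disease)
sensitivity = Fraction(9, 10)          # P(positive | disease)
false_positive_rate = Fraction(9, 100) # P(positive | no disease)

# Law of total probability: P(positive).
p_positive = sensitivity * prior + false_positive_rate * (1 - prior)

# Bayes' theorem: P(disease | positive).
posterior = sensitivity * prior / p_positive

print(posterior, float(posterior))
```

Even with a fairly accurate test, the posterior stays below 10% here because the disease is rare, which is the point such interview questions usually probe.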
Different distributions
1. Probability Density Function and Cumulative Distribution Function
  1. Probability Density Functions - PDF
  2. Cumulative Distribution Functions - CMF
  3. Finding Percentiles
  4. Special Expectations
Continuous
2. Normal Distribution (aka Gaussian Distribution)
  1. Learning
    1. Normal Distribution
    2. Normal Distribution & Probability Problems
  2. Practice
    1. Non-Gaussian Distribution
    2. Expectation Of A Gaussian
3. Uniform Distribution
  1. Learning
    1. Gallery of Distributions
    2. Continuous Probability Uniform Distribution Problems
  2. Practice
    1. Larger Expected Value
4. t Distribution (aka Student’s t distribution)
  1. Learning
    1. t Distribution
    2. Student's T Distribution - Confidence Intervals & Margin of Error
5. F Distribution
  1. Learning
    1. F Distribution
    2. Lesson 1 - What is the F-Distribution in Statistics?
6. Chi-Squared Distribution
  1. Learning
    1. Chi-Square Distribution
    2. Chi Square Test
7. Exponential Distribution
  1. Learning
    1. Exponential Distribution
    2. Probability Exponential Distribution Problems
8. Lognormal Distribution
  1. Learning
    1. Lognormal Distribution
Discrete
9. Binomial Distribution
10. Poisson Distribution
  1. Poisson Distribution
  2. Poisson Distribution EXPLAINED!
Series: Geometric - Hypergeometric - Arithmetic - Summation to Infinity
1. Learning
  1. Geometric Distribution - Probability, Mean, Variance, & Standard Deviation
  2. The Hypergeometric Distribution: An Introduction (fast version)
2. Practice
  1. Matching Pairs Attempts
Expected Value
1. Learning
  1. Expected Value and Variance of Discrete Random Variables
2. Practice
  1. Roulette Expectations
  2. Expectation Of Sum Of Dices
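
Expected value is a probability-weighted average, which can be verified by brute-force enumeration. A minimal sketch, assuming two fair six-sided dice:

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Expected value of one fair die: each face has probability 1/6.
expected_one = sum(Fraction(face, 6) for face in faces)

# Enumerate all 36 equally likely outcomes of two dice.
outcomes = list(product(faces, repeat=2))
expected_sum = sum(Fraction(a + b, len(outcomes)) for a, b in outcomes)

print(expected_one, expected_sum)
```

The enumeration agrees with linearity of expectation: E[X + Y] = E[X] + E[Y].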
Binomial Distribution - Negative Binomial
1. Learning
  1. An Introduction to the Binomial Distribution
2. Practice
  1. Throwing Dice
  2. What is More Likely
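
The binomial probability mass function is short enough to write by hand. The `binomial_pmf` helper and the two scenarios below are illustrative assumptions, not the practice questions themselves.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """P(exactly k successes in n independent trials with success prob p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 2 heads in 3 fair coin flips.
p_two_heads = binomial_pmf(2, 3, Fraction(1, 2))

# At least one six in 4 rolls of a fair die, via the complement rule.
p_at_least_one_six = 1 - binomial_pmf(0, 4, Fraction(1, 6))

print(p_two_heads, p_at_least_one_six)
```
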

Statistics

Data Science can be summed up as Computational Statistics. From predicting which shows Netflix recommends to you (collaborative filtering) to predicting next year’s demand for iPhones (regression), statistics is the foundation of a data scientist’s work.

Intro to Stats (Mean, Median, Mode, Range, Standard Deviation, Graphs)
1. Learning
  1. Mode, Median, Mean, Range, and Standard Deviation (1.3)
  2. Bar Charts, Pie Charts, Histograms, Stemplots, Timeplots (1.2)
2. Practice
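
The basic descriptive statistics can be computed with Python's built-in `statistics` module. The sample below is made up; note that `pstdev` is the population standard deviation, while `stdev` would give the sample version.

```python
import statistics

# Hypothetical sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
value_range = max(data) - min(data)
population_sd = statistics.pstdev(data)  # divides by n, not n - 1

print(mean, median, mode, value_range, population_sd)
```
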
Boxplot - IQR
1. Learning
  1. How To Make Box and Whisker Plots
2. Practice
  1. New Observation is Outlier
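
A common convention, and the one boxplot whiskers usually follow, flags a point as an outlier when it lies more than 1.5 × IQR outside the quartiles. The sketch below assumes that rule and the "inclusive" quartile method; other quartile conventions give slightly different fences.

```python
from statistics import quantiles


def is_outlier(data, new_value, k=1.5):
    """Return True if new_value falls outside the k * IQR fences of data."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return new_value < q1 - k * iqr or new_value > q3 + k * iqr


# Hypothetical existing observations.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(is_outlier(data, 12), is_outlier(data, 20))
```
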
Variance → ANOVA
1. Learning
  1. Variance - How To Calculate Variance
  2. ANOVA - How To Calculate and Understand Analysis of Variance (ANOVA) F Test.
2. Practice
  1. Expectation of Variance
  2. Variance in Unsupervised Model
Z-test --- T-test
1. Learning
2. Practice
  1. Sample Size
Central Limit Theorem
1. Learning
  1. Introduction to the Central Limit Theorem
Confidence Interval
1. Learning
2. Practice
  1. Confidence Interval
  2. Margin of Error
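
A minimal sketch of a z-based confidence interval for a mean, assuming the population standard deviation is known and the sample is large enough for the normal approximation. The function name and inputs are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Return (low, high, margin_of_error) for a known-sigma z interval."""
    # Critical value: for 95% confidence this is about 1.96.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin, margin


low, high, margin = z_confidence_interval(mean=50.0, sigma=10.0, n=100)
print(low, high, margin)
```

The margin of error shrinks with the square root of the sample size, so quadrupling `n` halves the interval width.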
Hypothesis testing -- P-Value
1. Learning
  1. Hypothesis testing and p-values | Inferential statistics | Probability and Statistics | Khan Academy
  2. Hypothesis Testing
2. Practice
  1. P-value
  2. Interpret P-value
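
As a sketch of how a p-value is computed, here is a two-sided one-sample z-test, assuming a known population standard deviation. The helper name and the numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(sample_mean, mu0, sigma, n):
    """p-value for H0: mean == mu0 against a two-sided alternative."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability of a test statistic at least this extreme in either tail.
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_value = two_sided_p_value(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)
reject_at_5_percent = p_value < 0.05
print(p_value, reject_at_5_percent)
```

The p-value is the probability of seeing data at least this extreme if the null hypothesis were true; it is not the probability that the null hypothesis is true.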
Confusion matrix (Sensitivity and specificity)
1. Learning
2. Practice
  1. Precision and Recall
  2. False Positives or False Negatives
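
The confusion-matrix metrics follow directly from the four cell counts. The counts below are hypothetical.

```python
# Hypothetical confusion-matrix counts.
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp)      # of predicted positives, how many are real
recall = tp / (tp + fn)         # sensitivity: of real positives, how many found
specificity = tn / (tn + fp)    # of real negatives, how many correctly rejected
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(precision, recall, specificity, accuracy)
```
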
A/B testing
1. Learning
2. Practice
  1. Certain Factor Predicts Certain Outcome
  2. Random Bucketing
Polar Coordinates
1. Learning
  1. Polar Coordinates Basic Introduction, Conversion to Rectangular, How to Plot Points, Negative R Valu
  2. An Introduction to Polar Coordinates
2. Practice
  1. Circle in Polar Coordinates
Correlation coefficient (aka Pearson's correlation coefficient)
1. Learning
  1. Correlation Coefficient
  2. Correlation and regression
2. Practice
  1. Pearson's Correlation Coefficient
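
Pearson's correlation coefficient is the covariance of two variables divided by the product of their standard deviations. A from-scratch sketch (the `pearson_r` helper is written for this note):

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # perfectly increasing line
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # perfectly decreasing line
print(r_up, r_down)
```
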
Bias-Variance Tradeoff
1. Learning
2. Practice
  1. Bias-Variance Tradeoff
Error Predictions (MSE, RMSE, MAE, R^2)
1. Learning
2. Practice
  1. StrataScratch Data Science Questions - R^2 Value
  2. StrataScratch Data Science Questions - Negative R Squared
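
The four error metrics can be computed in a few lines. The sketch below also shows how R^2 becomes negative when predictions are worse than always predicting the mean; the data and the `regression_metrics` helper are illustrative.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R^2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e**2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e**2 for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"mse": mse, "rmse": sqrt(mse), "mae": mae, "r2": 1 - ss_res / ss_tot}


good = regression_metrics([1, 2, 3], [1.1, 1.9, 3.2])
bad = regression_metrics([1, 2, 3], [3, 2, 1])  # predictions reversed
print(good["r2"], bad["r2"])
```
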
Regression
1. Learning
  1. OLS - Introduction to residuals and least squares regression
  2. Ridge - Regularization Part 1: Ridge (L2) Regression
  3. Lasso - Regularization Part 2: Lasso (L1) Regression
  4. Elastic-Net - Regularization Part 3: Elastic Net Regression
  5. Logistic Regression - StatQuest: Logistic Regression
  6. Regularization: Ridge, Lasso and Elastic Net
  7. Logistic vs Bayesian Logistic
2. Practice
  1. OLS Assumptions
F-statistic
1. Learning

Modeling

Modeling is the application of statistical concepts and frameworks to everyday scenarios. Before attempting these questions, make sure you have a thorough understanding of the concepts in the Statistics section.

Not all modeling questions will explicitly mention which statistical concept to use in your answer. Interviewers test your understanding of the question and of the techniques you would use to solve it. Remember that interviewers don’t always look for the most accurate solution; they want to see that you have a solid understanding of the question and how to go about solving it. Before attempting these questions in an actual interview, ask the interviewer clarifying questions, for example about ambiguous terminology.

Generalized

The following questions test your understanding of when to use which statistical analysis and your overall approach to an everyday problem. For example, a modeling question asked by Amazon was to predict whether a customer will buy something today or not based on their information. Definitely practice these before your interviews!

Business Case

Business case questions are a trickier type of question. They cannot be split into basic topics to learn. Instead, they fall into 3 categories: applied data, sizing, and theory testing. These questions mainly test your understanding of the company’s products, the economy, and the business competition.

These are the types of questions to practice multiple times. If you want to get even better, research the company’s products before the interview, so you can show the company you’re interested in them.

Fortunately, we have written a guide to solving data science business case questions, where you can learn more about how to improve your answers to business questions.

Product

Product questions, similar to business case questions, cannot be split into topics to learn from scratch. They fall into 3 categories: metric-related problems, measuring the impact of a new product or feature, and designing products. These are questions to practice repeatedly. Fortunately, we have written an ultimate guide to product data science interview questions; check it out to learn more about how to improve on solving product questions.
