The Essential 3 Data Science Projects to Land Your Dream Job
Here are three projects that will help you land a data science job. Together, these projects build and showcase all five fundamental skills you’ll have to demonstrate in an interview.
What if you’re allowed to do only three data projects for your portfolio? Which ones would you choose to guarantee you get a job?
This article answers that question. All three projects we’ll discuss are also covered in our video.
Before we explain every project, let’s talk about the technical skills hiring managers and companies look for in data science projects.
5 Fundamental Data Science Skills
These skills are fundamental as they will be tested at every respectable data science interview. And they are tested because you need them for your job.
The projects in your portfolio should showcase them, too.
The five skills are:
1. Python
2. Data wrangling
3. Statistical analysis
4. Machine learning
5. Data visualization
Regarding Python, I specifically mean the libraries most companies use, such as pandas, NumPy, scikit-learn, and TensorFlow. You’ll have to know how to use these libraries – both writing code and reading code others have written. If you don’t have experience with Python, you can use R or SAS instead. Now, notice that I didn’t mention SQL. It’s a must for data scientists, and most companies require it. However, you don’t necessarily need it within a portfolio project itself.
Python is only a tool, but one that lets you demonstrate the other four skills. You’ll use it for data wrangling (along with SQL), statistical analysis, machine learning (which is harder to do in R or SAS), and data visualization.
All five skills are a must for any data scientist. Use this checklist to get the maximum from your chosen data science projects. (Yes, you’re allowed to come up with or find your own three must-do projects.)
Of course, there's much more to data science, like big data technologies, deep learning, NLP, and cloud computing. But the need for these skills heavily depends on the job description, so they’re not always required to get a job.
Now, the projects!
Project 1: Insights From City Supply and Demand Data
The Insights From City Supply and Demand Data project appeared in the Uber interview.
It’s no wonder it did; for Uber, cities are hubs of demand and supply interactions. This project involves analyzing Uber's dataset to provide business insights on trips, demand for drivers, and more.
Doing this project, you’ll need to showcase these three skill areas.
1. Exploratory Data Analysis (EDA):
- Filling in missing values
- Aggregating data
- Parsing time intervals
- Calculating percentages
- Calculating weighted averages
- Finding differences
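As a minimal sketch of these wrangling steps in pandas – the column names and values here are hypothetical stand-ins, not the real dataset’s schema:

```python
import pandas as pd

# Toy stand-in for the Uber supply/demand data; real columns may differ
df = pd.DataFrame({
    "date": ["2012-09-10", "2012-09-10", "2012-09-11"],
    "hour": [16, 17, 16],
    "completed_trips": [10.0, None, 8.0],
    "requests": [12, 15, 9],
    "unique_drivers": [5, 6, 4],
})

# Fill missing values (here: assume no trips were recorded)
df["completed_trips"] = df["completed_trips"].fillna(0)

# Parse time intervals: build a timestamp from date + hour
df["timestamp"] = pd.to_datetime(df["date"]) + pd.to_timedelta(df["hour"], unit="h")

# Aggregate: total completed trips per date
daily = df.groupby("date")["completed_trips"].sum()

# Calculate percentages: share of requests that were completed
df["completion_rate"] = df["completed_trips"] / df["requests"] * 100

# Weighted average: trips per driver, weighted by driver counts
weighted_avg = df["completed_trips"].sum() / df["unique_drivers"].sum()
print(daily)
print(round(weighted_avg, 2))
```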
2. Data visualization: You have to visualize the relationship between supply and demand.
3. Deriving actionable insights about completed trips:
- Analyze different time periods
- Calculate the weighted average ratio of trips per driver
- Draft a driver's schedule based on the busiest hours
- Understand the relationship between supply and demand
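The insight calculations above can be sketched like this, again with toy hourly numbers standing in for the real dataset:

```python
import pandas as pd

# Hypothetical hourly records; the real dataset spans a longer period
df = pd.DataFrame({
    "hour": [15, 16, 17, 18],
    "completed_trips": [20, 45, 60, 30],
    "unique_drivers": [10, 15, 18, 12],
})

# Weighted average ratio of trips per driver (weighted by drivers per hour)
weighted_ratio = df["completed_trips"].sum() / df["unique_drivers"].sum()

# Busiest hours by completed trips -> candidate shift for a driver's schedule
busiest = df.sort_values("completed_trips", ascending=False).head(2)["hour"].tolist()
print(f"Weighted trips/driver: {weighted_ratio:.2f}, busiest hours: {busiest}")
```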
This is a data analysis project that tests your insight generation and data wrangling skills.
Project 2: Customer Churn Prediction
Customer Churn Prediction is a project by Sony Research.
It’s a classification project where you’re given a dataset of a telecom company's customers.
You should approach this project following these major steps.
1. EDA, data visualization, and deriving insights:
- Check data fundamentals
- Choose the data you need and form a clean dataset
- Plot histograms to check the distribution of values
- Create a correlation matrix, check the features' importance, and draw conclusions about positive and negative correlations between features
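Here’s a minimal sketch of the distribution and correlation checks, using a synthetic stand-in for the telecom data (the real feature names and label will differ):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for the telecom customer dataset
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, 200),
    "monthly_charges": rng.uniform(20, 120, 200),
})
df["churned"] = (df["tenure_months"] < 12).astype(int)  # synthetic label

# Check the distribution of values (a numeric stand-in for a histogram;
# df["tenure_months"].hist() would draw the plot)
print(df["tenure_months"].describe())

# Correlation matrix: positive/negative relationships with churn
corr = df.corr(numeric_only=True)
print(corr["churned"].sort_values())
```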
2. Split data into train and test sets: Use scikit-learn to split the data with an 80/20 ratio.
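The split itself is a single scikit-learn call; here `stratify` is an optional extra that keeps the churn ratio equal in both sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)  # placeholder features
y = np.arange(100) % 2             # placeholder churn labels

# 80/20 split; stratify preserves the class balance in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```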
3. Build and choose a churn prediction model:
- Use classical ML models
- Evaluate them using accuracy and F1 score
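A sketch of step 3 with two classical models on synthetic data – swap in the real churn features and labels:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Evaluate with both accuracy and F1, as in the project
    scores[name] = (accuracy_score(y_test, pred), f1_score(y_test, pred))
    print(name, scores[name])
```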
Bonus step for showing off: Build a deep learning (DL) model using an artificial neural network (ANN).
Bonus step for making the interviewer completely flip out: Deploy the model to a cloud environment and discuss two major problems in the MLOps cycle:
- Data drift – when the distribution of the model’s input data changes.
- Concept drift – when the functional relationship between the model’s inputs and outputs changes.
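Deployment depends on your cloud of choice, but data drift monitoring can be sketched with a two-sample Kolmogorov–Smirnov test – one common approach, not the only one:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)  # distribution at training time
live_feature = rng.normal(loc=0.8, scale=1.0, size=1000)   # shifted distribution in production

# Two-sample KS test: a small p-value suggests the input distribution changed
stat, p_value = ks_2samp(train_feature, live_feature)
drift_detected = p_value < 0.01
print(f"KS statistic={stat:.3f}, drift detected: {drift_detected}")
```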
Project 3: Predictive Policing – Examining the Implications
Predictive Policing is a project done by Orlando Torres. It uses the 2016 City of San Francisco crime data to predict the number of crime incidents in a given zip code on a certain day and time. It's a great project to showcase your supervised learning techniques and even sprinkle in some deep learning.
Here’s the approach for this project.
1. EDA and data wrangling:
- Select the variables
- Calculate the total number of crimes per year per zip code per hour
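The aggregation can be sketched in pandas with a toy incidents table – the real dataset has one row per reported incident, with different column names:

```python
import pandas as pd

# Toy stand-in for the 2016 San Francisco crime incidents data
incidents = pd.DataFrame({
    "zip_code": ["94103", "94103", "94110", "94103"],
    "year": [2016, 2016, 2016, 2016],
    "hour": [14, 14, 9, 22],
})

# Total number of crimes per year, per zip code, per hour
counts = (
    incidents.groupby(["year", "zip_code", "hour"])
    .size()
    .reset_index(name="crime_count")
)
print(counts)
```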
2. Split data into train and test sets: Split the dataset chronologically into training and test sets (standard practice when a time component is involved).
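A minimal chronological split, assuming a counts table sorted by time:

```python
import pandas as pd

# Toy time-ordered crime counts
df = pd.DataFrame({
    "timestamp": pd.date_range("2016-01-01", periods=10, freq="D"),
    "crime_count": range(10),
})

# Train on the earliest 80%, test on the most recent 20%,
# so the model never sees the future during training
cutoff = int(len(df) * 0.8)
train, test = df.iloc[:cutoff], df.iloc[cutoff:]
print(len(train), len(test))
```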
3. Build supervised and deep learning models:
- Linear regression
- Random forest
- K-nearest neighbors
- A DL model: multilayer perceptron (MLP)
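All four models can be fitted through the same scikit-learn interface. Here’s a sketch on synthetic data – in the real project, the features would be things like hour, day of week, and zip code:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in for the engineered crime features and counts
X = rng.uniform(0, 1, size=(300, 3))
y = 5 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 300)  # crime-count proxy

X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "mlp": MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE={mae:.3f}")
```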
4. Build visualization: Create a visualization and an interactive map (e.g., in Tableau) for your project.
These three projects will help you showcase the most important data science skills companies are looking for.
If none of these appeal to you, there’s also a list of 30+ Project Ideas to Showcase Your Machine Learning Skills. Check them out! And if these projects are too advanced for you, there are also 19 data science projects for beginners that will surely help you.