The Essential 3 Data Science Projects to Land Your Dream Job
Here are three projects that will help you land a data science job. Together, these projects build and showcase all five fundamental skills you’ll have to demonstrate in an interview.
What if you’re allowed to do only three data projects for your portfolio? Which ones would you choose to guarantee you get a job?
This article answers that question. All three projects we’ll discuss are also covered in our video.
Before we explain every project, let’s talk about the technical skills hiring managers and companies look for in data science projects.
5 Fundamental Data Science Skills
These skills are fundamental as they will be tested at every respectable data science interview. And they are tested because you need them for your job.
The projects in your portfolio should showcase them, too.
The five skills are:
1. Python
2. Data wrangling
3. Statistical analysis
4. Machine learning
5. Data visualization
Regarding Python, I specifically mean the libraries most companies use, such as pandas, NumPy, scikit-learn, and TensorFlow. You’ll have to know how to use these libraries – both writing code and reading code others have written. If you don’t have experience with Python, you can use R or SAS instead. Now, notice that I didn’t mention SQL. It’s a must for data scientists, and most companies require it. However, you don’t necessarily need it within a portfolio project itself.
Python is only a tool, but one that lets you demonstrate the other four skills. You’ll use it for data wrangling (along with SQL), statistical analysis, machine learning (which is harder to do in R or SAS), and data visualization.
All five skills are a must for any data scientist. Use this checklist to get the maximum from your chosen data science projects. (Yes, you’re allowed to come up with or find your own three must-do projects.)
Of course, there's much more to data science, like big data technologies, deep learning, NLP, and cloud computing. But the need for these skills heavily depends on the job description, so they’re not always required to get a job.
Now, the projects!
Project 1: Insights From City Supply and Demand Data
The Insights From City Supply and Demand Data project appeared in the Uber interview.
It’s no wonder it did; for Uber, cities are hubs of demand and supply interactions. This project involves analyzing Uber's dataset to provide business insights on trips, demand for drivers, and more.
Doing this project, you’ll need to showcase these three skill areas.
1. Exploratory Data Analysis (EDA):
- Filling in missing values
- Aggregating data
- Parsing time intervals
- Calculating percentages
- Calculating weighted averages
- Finding differences
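As a minimal sketch of these wrangling steps in pandas – the column names and values here are hypothetical stand-ins, not the real dataset’s schema:

```python
import pandas as pd

# Toy stand-in for the Uber supply/demand data; real columns may differ
df = pd.DataFrame({
    "date": ["2012-09-10", "2012-09-10", "2012-09-11"],
    "hour": [16, 17, 16],
    "completed_trips": [10.0, None, 8.0],
    "requests": [12, 15, 9],
    "unique_drivers": [5, 6, 4],
})

# Fill missing values (here: assume no trips were recorded)
df["completed_trips"] = df["completed_trips"].fillna(0)

# Parse time intervals: build a timestamp from date + hour
df["timestamp"] = pd.to_datetime(df["date"]) + pd.to_timedelta(df["hour"], unit="h")

# Aggregate: total completed trips per date
daily = df.groupby("date")["completed_trips"].sum()

# Calculate percentages: share of requests that were completed
df["completion_rate"] = df["completed_trips"] / df["requests"] * 100

# Weighted average: trips per driver, weighted by driver counts
weighted_avg = df["completed_trips"].sum() / df["unique_drivers"].sum()
print(daily)
print(round(weighted_avg, 2))
```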
2. Data visualization: You have to visualize the relationship between supply and demand.
3. Deriving actionable insights about completed trips:
- Analyze different time periods
- Calculate the weighted average ratio of trips per driver
- Draft a driver's schedule based on the busiest hours
- Understand the relationship between supply and demand
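The insight calculations above can be sketched like this, again with toy hourly numbers standing in for the real dataset:

```python
import pandas as pd

# Hypothetical hourly records; the real dataset spans a longer period
df = pd.DataFrame({
    "hour": [15, 16, 17, 18],
    "completed_trips": [20, 45, 60, 30],
    "unique_drivers": [10, 15, 18, 12],
})

# Weighted average ratio of trips per driver (weighted by drivers per hour)
weighted_ratio = df["completed_trips"].sum() / df["unique_drivers"].sum()

# Busiest hours by completed trips -> candidate shift for a driver's schedule
busiest = df.sort_values("completed_trips", ascending=False).head(2)["hour"].tolist()
print(f"Weighted trips/driver: {weighted_ratio:.2f}, busiest hours: {busiest}")
```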
This is a data analysis project that tests your insight generation and data wrangling skills.
Project 2: Customer Churn Prediction
Customer Churn Prediction is a project by Sony Research.
It’s a classification project where you’re given a dataset of a telecom company's customers.
You should approach this project following these major steps.
1. EDA, data visualization, and deriving insights:
- Check data fundamentals
- Choose the data you need and form a clean dataset
- Plot histograms to check the distribution of values
- Create a correlation matrix, check the features' importance, and draw conclusions about positive and negative correlations between features
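Here’s a minimal sketch of the distribution and correlation checks, using a synthetic stand-in for the telecom data (the real feature names and label will differ):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic stand-in for the telecom customer dataset
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, 200),
    "monthly_charges": rng.uniform(20, 120, 200),
})
df["churned"] = (df["tenure_months"] < 12).astype(int)  # synthetic label

# Check the distribution of values (a numeric stand-in for a histogram;
# df["tenure_months"].hist() would draw the plot)
print(df["tenure_months"].describe())

# Correlation matrix: positive/negative relationships with churn
corr = df.corr(numeric_only=True)
print(corr["churned"].sort_values())
```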
2. Split data into train and test sets: Use scikit-learn to split the data with an 80/20 ratio.
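The split itself is a single scikit-learn call; here `stratify` is an optional extra that keeps the churn ratio equal in both sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)  # placeholder features
y = np.arange(100) % 2             # placeholder churn labels

# 80/20 split; stratify preserves the class balance in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```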
3. Build and choose a churn prediction model:
- Use classical ML models
- Evaluate them using accuracy and F1 score
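A sketch of step 3 with two classical models on synthetic data – swap in the real churn features and labels:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Evaluate with both accuracy and F1, as in the project
    scores[name] = (accuracy_score(y_test, pred), f1_score(y_test, pred))
    print(name, scores[name])
```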
Bonus step for showing off: Build a deep learning (DL) model using an artificial neural network (ANN).
Bonus step for making the interviewer completely flip out: Deploy the model to a cloud environment and discuss two major problems in the MLOps cycle:
- Data drift – when the distribution of the model’s input data changes.
- Concept drift – when the functional relationship between the model’s inputs and outputs changes.
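Deployment depends on your cloud of choice, but data drift monitoring can be sketched with a two-sample Kolmogorov–Smirnov test – one common approach, not the only one:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)  # distribution at training time
live_feature = rng.normal(loc=0.8, scale=1.0, size=1000)   # shifted distribution in production

# Two-sample KS test: a small p-value suggests the input distribution changed
stat, p_value = ks_2samp(train_feature, live_feature)
drift_detected = p_value < 0.01
print(f"KS statistic={stat:.3f}, drift detected: {drift_detected}")
```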
Project 3: Predictive Policing – Examining the Implications
Predictive Policing is a project done by Orlando Torres. It uses the 2016 City of San Francisco crime data to predict the number of crime incidents in a given zip code on a certain day and time. It's a great project to showcase your supervised learning techniques and even sprinkle in some deep learning.
Here’s the approach for this project.
1. EDA and data wrangling:
- Select the variables
- Calculate the total number of crimes per year per zip code per hour
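The aggregation can be sketched in pandas with a toy incidents table – the real dataset has one row per reported incident, with different column names:

```python
import pandas as pd

# Toy stand-in for the 2016 San Francisco crime incidents data
incidents = pd.DataFrame({
    "zip_code": ["94103", "94103", "94110", "94103"],
    "year": [2016, 2016, 2016, 2016],
    "hour": [14, 14, 9, 22],
})

# Total number of crimes per year, per zip code, per hour
counts = (
    incidents.groupby(["year", "zip_code", "hour"])
    .size()
    .reset_index(name="crime_count")
)
print(counts)
```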
2. Split data into train and test sets: Split the dataset chronologically into training and test sets (standard practice when a time component is involved).
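A minimal chronological split, assuming a counts table sorted by time:

```python
import pandas as pd

# Toy time-ordered crime counts
df = pd.DataFrame({
    "timestamp": pd.date_range("2016-01-01", periods=10, freq="D"),
    "crime_count": range(10),
})

# Train on the earliest 80%, test on the most recent 20%,
# so the model never sees the future during training
cutoff = int(len(df) * 0.8)
train, test = df.iloc[:cutoff], df.iloc[cutoff:]
print(len(train), len(test))
```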
3. Build supervised and deep learning models:
- Linear regression
- Random forest
- K-nearest neighbors
- A DL model: multilayer perceptron (MLP)
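All four models can be fitted through the same scikit-learn interface. Here’s a sketch on synthetic data – in the real project, the features would be things like hour, day of week, and zip code:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic stand-in for the engineered crime features and counts
X = rng.uniform(0, 1, size=(300, 3))
y = 5 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 300)  # crime-count proxy

X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=0),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "mlp": MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE={mae:.3f}")
```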
4. Build visualization: Create a visualization and an interactive map (e.g., in Tableau) for your project.
These three projects will help you showcase the most important data science skills companies are looking for.
If none of these appeal to you, there’s also a list of 30+ Project Ideas to Showcase Your Machine Learning Skills. Check them out! And if these projects are too advanced for you, there are also 19 data science projects for beginners that will surely help you.