Data Analytics Project Ideas That Will Get You The Job
Data analytics project ideas that can boost your portfolio and help you land a data science job.
The best way to get a job in data science is to showcase your skills with a portfolio of data analytics projects. Data analytics projects not only help you in getting your first job but also help you to gain more exposure to data science. Some helpful projects will upskill you as well as make your resume more impressive.
In this article, we’ll talk about the one data analytics project idea that you need: a single project that helps you gain full-stack data science experience and impress interviewers, if your goal is to jumpstart your career in data science. We’ll break down the components of a good data science project, exactly what an interviewer is looking for in a project, and why they’re looking for it.
Data Analytics Project Ideas that You Need to Stay Away From
One piece of advice before we start talking about the components of a good project – There are two things you need to stay away from when you are trying to find or build a data analytics project.
1. Avoid any analysis with the Titanic or Iris dataset.
- It’s been done to death.
- Your employer doesn’t care about your survival classifier.
- It’s a great place to start when you’re just beginning. But it’s too commonplace and ordinary. So unless you can get ranked in the top 10, just move on from Kaggle.
We suggest not including common projects in your resume or portfolio. Stay away from the most common data analytics project ideas.
Components of a Good Data Analytics Project that can Impress Anyone
To understand this one and only data analytics project idea, let's break down the components of exactly what an interviewer is looking for in a data science project and why they’re looking for it.
What an interviewer looks for is a data scientist with real-world skills -- both in analytics/coding and in using modern technologies. This helps you get closer to becoming a full-stack (or fully independent) data scientist.
A quick breakdown of the components of a good data analytics project:
- Working with real data
- Working with modern technologies
- Databases in the cloud
- Building models
- Making an impact / validation
- Application frameworks
1. Working with Real Data
Working with real data means working with data that real users produce and that gets updated in real time. It helps prove to the interviewer that you know how to work with relevant and timely data, rather than analyzing data that was produced in 1912, like the Titanic dataset.
So having said that, you’re probably asking, how do you get this data? This is a perfect segue into component #2.
2. Working with Modern Technologies
How are you going to get that real-life data that is updated in real-time?
You can use APIs to collect that data. Almost all apps and platforms these days rely on APIs to collect and pass information. Learning how to use APIs to get the data that you need for your analysis shows the interviewer that you have relevant skills to do the job.
Some popular examples of APIs are Twitter, Netflix, and Amazon. A good API for data analysis will include:
- Real-time updates
- Date and timestamps for each record
- Numbers and text for data
The skills you’re trying to learn when working with APIs are:
- How to set up and configure APIs in your code (for example, dealing with API tokens)
- How to use libraries, such as Python libraries, that help you make API calls
- How to work with data structures like JSON and dictionaries to help you collect the data
You’d be using these skills often on the job, so as an interviewer, I’d start to see you as an experienced data scientist rather than an absolute beginner.
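Here is a minimal sketch of those three skills in Python, using only the standard library (the third-party requests library is a common alternative). The endpoint URL, the environment variable name, and the shape of the JSON payload are all hypothetical, so swap in whatever your chosen API documents.

```python
import json
import os
import urllib.request

# Hypothetical endpoint and environment variable name, for illustration only.
API_URL = "https://api.example.com/v1/records"
TOKEN_ENV_VAR = "EXAMPLE_API_TOKEN"


def build_request(url, token):
    """Attach the API token as a bearer token in the request headers."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )


def fetch_json(url, token, timeout=10):
    """Call the API and decode the JSON body into Python dicts and lists."""
    with urllib.request.urlopen(build_request(url, token), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_records(payload):
    """Keep only the fields needed for analysis (assumed payload shape)."""
    return [
        {
            "id": item["id"],
            "created_at": item["created_at"],
            "text": item.get("text", ""),
        }
        for item in payload.get("data", [])
    ]


# Read the token from the environment so it never ends up in your git repo.
token = os.environ.get(TOKEN_ENV_VAR)
if token:
    print(extract_records(fetch_json(API_URL, token)))
else:
    print(f"Set {TOKEN_ENV_VAR} to run this against a real API.")
```

Keeping the token in an environment variable rather than in the code is a common habit worth showing in a portfolio project.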
Databases in the Cloud
‘Databases in the cloud’ is the second modern technology. Once you collect the data from the API and maybe after you clean the data, you probably want to store it in a database. Why?
- As mentioned before, the data you pull from the API is updated continuously, so if you pull the data again, you’ll get new records. Instead of pulling the entire dataset and cleaning all of it again, it’s nicer to pull and clean only the new records and keep the old ones in a database where they’re safe.
- Every company uses databases and many use cloud services like AWS and Google Cloud. Knowing how to build a data pipeline with a cloud provider is a real skillset to have and will definitely set you apart from others. If you have this experience, your interviewer would be very impressed because the interviewer knows that you can hit the ground running from day 1 on the job.
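The "only store the new records" idea can be sketched as follows. This example uses Python's built-in sqlite3 module as a local stand-in for a cloud database; the table layout and the record fields are assumptions for illustration, and with a cloud provider you would swap the connection for that provider's database driver.

```python
import sqlite3


def init_db(conn):
    """Create the table once; the primary key is what prevents duplicates."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS records (
            id INTEGER PRIMARY KEY,
            created_at TEXT NOT NULL,
            text TEXT
        )
        """
    )


def latest_timestamp(conn):
    """Return the newest stored timestamp, or None if the table is empty."""
    return conn.execute("SELECT MAX(created_at) FROM records").fetchone()[0]


def store_new_records(conn, records):
    """Insert only records not stored yet; return how many were added."""
    before = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO records (id, created_at, text) "
        "VALUES (:id, :created_at, :text)",
        records,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
init_db(conn)

first_pull = [
    {"id": 1, "created_at": "2024-01-01T00:00:00Z", "text": "a"},
    {"id": 2, "created_at": "2024-01-02T00:00:00Z", "text": "b"},
]
# The second pull overlaps with the first, as a repeated API call would.
second_pull = [
    {"id": 2, "created_at": "2024-01-02T00:00:00Z", "text": "b"},
    {"id": 3, "created_at": "2024-01-03T00:00:00Z", "text": "c"},
]

print(store_new_records(conn, first_pull))   # 2 rows added
print(store_new_records(conn, second_pull))  # 1 row added, the duplicate is skipped
print(latest_timestamp(conn))
```

If the API supports filtering by date, the value from latest_timestamp can be passed along so that only newer records are requested in the first place.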
3. Building Models
This gets us to the part of a data analytics project you probably thought was most important -- building models. It’s important to learn how to implement a model -- whether regression or some type of machine learning model. That’s why you’ve been told to start with Kaggle: it can give you experience building ML models. So if you don’t have a lot of experience building models in general, you can start with Kaggle.
While getting experience in building models is important, there’s another aspect that’s even more important: the decisions you make while building your model, and why you made them.
Here are some questions you’ll need to answer when implementing your model. You’ll need to be able to eloquently explain your answers to these questions in an interview, otherwise no matter how good your model is, no one would be able to trust it:
- Why did you pick your model? Why that particular model? What are you trying to accomplish with this model that you couldn’t do with others?
- How did you clean the data? Why did you clean it that way?
- What type of validation tests did you perform on the data to prepare it for the model?
- Tell me about the assumptions of your model. How did you validate them?
- How did you optimize your model? What were the trade-off decisions you made?
- How did you implement your test/control?
- Tell me about how the underlying math in the model works.
What you don’t see in this line of questions is how your model performed. Your interviewer doesn’t care too much about that. Your interviewer cares about your thought process, how you made decisions, and whether you understand the underlying theory of the model.
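To make the "underlying math" and validation questions concrete, here is a small sketch that fits a simple linear regression from scratch and checks it on held-out data. The choice of model and the synthetic data are assumptions for illustration only; the point is that every step is something you could explain line by line in an interview.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def train_test_split(xs, ys, test_fraction=0.25, seed=0):
    """Shuffle the indices and hold out a fraction of the rows for testing."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return (
        [xs[i] for i in train],
        [ys[i] for i in train],
        [xs[i] for i in test],
        [ys[i] for i in test],
    )


# Synthetic data: a known line plus Gaussian noise (made up for this example).
rng = random.Random(42)
xs = [i / 2 for i in range(40)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]

x_train, y_train, x_test, y_test = train_test_split(xs, ys)
intercept, slope = fit_line(x_train, y_train)
print("intercept, slope:", intercept, slope)
print("train R^2:", r_squared(x_train, y_train, intercept, slope))
print("test R^2:", r_squared(x_test, y_test, intercept, slope))
```

Comparing the train and test scores is one simple way to answer "how did you validate it?"; a large gap between them would suggest the model does not generalize.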
4. Making an Impact / Getting Validation
Lastly, how do you know if you’ve built a great project? Your project should make an impact. You should have some validation from others.
You’re building and coding to improve your skills. But the job of a data scientist is to help others by turning data into insights to provide recommendations that make an impact on the business. How do you know if your insights and recommendations are valuable if you’re building by yourself and showing nobody? You need to show your work to others and build something they find valuable.
There are 3 ways to do this.
- The easiest way is to share your code with others in various data science communities like Reddit r/datascience or r/machinelearning. You can put your code in a git repo and share it that way. It’s a low-effort option, but it might not get the best engagement from the community.
- A better way is to present your insights as visuals and graphs. Build nice-looking graphs that are interesting to look at, share them, and write up your interpretation as a blog article. You can get instant feedback if you share your insights in data science publications like Towards Data Science on Medium and on subreddits like r/dataisbeautiful.
- And lastly, the hardest way is learning an application framework like Django or Flask. Deploy your application using a cloud provider like AWS or Google Cloud and serve your insights that way. Your insights can be an interactive dashboard that you build using Plotly or a simple API that allows others to pull data from your application. This is obviously the hardest, most involved way to share your work but it is worth it if you want to learn how to become a full-stack data scientist and gain software development experience.
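As a taste of the third option, here is a minimal sketch of "a simple API that allows others to pull data from your application". To stay dependency-free it uses the standard library's wsgiref instead of Flask or Django, so treat it as a stand-in: in Flask the same idea would be a single route function. The insight numbers, the /insights path, and the port are made up for illustration.

```python
import json
from wsgiref.simple_server import make_server

# Made-up summary numbers standing in for the output of your analysis.
INSIGHTS = {"records_analyzed": 1200, "average_daily_records": 40.0}


def app(environ, start_response):
    """Minimal WSGI app: GET /insights returns the summary as JSON."""
    is_insights = (
        environ.get("REQUEST_METHOD") == "GET"
        and environ.get("PATH_INFO") == "/insights"
    )
    if is_insights:
        status = "200 OK"
        body = json.dumps(INSIGHTS).encode("utf-8")
    else:
        status = "404 Not Found"
        body = json.dumps({"error": "not found"}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def serve(host="127.0.0.1", port=8000):
    """Run a local development server (not meant for production use)."""
    with make_server(host, port, app) as server:
        print(f"Serving on http://{host}:{port}/insights")
        server.serve_forever()
```

Calling serve() starts the server locally; Flask and Django apps are also WSGI applications, which is why the same app can later be deployed with a cloud provider behind a production server.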
The main point is to show that what you built is valuable and people care about your work. Show the impact of your work. Interviewers and your teammates would be impressed.
Now, you’re probably thinking that this is a lot of work and includes so many different skills that it’s going to take you years to master. And the answer is, yes, it’s supposed to take you years to master. The great part of these components is that you can work on them independently of each other. You can learn how to grab data from an API separately from learning how to work with databases. Master one component at a time and eventually you’ll master them all.
You don’t need to do multiple projects to master these skills. This is just one project. You’re building a data science infrastructure and learning the data science process.
Once you build the infrastructure (connecting to an API, pushing data to a database, building a model, and producing nice visuals), you can use the exact same framework for other analyses, and you’ll probably need only slight revisions to your code at each step. You can use the same code to connect to a new API and grab a new dataset. Use similar code and techniques to clean your data. So on and so forth. Once you have the infrastructure built end-to-end, you can start working with other datasets and build other types of models using the same framework.
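One way to picture this reusable framework is a pipeline whose stages are plain functions you can swap out. The sketch below is an assumption about how you might structure it, not a prescribed design; the stage names and the toy data are hypothetical.

```python
def run_pipeline(fetch, clean, store, analyze):
    """Run the stages in order; each stage is a function you can replace."""
    raw = fetch()
    cleaned = clean(raw)
    store(cleaned)
    return analyze(cleaned)


def fetch_sample():
    """Stand-in for an API call; returns raw rows with a missing value."""
    return [{"value": "3"}, {"value": None}, {"value": "5"}]


def drop_missing(rows):
    """Drop rows without a value and convert the rest to floats."""
    return [{"value": float(r["value"])} for r in rows if r["value"] is not None]


def average_value(rows):
    """Stand-in for a model: here, just the mean of the cleaned values."""
    return sum(r["value"] for r in rows) / len(rows)


stored = []  # stand-in for a database table

result = run_pipeline(fetch_sample, drop_missing, stored.extend, average_value)
print(result)  # prints 4.0
```

To analyze a new dataset, you would pass a different fetch function (a new API) and perhaps a different analyze function (a new model), while the rest of the pipeline stays the same.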
So keep iterating and improving and providing something of value to others, not just yourself. Hope this gives you some ideas for your next data analytics project. This project is something that would impress an interviewer if your goal is to get your first data science job.