What Does a Data Scientist Do?
What does a data scientist do? The ultimate guide to working your way through data science.
You really can’t avoid it, can you? It’s mentioned wherever you look: your LinkedIn feed, the job market, news feeds, education programs trying to get your attention (and your enrollment fee). But what is data science really? It’s often described so vaguely that it leaves much to be desired. This guide will try to avoid all that and provide you with direct and clear answers to "What is data science?" and "What does a data scientist do?".
So, what do data scientists do? To answer this question, we’ll lead you through the various aspects of working in data science.
Role and Responsibilities of a Data Scientist
The role of data science is to take the vast amounts of data every company is collecting nowadays and turn them into understandable and useful information. This transformation of data into information is done by using techniques such as machine learning (ML), artificial intelligence (AI), and statistical analysis. All that is done with the purpose of solving real-world problems. "Real-world" usually means business problems. This means companies use data science to make sounder business decisions and make more profit.
Now that we’ve covered the role of a data scientist, it’s time to ask what that means in practice. What does a data scientist do? A direct question deserves a direct answer.
- Identify the business problem
- Collect data
- Prepare data for analysis by manipulating and cleaning it
- Store data
- Analyze data to find trends and patterns
- Build, train, and validate the model
- Present insights
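The steps above can be sketched end to end in a few lines of Python. This is only a minimal illustration, not a definitive workflow: the records, the table name `sales`, the business question (does ad spend predict sales?), and all function names are invented for this example, and a plain least-squares line stands in for "the model".

```python
import sqlite3
from statistics import mean

# Collect data: hypothetical records (month, ad_spend_k_usd, sales_k_usd).
# None marks a missing value. All numbers are made up for illustration.
RAW = [(1, 10.0, 25.0), (2, 12.0, 29.0), (3, None, 31.0), (4, 15.0, 35.0),
       (5, 18.0, 41.0), (6, 20.0, 45.0), (7, 22.0, None), (8, 25.0, 55.0)]

def clean(rows):
    """Prepare data: drop records that contain a missing value."""
    return [r for r in rows if None not in r]

def store(rows):
    """Store data in an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (month INTEGER, ad_spend REAL, sales REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn

def fit_line(xs, ys):
    """Build and train the model: least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def run_pipeline():
    rows = clean(RAW)
    conn = store(rows)
    data = conn.execute(
        "SELECT ad_spend, sales FROM sales ORDER BY month").fetchall()
    conn.close()
    # Validate: hold out the last two records and measure the error on them.
    train, test = data[:-2], data[-2:]
    slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
    mae = mean(abs(y - (slope * x + intercept)) for x, y in test)
    return len(rows), slope, intercept, mae

if __name__ == "__main__":
    n_rows, slope, intercept, mae = run_pipeline()
    # Present insights: one plain-language sentence for decision-makers.
    print(f"Used {n_rows} clean records. Each extra $1k of ad spend goes with "
          f"about ${slope:.1f}k more sales (holdout error: ${mae:.1f}k).")
```

In real projects each function would be far more involved, but the order of the calls mirrors the list: collect, clean, store, analyze, model, validate, present.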
Data Science Job Titles
The most general job title in data science that encompasses all the skills used in the data science field is, well, a data scientist. Being a data scientist includes all the responsibilities mentioned above. However, this is not the only job title you can have if you work in data science.
There are numerous other job titles that depend on seniority, company organization, size, etc. Above all, the job titles depend on the part of data science they focus on. You can think of the data scientist as the primordial soup of data science, with all other job titles originating from it.
Generally, the job titles in data science can be put into one of the two categories:
- Data providers
- Data users
There’s very detailed information in our blog post about every data science job title we’ll mention here. Use that post to find detailed job descriptions and skills required for every position.
When we talk about data providers, we’re talking about jobs that focus on raw data, data infrastructure, data loading, and databases.
The data science job titles in this category are, for example, data modelers, data engineers, database administrators, data architects, and software engineers. In one way or another, they all ensure that the other category of jobs in data science (data users) has uninterrupted access to data, which gives data users a basis to build on.
Of course, all those data provider jobs have different purposes between them.
For example, a data modeler creates conceptual, logical, and physical database models and is involved in database implementation.
Data engineers are more concerned with data infrastructure, its development, and maintenance, including data warehousing and extracting, transforming, and loading data (ETL/ELT).
Check out our post on Data Engineer vs Data Scientist that can explain what data scientists and data engineers have in common and what they don’t.
Database administrators, given the data infrastructure, ensure data and database integrity and security. This includes granting and revoking access to data, backing up databases, restoring data, etc.
On top of the data infrastructure provided by the above job titles sit software engineers. They design, develop, test, and maintain software that serves as the interface data users will use to make the best of the underlying data and data infrastructure.
Data architects provide the big picture and coordinate all those data providers. Their job is to understand the company’s processes, so they can plan, implement, and improve the architecture of the company’s data handling infrastructure. This means providing solutions for how data enters the company at different entry points, in which format it enters, which software is used to process it (if any), how it is transformed and loaded into the database(s) or data warehouse, and how the company uses it up to the point where data becomes the company’s output.
Data users use the available data and data infrastructure to provide information to various stakeholders. They are the link between the more engineering-oriented jobs of data providers and decision-makers, who are usually less technical.
Data users within data science, aside from data scientists, include data analysts, statisticians, BI developers, business analysts, quantitative analysts, marketing scientists, machine learning engineers, research scientists, etc. They, again, all have a different purpose within a company.
For example, data analysts are focused on reporting and on regular and ad-hoc analyses. They take data and summarize it in a reporting format. This gives less technically savvy users the possibility to use this data and understand various aspects of the company’s business. Data analysts primarily use historical data.
Statisticians are similar to data analysts in the way they too analyze data. However, they are more concerned with predicting the future and not so much with explaining the past. They use data to see what will happen, not what already happened. To do that, they apply statistical methods to data, such as hypothesis testing and probability. In that way, statisticians are also similar to data scientists. The difference is they, unlike data scientists, don’t build models and are focused only on the statistics part of data science.
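To make "hypothesis testing" a little more concrete, here is a minimal sketch of one such method, a two-sided permutation test for a difference in means. The function name `permutation_test`, its parameters, and the sample numbers are assumptions made for this illustration. It is one simple technique among many, not a description of how any particular statistician works.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Null hypothesis: both groups come from the same distribution, so the
    group labels are interchangeable.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

if __name__ == "__main__":
    # Hypothetical measurements for two variants (invented numbers).
    variant_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
    variant_b = [14.0, 13.8, 14.3, 14.1, 13.9, 14.4, 14.2, 13.7]
    p_value = permutation_test(variant_a, variant_b)
    print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unlikely if the two groups really behaved the same way.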
BI developers are the ones who develop (design, build, and maintain) dashboards in BI tools for the purpose of data visualization and reporting. They are similar to data analysts in that they also make reports. However, they also have some engineering skills, which they use to ETL data and build the user interface, as data engineers and software engineers do, respectively.
Business analysts are focused on reporting, just like data analysts. However, they are usually focused on internal reporting, which is not always the case with data analysts, to detect weaknesses in the company’s business processes and improve them.
Quantitative analysts are, more often than not, data scientists focused on financial data. They will analyze it and build models regarding various financial markets, such as loans, stocks, bonds, FX, etc. Their analyses will be used for deciding on trading strategies, feasible investments, and risk management.
Marketing scientists are, again, data scientists working with only one type of data. In this case, it’s marketing data. Like any data scientist, they will analyze such data and try to find patterns and trends to explain and predict customer behavior, which helps solve marketing and sales problems.
Machine Learning Scientist
Machine learning engineers are a kind of extension of data scientists. While data scientists are more concerned with the theoretical part of building models, machine learning engineers put those models into practice. They take prototype models and deploy them to production. This involves engineering AI software and algorithms that will make machine learning models work in practice.
While machine learning engineers are the practitioners in this category of data scientists, research scientists are the theorists. The research scientists’ job is to understand the computing principles and issues arising from them. To solve these problems, they improve or create completely new algorithms and programming languages.
Data Science Career Path
The picture below shows an example of what your data science career path could look like. It doesn’t mean it’s a one-direction journey (it doesn’t have to be a journey at all!) or that you can’t move between these job titles in different ways. It’s just an overview. Have a look at it, and then we’ll follow it with some explanations.
Education as a Starting Point
Data science sits at the crossroads of statistics, mathematics, and computer science, along with some other disciplines. So being educated in at least one of those fields is a good starting point.
However, we can’t write a guide that will apply to every candidate and job ad. The general rule of thumb is: get at least a BS degree to have a good starting position to compete in the data science jobs market. Then combine it with working experience. A good balance of both is a recipe you can’t go wrong with. Of course, getting more education and more experience always puts you in an even better position; no surprise there. Let’s have a look at the education/degree requirements:
- BS/Master’s Degree
If you want to build a career in data science, it’s a good idea to have at least a bachelor’s degree. Having a BS or a Master’s degree is good for getting any job in data science, with this level of education required in most job ads. Your degree should be in a relevant quantitative field such as statistics, mathematics, computer science, engineering, IT, economics, programming, etc. It, of course, depends on the job title and the level of seniority.
The benefits of a degree in a different field also depend on the job. Degrees in the humanities and social sciences, such as philosophy, sociology, or psychology, can be useful (sometimes even required!) if you want to be a marketing scientist trying to understand and predict human behavior. Research scientists sometimes work on computing principles that are deeply connected with ethics and human behavior.
Depending on the job description and seniority, it could also be beneficial if you have finance, business, or a similar degree. Maybe you work with the financial data, and you’re high up the hierarchy so, along with your technical skills, having some leadership and business nous and education is something that becomes important too.
While a bachelor’s degree is quite often a minimum level of education required in the job ads, sometimes it’s not the only one.
Having a Ph.D. won’t hurt your chances of getting any of the jobs above. More education is always better.
However, sometimes this level is not only nice to have, but instead required. For example, it would be a good idea to get a Ph.D. if you want to work as an ML engineer or any other mathematics-intensive job.
Also, research scientists need to be strong in computer science theory, principles, and research methodology. That’s why a Ph.D. is often required for this position.
While formal education is often required in the job ads, it doesn’t mean it’s always necessary. If you are experienced in some aspects of data science but don’t have formal education in this field, it doesn’t mean you can’t work as a data scientist. Generally, the more senior the position, the less important your education is. What matters is what you did in your previous jobs, how you did it, and what skills you can bring to the new job.
There’s a catch-22 here. You need a job to get experienced and brush up your skills. And you don’t get a job if you’re not experienced and don’t have the technical skills. Luckily, there’s a solution to that: boot camps.
They are a good starting point for getting the appropriate skills for data science. They don’t require a technical BS or Master’s degree. This is great for anybody who doesn’t have a formal education and wants to start a career in data science. They are also suitable for people who wandered into data science through practice. That way, they can get a more structured and theoretical background to what they already do in practice or improve their already existing skills.
Working Experience as a Data Scientist
Speaking of working experience, getting started is always the hardest part. Once you start working and learning on the job, it becomes easier to change jobs and widen your field of expertise. It’s important to build a strong foundation. When starting in data science, people generally start as data analysts.
From then on, they can go in one of the two directions we discussed earlier: working as a data provider or working as a data user. One important thing about the image above is that as you go from left to right, the seniority of positions goes up, and your salary goes up. We’ll talk about salaries in a moment. Let’s first examine an example or two of what your career could look like.
Let’s say you start as a data analyst. After several years of working with data and finding your own workarounds regarding databases, you understand the database principles, so you decide to move on and become a data modeler or database administrator. Working in one of those positions gets you more experience, and you participate in several projects regarding the data infrastructure. Then you get promoted and become a data architect, for example.
Or maybe you start as a statistician. After spending several years in a company, you decide it’s time for a change. But you really like the company you’re at. And you really liked several marketing projects you participated in last year. You move to a marketing department to work only with marketing data and become a marketing scientist. Then again, it’s time for a change; you got interested in machine learning and became a data scientist. After several years of that, you’d like to go back to school and get a Ph.D. You quit the job and dedicate yourself to getting a Ph.D. That, combined with your vast working experience, makes you realize you want to contribute to data science in a different, maybe theoretical way. And then you become a research scientist.
These are only examples of what your career could look like. Any similarity to actual persons and their careers is purely coincidental. Your career will depend on your background, your abilities, your interests, opportunities you have at your (or another) company, the company’s size, organization, flexibility, and yes, a little bit of luck.
Whichever way you choose could benefit you in the long term. Remember, all those jobs are part of data science, so having more experience in one field of data science can only be beneficial if you want to shake things up a little bit and do something new to you within data science.
Of course, to get the experience, you first need a job. To get a job, you’ll have to go through the often tedious job interview process. To make that experience as painless as possible, you need to be prepared. While nothing beats the interview experience, going through our coding and non-coding interview questions will get you off on the right foot.
Data Science Technical Skills You Need
In data science, the following skills are mandatory:
- Coding in languages such as SQL, R, Python, Java, C-family
- Working with data, which involves collecting, cleaning, and analyzing data
- Database design to understand how to get and store data
- Statistical analysis for gaining insights from data
- Mathematics used in data analysis and metrics calculation
- Modeling for designing and building models
- ML & AI for deploying models
Check out our post on data science skills to learn which technical and business skills are in the highest demand for data scientists.
Data Science Salaries
In choosing a career, aside from your interests and circumstances, the salary also factors in.
According to Jobted, which cites the U.S. Bureau of Labor Statistics (BLS), the average annual salary in the US is about $53.5k.
So how do jobs in data science compare to this? For example, Glassdoor data shows data analysts, on average, earn $70k annually. Even this (on average) lowest paying job in data science will get you more than $15k above the US average. That’s about 30% more!
Working as a data scientist, which is one of the highest-paid jobs in data science, could on average earn you $139k, which is roughly 2.6 times the US average. Even the lowest reported salary is double the US average, while the salary can go up to $171k. And that’s not even the highest paying job in data science.
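The comparison above is simple arithmetic, and it can be checked with a short Python sketch. The salary figures are the ones quoted in the text. The function and variable names are made up for this example.

```python
US_AVERAGE = 53_500  # average annual US salary quoted above (USD)

# Average salaries quoted above (USD).
SALARIES = {"data analyst": 70_000, "data scientist": 139_000}

def compare(salary, baseline=US_AVERAGE):
    """Return the gap to the baseline as an amount, a percentage, and a ratio."""
    diff = salary - baseline
    return {
        "difference": diff,
        "percent_above": round(100 * diff / baseline, 1),
        "ratio": round(salary / baseline, 2),
    }

if __name__ == "__main__":
    for job, salary in SALARIES.items():
        result = compare(salary)
        print(f"{job}: ${result['difference']:,} above average "
              f"({result['percent_above']}% more, "
              f"{result['ratio']}x the average)")
```

With these inputs, the data analyst figure comes out about 30.8% above the average, and the data scientist figure at about 2.6 times the average.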
Education, knowledge, and skills really do pay off, in case you wondered should you invest in career advancement or change. Below is the overview of the job positions and average salaries in USD.
|Job position|Average salary (USD)|
|Business intelligence (BI) developer|$92k|
|Machine learning engineer|$189k|
Depending on the company you work at, you can expect those base salaries to be increased by different benefits, such as cash and stock bonuses, health and life insurance, etc.
You can find more detailed info about salaries in data science in one of our blog posts - Data Scientist Salary.
Working Hours in Data Science
Often, being in high demand and getting paid rather sweet money comes with a price. No, it’s not all rosy in data science. While usually working 40 hours per week, data scientists occasionally need to put in long hours. Again, it depends on the company, its organization, the industry, and numerous other factors. But most often, it comes with the job and its cyclical nature, which means you’ll alternate between relaxed periods and peaks when you’ll have to put in 50-60 working hours per week.
That’s because data science tasks usually involve projects, which means solving problems within strict deadlines. As the deadline approaches, the workload usually increases, and that’s when data scientists have to put in some extra hours.
What Makes a Great Data Scientist?
Having relevant education and technical skills is, of course, the prerequisite for becoming a data scientist. To complete this first step, follow the useful advice we offered when we talked about how to become a data scientist from scratch. But does this make you a great data scientist? Not necessarily.
The point of data science is solving real-life problems. You can have all the technical skills in the world, but if you can’t use those brilliant skills to come up with a solution, what’s the point? Or you come up with a solution, but nobody understands it and uses it. Did you really solve the problem? No, you didn’t.
Technical skills are used for solving the problem, and one of the best ways to hone them is creating a data analytics project of your own. However, you need the soft skills too. Coming up with a solution is sandwiched between two other important phases of a data scientist’s job.
- Understanding the problem
- Coming up with the solution (via technical skills)
- Presenting the solution
To be a great data scientist, you need to:
- Be childish
- Communicate well
- Be good at teamwork
- Feel good with cross-sectionality
We’re not talking about being a spoilt brat. Being childish means being curious, asking questions, wanting to learn, and being playful.
You need to be curious, accept that you don’t understand everything, and be willing to learn. To do that, you have to do what children do: ask questions until they get answers they’re happy with. You need to become the “why guy”. Only that way will you be able to understand the business problem and the needs of different people, departments, and customers. Once you understand them, using your technical skills becomes, well, a technicality.
And when you come up with a solution, you need to be playful and imaginative of how you present your, probably very complex, solution so that others can understand it and use it.
Communication is a natural extension of the first skill. You need to communicate efficiently, ask the right questions, and present your ideas and solutions in an understandable way. When people feel you’re open to suggestions, that you’re listening to them and treating them with respect, they will be much more involved in the project. They will be willing to explain their (business) needs and problems in much more detail, making it easier for you to properly understand what is required of you.
Of course, there’s no point in coming up with a brilliant data science solution if you can’t explain how it works, how it benefits its users, and how they can use it. So communication is also necessary when you present your solution.
Luckily or not, you’ll be working with real people trying to solve their real problems. You won’t be working alone within your department. And you won’t be working only with people from your department. You’ll be working with various people from different walks of life, with different technical skills, specializations, and experiences. To be a successful data scientist, you have to understand people and have patience with them, be flexible, and adapt to different situations and approaches.
Creating a good working atmosphere will be beneficial for the company, your team, and yourself. Being dependable, responsible, and willing to help your colleagues is something that’s always appreciated.
Working in (different) teams with people means you’ll be working with different levels and fields of expertise. This is an opportunity for you to learn. That’s when the cross-sectionality comes in.
A data scientist who knows nothing outside the strict boundaries of data science can’t be a great data scientist. Cross-sectionality will help you understand and solve problems quicker. You’ll present the solutions more clearly. Understanding business, marketing, reporting, legal, or any other aspect of the industry you work in easily makes you a very desirable employee. Experts who can bridge the gap between the technical and non-technical departments are rare and very valuable flowers.
Data science is one of the hottest fields in the job market today. There’s a high demand for data scientists, but there’s also plenty of competition.
That means it’s not easy to become a data scientist. However, it’s not impossible either. This guide is one of the things that should make it easier to decide if data science is for you or not. To sum it up, here are the steps to getting a job in data science and being successful at it:
- Get the education in computer science or other quantitative fields
- Work on your technical skills, such as programming, data analysis, database design, and model building
- Prepare carefully for a job interview, which means answering as many technical and non-technical questions as possible and researching the company and the position you applied for
- Work on your soft skills