How to Learn Data Science with Python
Python beats out R, Excel, and Tableau for learning data science.
I learned how to code in Python about a year and a half ago, when I wanted to do a common data science task and my favorite language, R, just wasn’t up to it. I wanted to scrape some websites and predict results from the data I gathered. (For full disclosure, I wanted to know if asking questions in my cats’ Instagram captions generated more comments. It did!)
As I found, you can learn data science with Python very easily because of its comprehensive libraries and packages. It can handle machine learning, big data mining, forecasting, natural language processing, and anything else you might imagine a data scientist trying to accomplish. That’s what inspired me to write this post about learning data science with Python.
This post will assume a few things about you as a reader. One, you want to learn data science. It’s a great career: good fun if you like satisfying your curiosity and investigating patterns, financially rewarding, and in demand at plenty of prestigious companies. It’s more than just the decade’s hottest job; it’s a career that lets you ask the questions that keep you up at night and come up with interesting answers. I think a lot of us have that innate curiosity, and we’d like to see it satisfied.
I’ll also assume you know some Python. You don’t need to be an expert (I managed to accomplish my task with just a little Python knowledge), but knowing how to operate the terminal is a must. Luckily, the Python community is robust, helpful, and overflowing with tutorials, explanations, and veterans.
Finally, I’ll assume you’re looking for a roadmap: something that will help you get a job in data science by leaning on your Python knowledge. Python is a great language for learning data science because many of its strengths, such as easy visualizations, machine learning, and data mining, are things data scientists use day to day in their jobs.
If you meet all those assumptions, this is the post for you. This article will guide you through how to learn data science with Python, with the ultimate goal of gaining employment as a data scientist.
1. You need to understand what data science actually is
It seems like a weird place to start, but when I started learning Python, I had no clue what data science was. If I’d had to define data science back then, I’d have said something like “making charts.” I wouldn’t have been wrong, but that’s not the whole truth either. I didn’t even know I was learning data science with Python, because I never would have guessed that making predictions is part of data science.
Data science is rather clunkily defined in Wikipedia as a “concept to unify statistics, data analysis, informatics, and their related methods in order to understand and analyze actual phenomena with data.” Simply put, it’s using statistical methods to understand trends.
Data science is not specific to Python or any other language. You could probably do all data science tasks armed with nothing but a pencil, some paper, and a decent calculator (or, god forbid, Excel). If you know what data science is, you know that Python is not a necessity, but rather a handy shortcut, something that lets you do it all neatly, quickly, and effectively.
To learn more about what data science is, it helps to think of it in practical terms rather than vague definitions. StrataScratch has a great guide that breaks down the types of questions you’ll be asked in data science interviews, which offers a holistic look at the sorts of professional tasks data scientists do every day.
With this knowledge, the definition provided by Wikipedia makes a bit more sense. Data science in a professional sense can be described as ways to understand numbers in a way that will help businesses make decisions, using tools such as statistical models, customer data, probability, and forecasts. Numbers come in, data science happens, decisions come out.
2. You need to know what you don’t know
The Venn diagram of Python skills and data science skills is absolutely not a circle.
Once you know what data science is in practical terms, the second step is mapping out your existing knowledge in Python. Python skills and typical data scientist job requirements overlap considerably, so as you go through typical data scientist skills, you’ll probably find that you already know how to do some of them with Python. However, it’s unlikely that you’ll be able to do all of them.
For example, do I know how to use TensorFlow to do neural network modeling? Definitely not. But that’s included in the “data science” category. If I wanted to learn data science with Python, I would count that in my stack of weaknesses and focus on learning that and other data science skills.
Ask yourself: What can you currently do with Python? What are you still weak at? The best way to find out is to check out a repository of typical Python-based data scientist job interview questions, like these data science Python questions. From there, you can easily plot out your areas of expertise and brush up on what’s likely to come up in data science jobs.
For example, you might find that while you’ve got a real handle on using the `pandas` package to calculate differences and output the numbers, you need a bit more practice on developing forecasts with Python.
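To make that gap concrete, here is a minimal sketch of both halves: calculating differences, then turning them into a very naive forecast. It uses plain Python to stay dependency-free (with `pandas`, `Series.diff()` would cover the first step). The numbers, the function names, and the "drift" approach of extending the last value by the average past change are all assumptions for illustration, not a recommended forecasting method.

```python
# Hypothetical weekly comment counts; made up purely for illustration.
weekly_comments = [12, 15, 14, 18, 21, 20]


def differences(values):
    """Return the change between each consecutive pair of values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def drift_forecast(values, steps):
    """Naive forecast: extend the last value by the average past change."""
    diffs = differences(values)
    average_change = sum(diffs) / len(diffs)
    return [values[-1] + average_change * step for step in range(1, steps + 1)]


changes = differences(weekly_comments)
forecast = drift_forecast(weekly_comments, steps=3)

print("Week-over-week changes:", changes)
print("Next three weeks (naive drift):", forecast)
```

If the first function feels obvious and the second one doesn’t, that tells you which side of the list forecasting belongs on.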
Once you’ve got a better understanding of your strengths and weaknesses, you’ll be much better prepared to learn data science with Python.
3. Do fun Python-based data science projects
So now you know what you don’t know. The third step is to polish those rusty skills and learn new ones. The best way to do that is by finding a passion project.
I learned data science with Python because I was obsessed with understanding trends in my cats’ Instagram account. That natural curiosity drove me to learn how to scrape data, put it in a parsable format, analyze it, and output results. If I had been trying to do it for school, I would have given up at the first hurdle, or just copy-pasted code without any real understanding.
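For a sense of scale, the "analyze it" step of a project like this can start very small. The sketch below is not my original code: the posts are invented stand-ins for scraped data, and the function names and the rule "a caption with a question mark counts as a question" are assumptions made for the example.

```python
from statistics import mean

# Hypothetical posts standing in for scraped data: (caption, comment count).
posts = [
    ("Who else naps in the laundry basket?", 14),
    ("Sunday sunbeam.", 6),
    ("Should we get a third cat?", 22),
    ("New scratching post arrived.", 9),
    ("Guess what she knocked off the shelf?", 17),
    ("Box life.", 5),
]


def asks_question(caption):
    """Treat any caption containing a question mark as a question."""
    return "?" in caption


def average_comments_by_group(posts):
    """Return mean comment counts for question and non-question captions."""
    with_question = [n for caption, n in posts if asks_question(caption)]
    without_question = [n for caption, n in posts if not asks_question(caption)]
    return mean(with_question), mean(without_question)


question_avg, plain_avg = average_comments_by_group(posts)
print(f"Captions with a question: {question_avg:.1f} comments on average")
print(f"Captions without one:     {plain_avg:.1f} comments on average")
```

A gap between two averages on six made-up rows proves nothing by itself, and figuring out whether a difference like this is real is exactly the kind of question that pulls you further into statistics.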
While Python is a great language for beginners, it’ll be much easier for you to learn data science with Python if you’re coming at it with a personal fire for understanding and comprehension, rather than just memorizing commands and pasting code.
Go back to your list of strengths and weaknesses and find the parts where you’re still struggling. What burning questions do you have? Do you wonder how the weather affects your running times? Do you want to understand why there are so many chipmunks in your neighborhood? Do you want to dig into the relationship between bird size and their likelihood of going extinct? (These are all personal examples, in case you couldn’t tell.)
Have a look at some online projects to gain inspiration. For example, if you want to find some machine learning-inspired projects, Confetti is full of useful examples. If you can orient your motivation so the only thing standing between you and the answer to the question you’re dying to find out is your lack of skill, learning data science with Python will pose no obstacle at all. You’ll quickly be inspired to learn data science with Python and answer those irritating questions.
Pursuing these kinds of data analytics projects with Python will give you a great grasp of the various statistical methods, data viz charts, and problems you’ll run into on your way to learning data science with Python.
4. Look for skills mentioned in entry-level data science jobs
If you’re in the position of learning data science with Python, you probably aren’t going to jump into a six-figure data science job right away. That’s OK! With the high demand for data science skills, you don’t need a degree to get a data science job. You just need experience.
With that in mind, it doesn’t make sense to look at the job descriptions of those big data science jobs, because it’ll be unrealistic for you to learn data science with Python to that level straight away. To start working your way towards that dream data science job, you should instead look at job descriptions for business analysts and data analysts, which will have some overlap with data scientists. Junior data scientist roles are worth looking at as well.
Similar to mapping out what you don’t know, write a list of all job requirements you couldn’t currently handle. This will give you an idea of the practical skills you’ll need to perfect to get started on the data science career path. Then, look for ways to accomplish those tasks using Python.
For example, Booz Allen Hamilton asks their junior data scientists to have experience with NLP. One quick Google search led me to this article, where Eric Kleppen put together three simple projects to learn NLP with Python. The more specific jobs you look at, the better an idea you’ll have of the most common and sought-after skills for entry-level data science jobs.
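If you collect requirements from more than a handful of postings, a few lines of Python can do the tallying for you. This is a rough sketch only: the posting names, the skill lists, and the `rank_missing_skills` helper are hypothetical, and real job descriptions would need some cleanup before they fit into tidy sets like these.

```python
from collections import Counter

# Hypothetical requirements copied from entry-level job postings.
job_postings = {
    "Junior Data Scientist A": {"python", "sql", "nlp", "statistics"},
    "Data Analyst B": {"sql", "excel", "data visualization", "python"},
    "Business Analyst C": {"sql", "excel", "statistics"},
}

# Skills you already feel comfortable with (also hypothetical).
my_skills = {"python", "excel", "data visualization"}


def rank_missing_skills(postings, known_skills):
    """Count how many postings ask for each skill you do not have yet."""
    counts = Counter(
        skill
        for requirements in postings.values()
        for skill in requirements
        if skill not in known_skills
    )
    # most_common() sorts from the most requested skill to the least.
    return counts.most_common()


missing = rank_missing_skills(job_postings, my_skills)
for skill, count in missing:
    print(f"{skill}: requested in {count} posting(s)")
```

The skills at the top of that output are reasonable candidates for your next project.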
Once you’ve set and completed a couple of these projects, you’ll be 95% of the way to learning data science using Python. You’ll know what data science is, what you need to focus on, what fun projects can help you learn the skills you don’t have yet, and even what entry-level jobs are looking for. The final step is just to keep learning.
5. Stay curious
Learning data science never really ends, with Python or any other language. Consider that the data scientist career path barely existed a decade ago, and you’ll understand how quickly things can change. As with any language, new packages, projects, and concepts come out all the time in Python. To stay up to date, employable, and in-the-know, make sure you approach these developments with passion and curiosity.
At every step in your data scientist career path, you can repeat steps 1-4, and you might get a slightly different answer each time, because the career path and the skills and requirements that go with it keep evolving. When you’re ready to get your next job, revisit the gaps in your knowledge. When you’re thinking about going for a data scientist job at Google, make sure you know what skills they’ll ask you to prove. And above all, stay curious. Keep asking questions. Approach your own ignorance with passion and interest. That’s how you can continue learning data science by using Python.
Final thoughts on how to learn data science with Python
This article was geared toward folks who know some Python, think they might like a job in data science, and want to know how to lean on their existing Python to learn data science. If you want a complete career guide from scratch, check out How to become a data scientist from scratch. Python is the best coding language to learn data science with: it’s easy for beginners to pick up and easy to keep improving on.
To recap, the best way to learn data science with Python is simply to:
- Have an up-to-date understanding of what data science actually is
- Map out your existing Python knowledge and knowledge gaps
- Pick out Python projects that will keep your interest
- Understand the skills entry-level data science jobs will ask for
- Stay curious about data science and Python as a coding language. Both change frequently.
With that, you’ll be on your way to learning data science with Python and securing your dream job as a data scientist.