
Learning data science from scratch

When I say data science, I am referring to the tools and skills that turn data into actionable insights: machine learning, statistics, programming, and domain-specific knowledge.

A few resources to start your journey

The internet is a chaotic mess of tutorials and courses, but there are simpler alternatives that sort through the mess for you.

Websites like Dataquest, DataCamp, and Udacity offer structured courses in data science. Each one builds an education program that takes you from topic to topic, so it requires only a little course planning on your part.

The problem? They cost too much, they don't teach you how to apply concepts in a job setting, and their fixed tracks leave little room to explore your own interests and passions.

There are free alternatives like edX and Coursera, which offer one-off courses that dive into specific topics. If you learn easily from videos or in a classroom setting, these are excellent ways to learn data science.

Free Online Education Platforms

Check out this website for a listing of available data science courses. There are also a few free course curricula you can use. Check out David Venturi’s post, or the Open Source DS Masters (a more traditional education plan).

If you learn well from reading, look at the book Data Science from Scratch. It lays out a full learning plan that you can supplement with online resources.

These are just a few of the free resources that provide a detailed learning path for data science. There are many more.

To better understand which skills you need to acquire on your educational journey, use the following guideline as a roadmap:
A Curriculum Guideline
Python Programming

Programming is a fundamental skill for data scientists. Get comfortable with Python's syntax, and learn the different ways to run a Python program (Jupyter notebook vs. command line vs. IDE).
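To see how the same code can be run from a terminal or reused from a notebook or IDE, here is a minimal sketch of a script. The file name mean_of_args.py and the functions mean and main are hypothetical, chosen only for illustration. The if __name__ == "__main__": guard is a standard Python idiom: the code under it runs when the file is executed directly, but not when the file is imported.

```python
import sys


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def main(args):
    """Parse numbers from command-line style arguments and print their mean."""
    if not args:
        print("usage: python mean_of_args.py NUM [NUM ...]")
        return None
    numbers = [float(a) for a in args]  # command-line arguments arrive as strings
    result = mean(numbers)
    print(f"mean = {result}")
    return result


if __name__ == "__main__":
    # Runs only when the file is executed directly, e.g.
    #   python mean_of_args.py 1 2 3   ->  prints "mean = 2.0"
    # Importing the file from a notebook or IDE console skips this block.
    main(sys.argv[1:])
```

From a terminal, python mean_of_args.py 1 2 3 prints mean = 2.0. In a Jupyter notebook or an IDE console opened in the same folder, you could instead write from mean_of_args import mean and call mean([1, 2, 3]) directly.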

Statistics

Statistics is a prerequisite for machine learning and data analysis. If you already have a solid understanding, spend a week or two brushing up on key concepts.

Focus especially hard on descriptive statistics. Being able to understand a data set is a skill worth its weight in gold.
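As a small taste of descriptive statistics in Python, the sketch below summarizes a made-up sample with the standard-library statistics module. The numbers are hypothetical; the single large value (300) shows how one outlier pulls the mean (154.9) well above the median (140.0), which is exactly the kind of thing a quick summary helps you notice.

```python
import statistics

# Hypothetical sample: daily website visits over ten days (made-up numbers).
visits = [120, 135, 128, 150, 145, 160, 300, 138, 142, 131]

summary = {
    "count": len(visits),
    "mean": statistics.mean(visits),      # sensitive to the outlier
    "median": statistics.median(visits),  # robust to the outlier
    "stdev": statistics.stdev(visits),    # sample standard deviation
    "min": min(visits),
    "max": max(visits),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```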

NumPy, Pandas, & Matplotlib

Learn how to load, manipulate, and visualize data. Mastery of these libraries is crucial for your projects.
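Here is a minimal sketch of that load, manipulate, visualize loop using all three libraries. It assumes NumPy, pandas, and Matplotlib are installed; the sales figures, column names, and the output file name sales_by_store.png are hypothetical, and an inline CSV string stands in for a real file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load: read CSV text into a DataFrame (hypothetical data).
csv_text = """store,month,sales
A,1,100
A,2,120
B,1,80
B,2,90
"""
df = pd.read_csv(io.StringIO(csv_text))

# Manipulate: add a derived column with NumPy, then aggregate with pandas.
df["log_sales"] = np.log(df["sales"])
mean_by_store = df.groupby("store")["sales"].mean()

# Visualize: a bar chart of mean sales per store, saved as a PNG file.
fig, ax = plt.subplots()
mean_by_store.plot(kind="bar", ax=ax)
ax.set_ylabel("Mean monthly sales")
fig.savefig("sales_by_store.png")
plt.close(fig)

print(mean_by_store)
```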

Quick hint: Don't feel like you have to memorize every method or function name; that comes with practice (and if you forget, Google it).

Check out the Pandas docs, NumPy docs, and Matplotlib tutorials. There are better resources out there, but these are what I used. Remember, the only way you will learn these libraries is by using them!

Machine Learning

Learn the theory and application of machine learning algorithms, then apply the concepts you learn to real-world data that you care about. Most beginners start with toy datasets from the UCI ML Repository: play around with the data and work through guided ML tutorials.

The Scikit-learn documentation has excellent tutorials on applying common algorithms. I also found this podcast to be a great (and free) resource on the theory behind ML; you can listen to it on your commute or while working out.
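As an illustration of the kind of workflow those tutorials walk through, the sketch below trains a classifier on the Iris dataset, a classic toy dataset that is also hosted on the UCI ML Repository and ships with scikit-learn. Logistic regression, the 75/25 train/test split, and random_state=0 are arbitrary choices for this example, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Features X (150 flowers x 4 measurements) and species labels y.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows so the model is scored on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)  # higher max_iter avoids convergence warnings
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.3f}")
```

Because scikit-learn estimators share the same fit/predict interface, swapping in another model, such as DecisionTreeClassifier from sklearn.tree, only changes the line that creates the model.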

Don’t stress. It’s a marathon, not a sprint.

A self-driven education can often feel like trying to read a never-ending library of knowledge. If you're going to succeed in data science, you need to think of your education as a lifelong process.

Just remember, the process of learning is its reward.