Data science is roughly split up into four areas:

1. Programming
2. Statistics
3. Communication
4. Domain Expertise

This document details how to get started with the first three. This is a very R-centric guide because I live an R-centric life. Free resources are strongly favored because I’m a graduate student.

Programming

Programming allows you to articulate yourself in the dimension of computational action. Serious programmers are taken seriously, so you should aim to become one.

• Learn Python the Hard Way by Zed Shaw. Thousands of people have used this book to learn how to code for the first time. It teaches many fundamental programming concepts, and Python is an essential language for anyone who writes code.
• The Unix Workbench by me. A book I wrote for beginners about how to use the command line, another baseline skill for a data scientist. Read to the end of the third chapter before starting Learn Python the Hard Way, then come back if you feel like you want more command line skills.
• R Programming for Data Science by Roger Peng. Learn R deeply to really start your journey towards specializing as a data scientist.
• R for Data Science by Hadley Wickham and Garrett Grolemund. Modern data science paradigms applied in R, based on the philosophy of Tidy Data.

Statistics

Statistics will help you quantify uncertainty. 90% of what you need to understand falls into the following four categories:

1. Probability and Combinatorics
2. The Law of Large Numbers and the Central Limit Theorem
3. Regression
4. The Bootstrap
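
To make the last item a bit more concrete, here is a minimal sketch of a percentile bootstrap for the mean, written in Python with only the standard library. The function name `bootstrap_mean_ci`, its parameters, and the sample values are all made up for illustration, and this is just one simple flavor of the bootstrap, not a definitive implementation.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Resample the data with replacement, keeping the original size.
        resample = [rng.choice(data) for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Invented measurements, used only to exercise the function.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]
low, high = bootstrap_mean_ci(sample)
print(f"approximate 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

The idea is that resampling your own data with replacement stands in for collecting new samples, so the spread of the resampled means gives a rough picture of how uncertain the estimate is.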

Communication

People

The most valuable and important part of the data science profession is the people who make its community empathetic, conscientious, and delightful. Many great data science conversations take place on Twitter, and there’s a friendly and active discussion around the #rstats hashtag. The following are short descriptions of some awesome people in the community, plus links to their Twitter profiles, which are usually good portals to their personal websites, portfolios, and other examples of their work.

Tools

Once you’ve learned R or Python and a little bit about how to use the command line, you will be able to use some powerful tools for communicating your ideas.

• R Markdown allows you to create documents, reports, websites, and web applications. This site was created with R Markdown.
• Jupyter allows you to interactively develop a data analysis, much like R Markdown. It is especially popular in the Python community.
• Git and GitHub allow you to share code and publish websites for free. Getting started with both can be a little difficult, but there are many fine tutorials available, including one that I wrote as part of The Unix Workbench.

Online Courses

I spent two years working in the Johns Hopkins Data Science Lab developing courses in data science. There are two programs that I endorse:

• The Data Science Specialization. Nine courses in data science followed by a capstone project with an industry partner, all designed by the team at Johns Hopkins.