At some point, every aspiring data scientist has to get familiar with mathematics for machine learning.
To be blunt, the more serious you are about learning data science, the more math you’ll need to learn for machine learning. If you have a strong math background, this is unlikely to be much of an issue.
In my case, I’ve had to relearn much of the mathematics I took at university (note: I’m not done yet!), as my professional life had allowed my math skills to atrophy.
Based on my experience teaching our bootcamp, there is also a group of aspiring data scientists whose formal math training needs to be augmented. For example, we have many students who come from marketing backgrounds, where studying linear algebra was never a requirement.
What math skills do data scientists need in machine learning?
Forms of the question “what math do I need for data science” and “what math do I need for machine learning” are popular on sites like Quora. I would encourage all aspiring data scientists to perform their own research on this subject and not to take my post as gospel. However, as I often get asked for my opinion on what math aspiring data scientists need to know/study, I will provide my own list:
- Basic statistics and probability (e.g., normal and Student’s t-distributions, confidence intervals, t-tests of significance, p-values).
- Linear algebra (e.g., eigenvectors).
- Single variable calculus (e.g., minimization/maximization using derivatives).
- Multivariate calculus (e.g., minimization/maximization with gradients).
Please note that the above is not an exhaustive list. To be honest, you likely can never know enough math to help you as a data scientist. What I would argue is that the above list follows the 80/20 rule: it is the 20% of math that you will use 80% of the time as a practicing data scientist.
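To give a flavor of the last item on the list, here is a minimal sketch of minimization with gradients in plain Python. Everything in it is made up for illustration and does not come from any of the resources in this post: the function f, its minimum at (3, -1), the learning rate, and the number of steps are all assumptions.

```python
def f(x, y):
    """An illustrative bowl-shaped function whose minimum is at (3, -1)."""
    return (x - 3) ** 2 + 2 * (y + 1) ** 2


def grad_f(x, y):
    """Gradient of f: the partial derivatives (df/dx, df/dy)."""
    return 2 * (x - 3), 4 * (y + 1)


def gradient_descent(start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient to walk downhill."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x -= learning_rate * gx
        y -= learning_rate * gy
    return x, y


# Start far from the minimum and let the gradient guide the search.
x_min, y_min = gradient_descent(start=(0.0, 0.0))
print(x_min, y_min, f(x_min, y_min))
```

The single variable case works the same way with one derivative instead of a vector of partial derivatives. Many machine learning algorithms fit their parameters with some variation of this loop, which is a big part of why calculus earns its place on the list.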
A list of top math resources
Here’s my list of the top 80/20 math resources for aspiring data scientists:
The Cartoon Guide to Statistics is one of the books we provide to our bootcamp students, and it is an excellent resource for gently learning, or refreshing, your statistical knowledge. It covers many of the basic concepts in statistics in an easy-to-consume and entertaining fashion. Well worth a read.
Coursera’s Statistics with R Specialization is necessary for every aspiring data scientist. The accompanying textbook is also a great read. I liked the book so much I picked up a hard copy from Amazon.
Interestingly, I’ve found that University of California Irvine’s free UCI Open course Math 4: Math for Economists is a most excellent resource for focusing on the specific aspects of linear algebra and multivariate calculus that aspiring data scientists need.
The accompanying textbook is also quite good and covers several interesting subjects, including single variable calculus for folks that need a refresher.
The takeaway
Studying the above resources will allow you to go a long way in developing the math skills required for data science.
For example, you will be well-prepared to study books like Intro to Statistical Learning, Elements of Statistical Learning, and Applied Predictive Modeling, including all the mathematics related to the algorithms.
Until next time! I wish you happy data sleuthing!