5-day Bootcamp Curriculum

Learn more about the immersive bootcamp curriculum

The best data science bootcamp curriculum, hands down.

Our instructors are practitioners who know what matters. We have designed the most practical short-duration curriculum, one that has helped thousands of working professionals from more than 1,400 companies worldwide get started with practical data science in just one week.


Unit | Lesson | Description | Timeline | Duration | Topics | Tools/Labs | Sample
Data Science Fundamentals | Data Exploration, Visualization, and Feature Engineering | The first and most important task of the data scientist is to understand their data. The bulk of our first day is dedicated to the theory and practice of understanding data. Through a series of interactive, hands-on exercises, we teach you how to dissect and explore data, engineer your features, and clean your data to prepare it for modeling. You will learn not just the mechanics of data exploration, but also the proper mindset, one that will help you tease out the patterns hidden in your data. | Day 1 | 5 hours | Exploration, Visualization, Segmentation | R, Python | https://datasciencedojo.com/wp-content/uploads/data_exploration_visualization_slide_sample.pdf
Data Science Fundamentals | Predictive Analytics Fundamentals | Our first foray into predictive analytics is guided by a deep dive into the mechanics and theory behind decision tree models. The basis of some of the most successful predictive models, decision trees provide a useful vehicle for hands-on exercises in training and testing classification models. | Day 1, Day 2 | 6 hours | Predictive Analytics, Classification, Decision Trees, Gini Index, Entropy, Training/Test Splits | R, Python | https://datasciencedojo.com/wp-content/uploads/predictive_classification_decision_slide_sample.pdf
Model Evaluation and Parameter Tuning | Evaluation and Fine Tuning of Predictive Models | One of the subtlest and trickiest areas of modern data science is model evaluation. The risk of “overfitting”, producing a model that generalizes poorly to new data, constantly hangs over the practitioner’s head. We teach you the metrics and methods that protect you from this danger, and hands-on exercises familiarize you with the evaluation and model tuning capabilities of the lab tools. You will also learn how each algorithm’s configuration parameters affect its behavior, and how to use that knowledge to tune your models for optimal performance. | Day 2 | 4 hours | Accuracy, Precision, Recall, ROC, AUC, Cross-validation, Bias/Variance Tradeoff, Model Tuning, Parameters | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/evaluation_classification_slide_sample.pdf
Ensemble Methods | Bagging, Boosting and Random Forest | After building a predictive model and learning the pitfalls of choosing the wrong evaluation metric, we move on to more advanced learning techniques. We discuss why ensemble techniques matter in machine learning and how they help produce models that generalize better. The module goes in depth into sampling with and without replacement, bootstrap sampling, bagging, random forests, and boosting. | Day 2, Day 3 | 5 hours | Binomial Distribution, Bagging, Boosting, Random Forests, AdaBoost | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/ensemble_random_forest_slide_sample.pdf
Modelling Unstructured Data | Introduction to Text Analytics | So far, we have only dealt with fully structured data, but many applications of data science require analysis of unstructured data. We will teach you the basics of converting text into structured data, and how to model documents to find their similarities and recommend similar documents. | Day 3 | 1.5 hours | Unstructured Data, Stemming, Lemmatization, Stop Words, TF-IDF | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/text_analytics_slide_sample.pdf
Modelling Unstructured Data | Unsupervised Learning and Clustering | Arguably the oldest branch of machine learning, unsupervised learning at its core is about revealing the hidden structure of any dataset. We teach you about K-Means, a popular clustering algorithm. You will also learn how to approach an unsupervised learning challenge, and how to properly take advantage of clustering algorithms in your data explorations. | Day 3 | 1 hour | K-Means, Clustering, Data Exploration | R, Python | https://datasciencedojo.com/wp-content/uploads/unsupervized_slide_sample.pdf
Recommenders and AB Testing | Recommender Systems and Ranking | In many ways, recommenders are among the defining problems of modern machine learning, and they are the engines that drive modern commerce. You will learn about the two main types of recommenders, collaborative and content-based, and how to blend them to get the best of both worlds. We discuss text analytics, search, and evaluation, as well as product recommendation. You will then build and deploy a recommendation engine in Azure Machine Learning. | Day 3, Day 4 | 3 hours | Collaborative Recommendations, Content-based Recommendations, Text Analytics, Search, Similarity, nDCG, Training/Test splits | Azure ML | https://datasciencedojo.com/wp-content/uploads/recommender_sys_slide_sample.pdf
Recommenders and AB Testing | Online Experimentation and A/B Testing | Online experimentation is perhaps the most misused of data science techniques. We will walk through the best practices for designing and evaluating A/B and multivariate tests. We discuss how to choose the appropriate metrics, how to detect and avoid errors, and how to properly interpret test results. | Day 4 | 2 hours | A/B Tests, Multifactor Tests, Confidence Intervals, Type 1 & Type 2 Error, Metrics | R, Python | https://datasciencedojo.com/wp-content/uploads/ab_testing_slide_sample.pdf
Regression | Regression and Predictive Analytics | Regression and classification are the two sides of the supervised learning coin. You will learn how to adapt the techniques you have learned to the challenge of predicting prices, revenues, click rates, and more. We give you an overview of how regression models learn, teach you how to evaluate them, and demonstrate the use of regularization to prevent overfitting. We end with hands-on exercises. | Day 4 | 4 hours | Linear Regression, Cost Functions, Batch & Stochastic Gradient Descent, L1 & L2 Regularization, RMSE, MAE, Coefficient of Determination | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/regression_slide_sample.pdf
Data Engineering | Big Data Engineering | The first challenge of big data isn’t one of analysis, but rather of volume and velocity. How do you process terabytes of data in a reliable, relatively rapid way? We teach you the basics of MapReduce and HDFS, the technologies which underlie Hadoop, the most popular distributed computing platform. We also introduce you to Mahout and Spark, the next wave of distributed analysis platforms. | Day 5 | 4 hours | MapReduce, HDFS, Hadoop, Hive, Mahout, Spark | Azure | https://datasciencedojo.com/wp-content/uploads/big_data_engineering_slide_sample.pdf
Data Engineering | Real-time/IoT and Deploying a Predictive Model as a Service | The best model in the world is useless if you can’t get new data to it. Azure Machine Learning provides direct and simple processes for setting up real-time prediction endpoints in the cloud, allowing you to access your trained model from anywhere in the world. We walk you through constructing your own endpoints, and show a few practical demos of how this can be used to expose a predictive model to anyone you’d like to use it. We also take you through building your own end-to-end ETL (extract, transform, load) pipeline in the cloud. You will stream data from your smartphone to an event ingestor, process that data, and write it out to cloud storage. You will then be able to read the data into Azure for analysis and processing. | Day 5 | 5 hours | Real-time Prediction, Batch Prediction, REST Endpoints, Security, Internet of Things, ETL Pipelines, Stream Processing, Event Ingestion, Anomaly Detection | Azure ML, Azure Stream Analytics | https://datasciencedojo.com/wp-content/uploads/real_time_slide_sample.pdf
Bootcamp Preparation | Introduction to Big Data, Data Science and Predictive Analytics | We introduce you to the wide world of Big Data, throwing back the curtain on the diversity and ubiquity of data science in the modern world. We also give you a bird's-eye view of the subfields of predictive analytics and the pieces of a big data pipeline. | Pre-Bootcamp | 2 hours | Big Data, ETL Pipelines, Data Mining, Predictive Analytics | None | https://datasciencedojo.com/wp-content/uploads/2016/03/Introduction-to-Big-Data-Predictive-Analytics-and-Data-Science-sample.pdf
Bootcamp Preparation | Fundamentals of Data Mining | All great learning opportunities are built on a solid foundation. This session is jam-packed with all the background information, technical terminology, and basic knowledge that you will need to hit the ground running on the first day of the bootcamp. | Pre-Bootcamp | 1.5 hours | Dataset types, Data preprocessing, Similarity, Data exploration | None | https://datasciencedojo.com/wp-content/uploads/2016/03/Data-Mining-Fundamentals-sample.pdf
Bootcamp Preparation | Introduction to R Programming | Here we introduce the basics of the R programming language. R is a free, open-source statistical programming platform. It is designed to make many of the most common data processing tasks as simple as possible. With this knowledge, you'll be able to engage fully with the hands-on exercises in the class. | Pre-Bootcamp | 2 hours | R basics, R data types, R language features, R visualization | R |
Bootcamp Preparation | Introduction to Azure Machine Learning | Azure Machine Learning Studio is a fully featured graphical data science tool in the cloud. You will learn how to upload, analyze, visualize, manipulate, and clean data using the intuitive interface of Azure ML. | Pre-Bootcamp | 1.5 hours | Azure ML basics, Azure ML preprocessing, Azure ML visualization | Azure ML |
Continued Learning | Kaggle Capstone | You've been building data science knowledge and skills for 3 days. Now it's time to put those new skills to the test with a real problem. Kaggle's Titanic survival prediction competition is the perfect testing ground to cut your teeth on. You'll compete against your fellow students, with the top 2-3 contenders receiving a special prize. | Day 3, Day 4, Day 5, Post-Bootcamp | 3 days | Data Exploration, Model Training, Model Evaluation, Model Tuning | None | https://www.kaggle.com/c/titanic
Continued Learning | Naive Bayes | Naive Bayes is one of the most popular and widely used classification algorithms, particularly in text analysis. It is also a simple, fast, and small algorithm suitable for use on datasets of any size. We teach you how Naive Bayes works, why it works, and when it is likely to break down. | Post-Bootcamp | 1 hour | Conditional Probability, Bayes' Rule, Independence, Naive Bayes | R, Python | https://datasciencedojo.com/wp-content/uploads/2016/03/Naive-Bayes-sample.pdf
Continued Learning | Logistic Regression | Logistic Regression is one of the oldest and best understood classification algorithms. While not suitable for every application, it is fast to run and cheap to store. We will teach you how logistic regression fits a dataset to make predictions, as well as when and why to use it. | Post-Bootcamp | 1 hour | Cost Functions, Logit Function, Decision Boundaries | R, Python, Amazon ML | https://datasciencedojo.com/wp-content/uploads/2016/03/Logistic-Regression-sample.pdf
Continued Learning | Introduction to NoSQL Databases | With the massive increase in velocity and volume of data, even the largest and fastest SQL database lags under the load of millions of requests per second. We teach you how NoSQL databases solve this problem, sacrificing some consistency in exchange for large gains in availability and scalability. | Post-Bootcamp | 1 hour | CAP theorem, NoSQL, HBase | Azure | https://datasciencedojo.com/wp-content/uploads/2016/03/Introduction-to-NoSQL-Databases-sample.pdf
Continued Learning | Self Directed Labs | The world of data science and data engineering is larger than we have time to cover in the bootcamp. We want you to be as well equipped as possible to tackle it, so we have written a textbook of more than 350 pages filled with step-by-step tutorials introducing you to many different tools. You will get a copy of this book at the bootcamp, allowing you to learn this additional material at your own pace. | Post-Bootcamp | 2-4 weeks | Azure SQL Database, HBase, Hadoop, HDInsight, Azure PowerShell, Mahout, Spark, Live Twitter Sentiment Analysis | Azure, Amazon, Hadoop, Spark | https://datasciencedojo.com/wp-content/uploads/2016/03/Self-Directed-Labs-sample.pdf
Continued Learning | Kaggle Mentoring | Though the course is finished, the learning is not. The only way to become a data scientist is to practice data science, and one of the best ways to practice data science is Kaggle. You will join one or two of your fellow attendees, pick a Kaggle competition, and tackle it with the assistance of a member of our crack team of industry data scientists. | Post-Bootcamp | 2-4 weeks | Kaggle, Data Science | Kaggle | https://datasciencedojo.com/about/team/#teaching


*Data Science Bootcamp Day 5 is not included in the California bootcamp curriculum.
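To make the Gini index and entropy from the Predictive Analytics Fundamentals lesson concrete, here is a minimal Python sketch (standard library only) of how a decision tree might score one candidate split. The function names and the ten-example toy node are hypothetical, chosen for illustration rather than taken from the course labs.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions p."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())

def split_impurity(left, right, impurity=gini):
    """Impurity of a candidate split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)

# Hypothetical toy node: 5 "yes" and 5 "no" examples, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]      # 4 yes, 1 no
right = ["yes"] + ["no"] * 4     # 1 yes, 4 no

print(gini(parent), entropy(parent))          # 0.5 1.0
gain = gini(parent) - split_impurity(left, right)
print(round(gain, 2))                         # 0.18
```

A tree-growing algorithm would compute this gain for every candidate split and keep the largest; passing `impurity=entropy` to `split_impurity` scores the same split by information gain instead.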
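The Evaluation and Fine Tuning lesson lists accuracy, precision, recall, and cross-validation. The sketch below, again plain Python with hypothetical names and toy labels, computes those metrics from confusion-matrix counts and generates k-fold train/test index splits. It assumes binary labels with `1` as the positive class and does not shuffle before splitting, which real data usually needs.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (tp, fp, fn, tn) for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn

def evaluate(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # of predicted positives, how many are real
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # of real positives, how many were found
    }

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, test
        start += size

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(evaluate(y_true, y_pred))               # {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75}
print(evaluate(y_true, [0] * len(y_true)))    # all-negative baseline
folds = list(k_fold_indices(len(y_true), 3))  # test folds of size 4, 3, 3
```

The all-negative baseline still scores 60% accuracy with zero recall, which is one reason accuracy alone can mislead on imbalanced data.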
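For the Ensemble Methods lesson, this sketch shows the core of bagging: draw bootstrap samples (sampling with replacement), fit one weak model per sample, and combine them by majority vote. The base learner here is a hypothetical one-feature threshold "stump" on toy data; a random forest would additionally grow full decision trees and consider a random subset of features at each split, which this sketch omits.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly *with replacement*."""
    return [rng.choice(data) for _ in data]

def fit_stump(sample):
    """Hypothetical weak learner: the single threshold on x with the fewest training errors."""
    best = None
    for threshold, _ in sample:
        for left_label in (0, 1):
            errors = sum((left_label if x <= threshold else 1 - left_label) != y
                         for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x <= threshold else 1 - left_label

def bagging(data, n_models=25, seed=0):
    """Fit one stump per bootstrap sample and predict by majority vote."""
    rng = random.Random(seed)
    models = [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]
    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict

# Hypothetical 1-D data: label 0 below x = 5, label 1 from x = 5 up.
data = [(x, 0 if x < 5 else 1) for x in range(10)]
model = bagging(data)
print([model(x) for x in (0, 9)])   # [0, 1]
```

Because sampling is with replacement, each bootstrap sample leaves out roughly 1/e (about 36.8%) of the original points on average for large datasets; random forests commonly use those out-of-bag points as a built-in validation set.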
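The Introduction to Text Analytics lesson turns text into structured data with stop-word removal and TF-IDF, then compares documents. A minimal sketch of that pipeline follows, weighting each term by its in-document frequency times `log(N / df)` and comparing documents with cosine similarity. The stop-word list and documents are hypothetical, stemming and lemmatization are omitted, and libraries such as scikit-learn use smoothed IDF variants that give slightly different numbers.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is"}  # tiny hypothetical stop-word list

def tokenize(text):
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def tf_idf(docs):
    """One sparse {term: weight} vector per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(len(docs) / df[t])
                        for t, c in counts.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = ["the cat sat on the mat", "a cat lay on a mat", "stock prices fell sharply today"]
vectors = tf_idf(docs)
print(cosine(vectors[0], vectors[1]) > cosine(vectors[0], vectors[2]))  # True
```

Documents that share no non-stop-word terms get a similarity of 0, and a term that appears in every document gets an IDF of 0, so it contributes nothing to similarity.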
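To illustrate the K-Means algorithm from the Unsupervised Learning and Clustering lesson, the sketch below implements Lloyd's iteration in plain Python (3.8+ for `math.dist`): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. Initialization uses farthest-first traversal (a random first point, then repeatedly the point farthest from the centroids chosen so far), a simple scheme chosen for this sketch; k-means++ is the more common choice, and the course may use another. The two-blob data set is hypothetical.

```python
import math
import random

def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm: returns (centroids, clusters) for a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Farthest-first: next centroid is the point farthest from every chosen centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))   # [(0.5, 0.5), (10.5, 10.5)]
```

K-Means can converge to different local optima from different starting centroids, so in practice it is usually run several times and the run with the lowest within-cluster sum of squares is kept.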
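nDCG appears in the Recommender Systems and Ranking lesson's topic list. As a hedged sketch, the function below scores a ranked list of relevance grades using the linear-gain form, DCG = sum of rel_i / log2(i + 1) over ranks i, normalized by the DCG of the ideal (sorted) ordering. Some references use a gain of 2^rel - 1 instead, which weights highly relevant items more heavily; the relevance grades here are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain: sum of rel / log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances, k=None):
    """DCG of the ranking (optionally cut at k) divided by the DCG of the ideal ordering."""
    ranked = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0

# Hypothetical graded relevance (3 = very relevant, 0 = irrelevant) of items in ranked order.
print(ndcg([3, 2, 1, 0]))            # 1.0: already in ideal order
print(round(ndcg([1, 2, 3, 0]), 3))  # 0.79: best item ranked third
```

A score of 1.0 means the items were ranked in the ideal order; the `k` cutoff evaluates only the top-k recommendations.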
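For the Online Experimentation and A/B Testing lesson, here is a minimal sketch of one common way to evaluate a two-variant test on conversion rates: a two-sided, two-proportion z-test plus a confidence interval for the difference, using `statistics.NormalDist` (Python 3.8+). It assumes independent users, a sample size fixed in advance, and counts large enough for the normal approximation; `alpha` is the Type 1 error rate you are willing to accept. The function name and traffic numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test and (1 - alpha) confidence interval for p_b - p_a."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Under H0 (no difference) both groups share one pooled conversion rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # The interval uses the unpooled standard error of the difference.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, z, p_value, (diff - z_crit * se, diff + z_crit * se)

# Hypothetical test: 200/2000 conversions in control (A), 260/2000 in the variant (B).
diff, z, p_value, ci = two_proportion_test(200, 2000, 260, 2000)
print(f"lift={diff:.3f} z={z:.2f} p={p_value:.4f} 95% CI=({ci[0]:.4f}, {ci[1]:.4f})")
```

Because the interval excludes zero, the lift would be judged significant at the 5% level here. Checking results repeatedly and stopping as soon as p < 0.05 inflates the Type 1 error rate, a common misuse of this kind of test.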
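The Regression and Predictive Analytics lesson covers cost functions, gradient descent, L1/L2 regularization, and RMSE, MAE, and the coefficient of determination. The sketch below fits a one-feature linear model by batch gradient descent on mean squared error, optionally adding an L2 (ridge) penalty on the slope, and computes those three metrics. Stochastic gradient descent would instead update after each example or mini-batch, and an L1 (lasso) penalty would contribute a subgradient term proportional to `sign(w)`; both are left out for brevity. The learning rate, epoch count, and toy data are assumptions.

```python
import math

def fit_linear(xs, ys, lr=0.01, epochs=5000, l2=0.0):
    """Batch gradient descent for y ~ w*x + b.

    Cost = mean squared error + l2 * w**2 (an L2 / ridge penalty on the slope only).
    """
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        errs = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errs, xs)) + 2 * l2 * w
        grad_b = 2 / n * sum(errs)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def rmse(ys, preds):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

def mae(ys, preds):
    return sum(abs(y - p) for y, p in zip(ys, preds)) / len(ys)

def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]                            # exactly y = 2x + 1
w, b = fit_linear(xs, ys)                       # recovers slope 2, intercept 1
w_ridge, b_ridge = fit_linear(xs, ys, l2=1.0)   # slope shrunk toward 0
preds = [w * x + b for x in xs]
print(round(w, 3), round(b, 3), round(w_ridge, 3))   # 2.0 1.0 1.333
```

On noise-free data the plain fit recovers the true line, while the L2 penalty trades a little training error for a smaller slope; on real, noisy data that shrinkage is what helps prevent overfitting.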
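The Big Data Engineering lesson introduces MapReduce. Its programming model can be simulated in a single Python process with the classic word-count example: a map step emits (key, value) pairs, a shuffle groups values by key, and a reduce step aggregates each group. This is only an illustration of the model with hypothetical function names; it does not use Hadoop's APIs, and on a real cluster the map and reduce calls run in parallel across machines over data stored in HDFS.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (the framework does this on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)

def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle_phase(mapped).items())

print(word_count(["the cat sat", "The dog sat"]))  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

Because each reduce call sees only one key's values, the reduce step parallelizes naturally once the shuffle has partitioned keys across machines.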
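For the Naive Bayes lesson, here is a minimal multinomial Naive Bayes text classifier in plain Python. It applies Bayes' rule with the "naive" assumption that words are conditionally independent given the class, works in log space to avoid underflow, and uses Laplace (add-one) smoothing so an unseen word/class pair does not zero out a score. The tiny spam/ham corpus and the function names are hypothetical; words never seen in training are simply ignored.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Collect class priors and per-class word counts from whitespace-tokenized docs."""
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return priors, word_counts, vocab

def predict_naive_bayes(model, doc, alpha=1.0):
    """Return the class maximizing log P(c) + sum of log P(word | c), with add-alpha smoothing."""
    priors, word_counts, vocab = model
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in doc.lower().split():
            if w in vocab:   # words never seen in training are skipped
                score += math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

docs = ["win cash prize now", "cheap prize offer", "meeting agenda attached", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(model, "cash prize"))          # spam
print(predict_naive_bayes(model, "agenda for meeting"))  # ham
```

The independence assumption is also where the method tends to break down: strongly correlated features are counted as if they were separate evidence, which can make the predicted probabilities badly overconfident even when the class ranking is still right.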
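The Logistic Regression lesson covers the logit function, cost functions, and decision boundaries. The sketch below fits a one-feature logistic regression by batch gradient descent on the mean cross-entropy (log) loss. The prediction is `sigmoid(w*x + b)`, and the decision boundary is the x where that probability equals 0.5, that is `x = -b / w`. The learning rate, epoch count, function names, and toy data are assumptions; the classes deliberately overlap so the optimal weights stay finite.

```python
import math

def sigmoid(z):
    """Logistic (inverse-logit) function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(w, b, xs, ys, eps=1e-12):
    """Mean cross-entropy cost of the model p = sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Batch gradient descent on the log loss; returns weight w and bias b."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]  # d(loss)/d(logit) per example
        w -= lr * sum(e * x for e, x in zip(errs, xs)) / n
        b -= lr * sum(errs) / n
    return w, b

xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]    # overlapping classes, so the optimum is finite
w, b = fit_logistic(xs, ys)
print(-b / w)                    # decision boundary, near 2.25 for this symmetric data
```

The per-example gradient of the log loss with respect to the logit is simply `p - y`, which keeps the update loop short. With perfectly separable data the weights would keep growing, which is one reason practical implementations add regularization.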
