Comprehensive Data Science Curriculum

Practical data science learning. Our data science curriculum is designed for working professionals.

This bootcamp is offered in Live Online and In-person formats.

Marcello Azambuja

I’m really impressed by the quality of the bootcamp, I came with high expectation and Data Science Dojo exceeded it. I highly recommend the bootcamp to anyone interested in Data…

Marcello Azambuja
Director, Engineering at Uber
Mariana Esteves

Right balance between hands-on and theory/concept. Raja makes sure we get the logic of (at first) very complicated statistics and machine learning concepts before applying them.

Mariana Esteves
Sr Product Manager at Uber
Raji Easwaran

Fantastic boot camp!!! I was particularly impressed with Raja’s grasp of the subject matter as well as the passion he has. The details on the mathematical aspects although grueling allowed…

Raji Easwaran
Principal Group Program Manager at Microsoft
Jenna Butler

Data Science Dojo was a great learning opportunity and a truly fun event. The instructors were open, facilitating dialogue and ensuring learning. It was the best run conference I have…

Jenna Butler
Senior Software Engineer at Microsoft
Manash Majhi

This was an excellent workshop on machine learning. Practical and great to get started with building predictive models. The most important thing that happened was I somewhat overcame my fear…

Manash Majhi
Sr. Product Intelligence Manager at Microsoft
Manish Kumar Gupta

The amount of information I gathered in a week long bootcamp with Data Science Dojo was phenomenal. I came from almost no background in Machine Learning and learned not only…

Manish Kumar Gupta
Senior Software Engineer at Microsoft
Nirnay Bansal

5 days and 10 hours per day seems intense, but the Data Science Dojo team made it fun. This team is always ready to help you during and after the…

Nirnay Bansal
Cloud Solution Architect, Development Team Lead at Microsoft
Vaibhav Shrivastava

One of the most practical and detailed oriented data science sessions I have ever attended. I have taken a full semester course on data mining but I can say this…

Vaibhav Shrivastava
Software Engineer II at Microsoft
Haroon Ahmed

The instructor’s academic background combined with his relevant industry experience at Microsoft Bing makes it all very practical. I’m excited to continue learning and highly recommend others!

Haroon Ahmed
Partner Architect at Microsoft
Rehan Hamid

Raja’s knowledge and experience helped me apply data mining concepts to my job every day. I am not afraid to explore new tools due to the hands on exercises taught…

Rehan Hamid
Sr Program Manager at Microsoft Corporation
Babith Bhoopalan

This was easily one of the best training’s I have attended in my 10 years at Microsoft. A perfect combination of hands on, fundamental theories and adequate attention to detail,…

Babith Bhoopalan
Director / Principal Program Manager at Microsoft
Gautam Reddy

Loved the real world knowledge and passion of the instructors…feel equipped and inspired to further my journey in DataScience.

Gautam Reddy
Senior Technical Program Manager at Microsoft
Gurneet Jodhka

Awesome training with all elements of Data Science and  Machine Learning. IoT experiment was great.

Gurneet Jodhka
Sr. Technology Leader at Microsoft
Balachander Devakumar

Excellent time spent. It was an enlightening experience learning about Data Science and Data Engineering.

Balachander Devakumar
Senior Program Manager at Microsoft
Kavitha Balasubramanian

Overall quality of the boot-camp is very good. The ambiance of boot camp, motivation to teach, interest to learn were all positive. Highly qualified teachers with strong intention to make…

Kavitha Balasubramanian
Sr Software Engineer at Microsoft
Andrea Peggion

I absolutely loved this bootcamp. It was brutal, intense and rich of content…I think I have never learned so many things so fast. I feel like I’ve learned more in…

Andrea Peggion
Senior Program Manager - Big Data - HdInsight Service, Hadoop at Microsoft
Jyotsna Panwar

It was so refreshing to be back in a classroom sort of environment (but its on luxury side). Raja is so passionate about teaching that you feel motivated to learn.…

Jyotsna Panwar
Business Intelligence Analyst at Microsoft (Consultant)
Dustin Cox

This training was even better than I expected – I am pleasantly surprised to be leaving with more than just an understanding of the topics, but also the ability to…

Dustin Cox
Senior Business Manager, Chief of Staff for Americas Operations at Microsoft
Nicole Allen

I can’t believe how quickly I went from knowing next to nothing to actually building a working machine learning model and understood the basic principles of what I built. The…

Nicole Allen
Principal Group Program Manager at Microsoft
Roman Golovin

Most useful training I attended in years.

Roman Golovin
SDET at Microsoft
Arthi Ramasubramanian Iyer

Great content in 5 days!

Arthi Ramasubramanian Iyer
Program Manager at Microsoft
Katherine Olson

Great Course – A lot of work but extremely rewarding!

Katherine Olson
Software Development Engineer at Microsoft
Miwa Hattori

Great overview of all things: got a good balance of theory and hands on exercises. Doing exercises right after really puts things into context. Hands on training and code samples that…

Miwa Hattori
Sr. Data & Applied Scientist at Microsoft
Premal Shah

Attending the boot camp was an amazing experience for me. The workshop is very well structured, fantastically taught, has the right amount of breadth and depth, and most importantly it…

Premal Shah
Principal Program Manager, Azure Databricks at Microsoft
Kannan Iyer

A great balance of theory and practice of Data science and data engineering delivered by knowledgeable practitioners in an immersive way!

Kannan Iyer
Microsoft
Ashwin Athreya Vankayala

DataScience boot camp training helped me understand what Data Science is all about. Gave me good insights into how some data science concepts can be implemented in various fields/areas. I…

Ashwin Athreya Vankayala
Software Engineer II at Microsoft
Michael Todd

Data Science Dojo’s balance of theory with practical application is the best I’ve seen. You’ll gain an appreciation for the mathematics, without feeling overwhelmed by it, then be immediately ready…

Michael Todd
Principle Software Architect at Microsoft
Vishal Dugar

Loved the bootcamp! It got me really excited about my new role in Data Sciences.

Vishal Dugar
Sr. Software Engineer at Microsoft
Chen Ku

What I learned from DataScienceDojo’s 5-Day bootcamp is beyond my expectation. The way they structure some key areas will help us learn, think and apply to real world in the…

Chen Ku
Sr. Software Design Engineer in Test at Microsoft Corporation
Ravikumar Kona

Seattle Boot camp was awesome and the instructors were extremely knowledgeable and I learned a lot from this boot camp and would like to recommend it to my coworkers and…

Ravikumar Kona
Microsoft
Yue Tu

It was intense, good instruction, at the right level for beginners.

Yue Tu
Senior Manager, Energy & Sustainability at Microsoft
Obula Basireddy

This is great course. I have got good knowledge and hands on experience for machine learning and Big Data. It gave me many insights on what is machine learning and…

Obula Basireddy
Senior Site Reliability Engineer at Microsoft
Ali Khaki

Great introduction and overview of DS and ML, combining both theory and practice leaving me confident and excited to explore the subject thoroughly with more confidence in the future.

Ali Khaki
Principal PM Manager / Director: Order Mgmt, Fulfillment and Logistics at Supply Chain Engineering at Microsoft
Lesha Bhansali

It was a great 5 day workshop with getting some hands on experience and understanding the roots of data science. It made me work towards how data can be applied…

Lesha Bhansali
Program Manager at Microsoft
Sharoon Srivastava

Absolutely amazing bootcamp! Raja really helps you learn and grasp things really quickly no matter how intensive the material is. Data science is truly for everyone.

Sharoon Srivastava
Software Engineer at Microsoft
Sravya Potluri

Great bootcamp and amazing learning experience. Thanks for Raja and his passionate teaching for making me a different me. I can think differently at a data science problem and approach…

Sravya Potluri
Project Manager at Microsoft
Saritha Vetsa

The bootcamp is awesome. It gives a strong foundation skills, to start our journey in data science. The way it is designed is great. I recommend it to others.

Saritha Vetsa
Software Engineer II at Microsoft
Miguel Uribe

It was a great experience for increasing the expertise on data science. The abstract concepts were explained well and always focused on real applications and business cases. The pace was…

Miguel Uribe
Principal PM Manager at Microsoft
Revanth Chandra Pydimarri

BEST BOOT CAMP EVER!!!

Revanth Chandra Pydimarri
Program Manager 2 at Microsoft
Module | Lesson | Topics | Description | Timeline | Format / Tools | Sample | Sample Video | Duration
Data Science Fundamentals | Importance of 'Data' in Data Science | Sampling. Quantity, quality, and variety of data. Privacy, access control, legal, ethical and security issues in data acquisition. | Beginners in data science often put too much emphasis on machine learning algorithms while ignoring the fact that garbage data will only produce garbage insights. Data quality is one of the most overlooked issues in data science. We discuss challenges and best practices in data acquisition, processing, transformation, cleaning and loading. | During Bootcamp | Interactive discussion | https://datasciencedojo.com/wp-content/uploads/data_exploration_visualization_slide_sample.pdf |  | 1 hour
Data Science Fundamentals | Data Exploration and Visualization | Various data visualization and exploration techniques and packages. Interpreting boxplots, histograms, density plots, scatterplots and more. Segmentation and Simpson's paradox. | Through a series of hands-on exercises and a lot of interactive discussions, we will learn how to dissect and explore data. We take different datasets and discuss the best way to explore and visualize data. We form hypotheses and discuss their validity using various data exploration and visualization techniques. | During Bootcamp | Interactive discussion. R. Python | https://datasciencedojo.com/wp-content/uploads/data_exploration_visualization_slide_sample.pdf |  | 3 hours
Data Science Fundamentals | Feature Engineering | Calculating features from numeric features. Binning, grouping, quantizing, ratios and mathematical transforms for features in different applications. | Feature engineering is one of the most important aspects of building machine learning models. We will practice engineering new features and cleaning data before reporting or modeling. | During Bootcamp | Interactive discussion. R. Python | https://datasciencedojo.com/wp-content/uploads/data_exploration_visualization_slide_sample.pdf |  | 1 hour
Data Science Fundamentals | Storytelling with Data | Communicating actionable insights. Various possible interpretations of plots. Storytelling with data. Bias in data acquisition, transformation, cleaning, modeling and interpretation. | Experienced data professionals will tell you that storytelling is one of the most important skills for communicating insights. We will practice the skill of storytelling while presenting analysis. | During Bootcamp | Interactive discussion. R. Python | https://datasciencedojo.com/wp-content/uploads/data_exploration_visualization_slide_sample.pdf |  | 1 hour
Predictive Analytics | Modeling a Real World Predictive Analytics Problem | Face detection. Adversarial machine learning. Spam detection. Translating a real world problem to a machine learning problem. | Taking a real world business problem and translating it into a machine learning problem takes a lot of practice. We will take some common applications of predictive analytics around us and discuss the process of turning each of them into a predictive analytics problem. | During Bootcamp | Interactive discussion | https://datasciencedojo.com/wp-content/uploads/predictive_classification_decision_slide_sample.pdf |  | 1 hour
Predictive Analytics | Supervised Learning and Classification | Supervised learning vs. Unsupervised learning. Features, predictors, labels, target values. Training, testing, evaluation. | Supervised learning is about learning from historical data. We will understand some of the key assumptions in predictive modeling. We will discuss in what scenarios the distribution of future data will not remain the same as the historical data. | During Bootcamp | Interactive discussion | NA | NA | 2 hours
Predictive Analytics | Decision Tree Classification | Decision tree learning. Impurity measures: Entropy and Gini index. Varying decision tree complexity by varying model parameters. | We will start learning to build predictive models by understanding decision tree classification in depth. We begin with how nodes are split in a decision tree and with impurity measures such as entropy and the Gini index. We will also see how to vary the complexity of a decision tree by changing parameters such as maximum depth, number of observations on the leaf node, and the complexity parameter. | During Bootcamp | Interactive discussion. R. Python | NA | NA | 2 hours
Predictive Analytics | Building and evaluating a classification model | Train/test split. Training, prediction and evaluation. Varying model hyperparameters such as maximum depth, number of observations on leaf node, minimum number of observations for splitting etc. | We will build a classification model using decision tree learning. We will learn how to create train/test datasets, train the model, evaluate the model and vary model hyperparameters. | During Bootcamp | R. Python | NA | NA | 1 hour
Model Evaluation and Selection | Evaluation Metrics for Classification Models | Confusion matrix, false/true positives and false/true negatives. Accuracy, precision, recall, F1-score. ROC curve and area under the curve. | Once we have understood how to build a predictive model, we will discuss the importance of defining the correct evaluation metrics. We will use real-world anecdotes to discuss under what circumstances one metric might be better than another. | During Bootcamp | Interactive discussion | NA | NA | 2 hours
Model Evaluation and Selection | Generalization and Overfitting | Generalization. Overfitting. Bias and variance. Repeatability. Bootstrap sampling. | Building a model that generalizes well requires a solid understanding of the fundamentals. We will understand what we mean by generalization and overfitting. We will also discuss the ideas of bias and variance and how the complexity of a model can impact its bias and variance. | During Bootcamp | Interactive discussion | NA | NA | 2 hours
Model Evaluation and Selection | Tuning of Model Hyperparameters | Model complexity. Bias and variance. K-fold cross validation. Leave one out cross validation. Time series cross validation. | How do we build a model that generalizes well and is not overfit? The answer is by adjusting the complexity of the machine learning model to the right level. This process, known as hyperparameter tuning, is one of the most important skills you will learn at the bootcamp. Using the decision tree learning parameters as an example, we will observe how a model is impacted by creating a deeper or a shallower tree. We will do practical hyperparameter tuning exercises using cross validation. | During Bootcamp | Azure ML, R, Python | NA | NA | 1 hour
Ensemble Methods | Bagging | Binomial distribution. Review of bias/variance, overfitting and generalization. Sampling with/without replacement. Bootstrapped sampling. | Mathematical understanding of concepts is easier when we start by developing an intuition for the (maybe not so) complex math behind an apparently complex topic. Having built a solid understanding of the concepts of bias, variance and generalization, we explain why building a committee of models improves generalization. We also review math topics such as bootstrap sampling and the binomial distribution that are key to understanding why ensembles work so well. | During Bootcamp | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/ensemble_random_forest_slide_sample.pdf |  | 5 hours
Ensemble Methods | Random Forest | Quick review of decision tree splits. Column randomization trick and why it is helpful in building more generalized models. | Having understood bagging very well, we segue the discussion into the idea of feature/column randomization. We explain how feature randomization helps overcome the greediness of decision tree learning and make a case for Random Forest. | During Bootcamp | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/ensemble_random_forest_slide_sample.pdf |  | 5 hours
Ensemble Methods | Random Forest Hyperparameter Tuning | Tuning parameters like depth, number of trees, number of random features selected etc. Using R/Python libraries and Azure ML Studio to tune a model. | Hands-on exercise to select the appropriate number of trees, number of random features and other tuning parameters in a Random Forest and variants of the technique. | During Bootcamp | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/ensemble_random_forest_slide_sample.pdf |  | 5 hours
Boosting | Boosting Introduction | Strength of weak learners. Boosting intuition. Altering a sampling distribution. | Boosting is an immensely powerful and understandably popular technique. We discuss the fundamental ideas behind boosting. We also get an intuitive understanding of how one can alter the sampling distribution while sampling for each round of boosting. | During Bootcamp | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/ensemble_random_forest_slide_sample.pdf |  | 5 hours
Boosting | Mechanics of Boosting and Pitfalls | AdaBoost. Update of weights of training data points and models in the ensemble. Penalty function. Strengths and weaknesses of boosting. | Armed with an intuitive understanding of boosting, we pick AdaBoost as an example. We explain the mechanics of AdaBoost, weight update for training data, altering the sampling distribution and weight update for the models in an ensemble. We also discuss the strengths and weaknesses of boosting and its potential pitfalls. | During Bootcamp | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/ensemble_random_forest_slide_sample.pdf |  | 5 hours
Dealing with Unstructured Data | Introduction to Text Analytics | Structured versus semi-structured versus unstructured data, Structuring raw text, Tokenization, Stemming and lemmatization, Stop word removal, Treating punctuation, casing, and numbers in text, Creating a terms dictionary, Drawbacks of simple word frequency counts, Term frequency – inverse document frequency, Document similarity measure | You will not always work with fully structured data. Many applications of data science require analysis of unstructured data such as text. We will teach you the basics of converting text into structured data, and how to model documents to find their similarities and recommend similar documents. We cover the important steps in pre-processing text in order to create textual features and prepare text for modeling or analysis. This includes stemming and lemmatization, treating punctuation and other textual components, stop word removal, and more. We also demonstrate how to model documents using term frequency-inverse document frequency and how to find similar documents. The hands-on exercise looks at an example of analyzing text and introduces additional problems to solve in pre-processing text/documents. | During Bootcamp | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/text_analytics_slide_sample.pdf |  | 2 hours
Unsupervised Learning | Unsupervised Learning and k-means Clustering | Real-world problems that unsupervised learning algorithms solve, The k-means clustering algorithm, Euclidean distance measure, Defining k, The Elbow Method, Strengths and limitations of k-means clustering | Unsupervised learning at its core is about revealing the hidden structure of any dataset. You are not always going to be working with labeled data or records tagged with a label outcome. For example, collecting data on customers’ purchasing habits does not come with a label outcome of ‘high value customer’ or ‘low value customer’; that label needs to be created. We teach the underpinnings of the k-means clustering algorithm to solve this problem of finding the common attributes that separate one cluster group from another. We can then use this to categorize our data based on clusters, or customers with similar attributes, such as high value customers who all have similar spending habits. You will also learn how to approach an unsupervised learning challenge through a hands-on exercise and how to define your cluster groups. | During Bootcamp | R, Python | https://datasciencedojo.com/wp-content/uploads/unsupervized_slide_sample.pdf |  | 2 hours
Ranking and Recommender Systems | Collaborative and Content-based Recommendations | Collaborative versus content recommenders, Data structure of collaborative versus content-based recommenders. Building user profiles and item profiles. | Recommender systems are all around us. We discuss the collaborative and content-based recommenders at a high level. We also discuss how items are recommended in each case. Various strategies for building item and user profiles are also discussed. | During Bootcamp | Azure ML | https://datasciencedojo.com/wp-content/uploads/recommender_sys_slide_sample.pdf |  | 3 hours
Ranking and Recommender Systems | Measures of Similarity | Pearson's correlation. Cosine similarity. N nearest neighbors. Weighted and centered metrics. | Both collaborative and content-based recommenders rely on similarity, but how do we find similarity between vectors? We discuss some approaches to measuring similarity and when to use which similarity measure. | During Bootcamp | Azure ML | https://datasciencedojo.com/wp-content/uploads/recommender_sys_slide_sample.pdf |  | 3 hours
Ranking and Recommender Systems | Evaluation Metrics for Recommender Systems | Mean absolute error, Root mean square error. Discounted Cumulative Gain (DCG) and nDCG for ranking evaluation. | We discuss the different scenarios in which a recommender system may be used. We discuss the difference between a ranking problem and a regression problem, and which metrics are the right ones for a given problem. | During Bootcamp | Azure ML | https://datasciencedojo.com/wp-content/uploads/recommender_sys_slide_sample.pdf |  | 3 hours
Design of Experiments and Online Experimentation | Online Experimentation | A/B Testing. Multivariate tests. Some interesting online experiments that defy intuition. Online vs. offline metrics. | Design of experiments, together with hypothesis testing, is one of the most useful tools in data science. We kick off by discussing why online experimentation is needed in the first place. We also discuss the difference between online and offline metrics. We will have a group activity to discuss the hypothetical 'Facebook', 'Amazon', and 'Google' examples of online metrics. | During Bootcamp | R, Python | https://datasciencedojo.com/wp-content/uploads/ab_testing_slide_sample.pdf |  | 2 hours
Design of Experiments and Online Experimentation | Hypothesis Testing Fundamentals | Control, treatment and hypothesis testing. Type I, Type II error and interactions. Confidence interval and p-values. Z-table and t-table. | Designing and running experiments depends upon a good understanding of hypothesis testing fundamentals. We offer a quick overview of hypothesis testing with all the necessary concepts. We take a practical example and calculate confidence intervals at varying confidence levels, assuming a small and a large sample size. We explain the fundamentals in an intuitive manner without being too involved in the mathematical details. | During Bootcamp | R, Python | https://datasciencedojo.com/wp-content/uploads/ab_testing_slide_sample.pdf |  | 2 hours
Design of Experiments and Online Experimentation | Running experiments in the real world | Steps in online experimentation: Choosing treatment, control and factors. Sample size selection. Effect size. A/A tests. Logging and instrumentation. Segmentation and interpretation. | Running online experiments in the real world is both a science and an art. We discuss the various steps in an experiment and emphasize the importance of each step. We also discuss the potential pitfalls in an online experimentation pipeline. | During Bootcamp | R, Python | https://datasciencedojo.com/wp-content/uploads/ab_testing_slide_sample.pdf |  | 2 hours
Linear Models for Regression | Math Fundamentals | Introduction. Derivatives and gradients. Minima/maxima. Convex functions and why convexity matters. | Before talking about linear models, we set up the mathematical foundations of regression models. We start with a discussion of some calculus fundamentals so that we can transition seamlessly into the math behind finding the minimum of the cost function. | During Bootcamp | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/regression_slide_sample.pdf |  | 4 hours
Linear Models for Regression | Optimizing the Cost Function | Gradient descent. Batch gradient descent. Stochastic gradient descent. Mini-batch gradient descent. Global vs. local minima. | With the mathematical background already set up, we develop an intuition for what the cost function of a linear regression model should be. We frame our cost function and discuss how gradient descent finds its minimum. We also emphasize that the particular choice of cost function makes this a convex optimization problem and eliminates the risk of getting stuck in a local minimum. We compare the batch, stochastic and mini-batch approaches to minimizing the cost function. | During Bootcamp | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/regression_slide_sample.pdf |  | 4 hours
Linear Models for Regression | Evaluation of Regression Models | Mean absolute error, Root mean square error, R-squared and adjusted R-squared measure. | We discuss the different evaluation metrics for a regression model and in what scenarios each of them might be a good choice. | During Bootcamp | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/regression_slide_sample.pdf |  | 4 hours
Linear Models for Regression | Regularization | Regularization intuition. L1 penalty and LASSO. L2 penalty and Ridge regression. | Modern compute resources incentivize overfitting and even practitioners fall for it. We discuss the intuition behind regularization and the penalty parameter. We discuss the L1 and L2 penalties and give a quick overview of LASSO and Ridge regression. | During Bootcamp | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/regression_slide_sample.pdf |  | 4 hours
Linear Models for Regression | Predicting real-estate/housing prices using a linear regression model | Linear regression model. Adjusting the regularization penalty and number of rounds to get a better model and improve the estimate (MAE and standard deviation). | We will build a real-estate price predictor using a linear regression model. We will see how adjusting the regularization penalty and the number of rounds of parameter updates can result in a substantial improvement in both the Mean Absolute Error and its standard deviation on a 10-fold cross validation. | During Bootcamp | R, Python, Azure ML | https://datasciencedojo.com/wp-content/uploads/regression_slide_sample.pdf |  | 4 hours
Data Engineering | Big Data Engineering | Distributed computing and cloud infrastructure, Hadoop, Hadoop Distributed File System, MapReduce, Hive, Mahout, Spark | The first challenge of big data isn’t one of analysis, but rather of volume and velocity. How do you process terabytes of data in a reliable, relatively rapid way? We teach you the basics of MapReduce and Hadoop Distributed File System, the technologies which underlie Hadoop, the most popular distributed computing platform. We also introduce you to Hive, Mahout and Spark, the next wave of distributed analysis platforms. Learn how distributed computing works to be able to scale machine learning training on terabytes of data. The hands-on lab will take you step by step through setting up a Hadoop cluster to handle processing big data. | During Bootcamp | Azure | https://datasciencedojo.com/wp-content/uploads/big_data_engineering_slide_sample.pdf |  | 3 hours
Data Engineering | Real-time/IoT | Extract, transform, and load pipelines, Data ingestion, Event brokers, Stream storage, Azure Event Hub, Stream Processing, Event processors, Access rights and access policies, Querying streaming data and analysis | Often the data that we are working with is not sitting in a database or files; it is being continuously streamed from a source. Network systems, sensor devices, 24-hour monitoring devices, and the like, are constantly streaming and recording data. Learn how to handle these data end to end, from extracting the data, to processing it, to filtering out important data and analyzing the data on the fly, in near real time. We take you through building your own end-to-end ETL (extract, transform, load) pipeline in the cloud. You will stream data from a source such as Twitter, or credit card transactions, or a smartphone to an event ingestor. This processes the data and writes it out to cloud storage. You will then be able to read the data into Azure for analysis and processing. | During Bootcamp | Azure ML, Azure Stream Analytics | https://datasciencedojo.com/wp-content/uploads/real_time_slide_sample.pdf |  | 4 hours
Data Engineering | Deploying a Predictive Model as a Service | REST Endpoints, APIs | A user interface to a model makes it easier to see how it would work in the real world, where a new customer enters the system and data is collected on their age, gender, and so on. We teach you direct and simple processes for setting up real-time prediction endpoints in the cloud, allowing you to access your trained model from anywhere in the world. We walk you through constructing your own endpoints and show a few practical demos of how this can be used to expose a predictive model to anyone you’d like to use it and see how it takes new data and makes a prediction. | During Bootcamp | Azure ML | NA |  | 1 hour
Bootcamp Preparation | Introduction to Big Data, Data Science and Predictive Analytics | Big Data, ETL Pipelines, Data Mining, Predictive Analytics | We introduce you to the wide world of Big Data, throwing back the curtain on the diversity and ubiquity of data science in the modern world. We also give you a bird's eye view of the subfields of predictive analytics and the pieces of a big data pipeline. | Pre-Bootcamp | None | https://datasciencedojo.com/wp-content/uploads/2016/03/Introduction-to-Big-Data-Predictive-Analytics-and-Data-Science-sample.pdf |  | 2 hours
Bootcamp Preparation | Fundamentals of Data Mining | Dataset types, Data preprocessing, Similarity, Data exploration | All great learning opportunities are built on a solid foundation. This session is jam-packed with all the background information, technical terminology, and basic knowledge that you will need to hit the ground running on the first day of the bootcamp. | Pre-Bootcamp | None | https://datasciencedojo.com/wp-content/uploads/2016/03/Data-Mining-Fundamentals-sample.pdf |  | 1.5 hours
Bootcamp Preparation | Introduction to R Programming | R basics, R data types, R language features, R visualization | Here we introduce the basics of the R programming language. R is a free, open-source statistical programming platform. It is designed to make many of the most common data processing tasks as simple as possible. With this knowledge, you'll be able to engage fully with the hands-on exercises in the class. | Pre-Bootcamp | R | NA | 2 hours
Bootcamp Preparation | Introduction to Azure Machine Learning | Azure ML basics, Azure ML preprocessing, Azure ML visualization | Azure Machine Learning Studio is a fully featured graphical data science tool in the cloud. You will learn how to upload, analyze, visualize, manipulate, and clean data using the intuitive interface of Azure ML. | Pre-Bootcamp | Azure ML | NA | 1.5 hours
Continued Learning | Kaggle Capstone | Feature Engineering, Model Training, Model Evaluation, Model Tuning | You will apply your data science knowledge and skills throughout each day of the bootcamp. We coach you throughout the week to put those new skills to the test with a real problem. Kaggle's Titanic survival prediction competition is the perfect testing ground to cut your teeth on. You'll compete against your fellow students, with the top 2-3 contenders receiving a special prize. | Post-Bootcamp | R, Python, Azure ML | NA | NA | 5 days
Continued Learning | Naive Bayes | Conditional Probability, Bayes' Rule, Independence, Naive Bayes | Naive Bayes is one of the most popular and widely used classification algorithms, particularly in text analysis. It is also a simple, fast, and small algorithm suitable for use on datasets of any size. We teach you how Naive Bayes works, why it works, and when it is likely to break down. | Post-Bootcamp | R, Python | https://datasciencedojo.com/wp-content/uploads/2016/03/Naive-Bayes-sample.pdf | 1 hour
Continued Learning | Logistic Regression | Cost Functions, Logit Function, Decision Boundaries | Logistic Regression is one of the oldest and best understood classification algorithms. While not suitable for every application, it is fast to run and cheap to store. We will teach you how logistic regression fits a dataset to make predictions, as well as when and why to use it. | Post-Bootcamp | R, Python, Amazon ML | https://datasciencedojo.com/wp-content/uploads/2016/03/Logistic-Regression-sample.pdf | 1 hour
Continued Learning | Introduction to NoSQL Databases | CAP theorem, NoSQL, HBase | With the massive increase in velocity and volume of data, even the largest and fastest SQL database lags under the load of millions of requests per second. We teach you how NoSQL databases solve this problem, sacrificing a small amount of consistency for a massive increase in scalability and availability. | Post-Bootcamp | Azure | https://datasciencedojo.com/wp-content/uploads/2016/03/Introduction-to-NoSQL-Databases-sample.pdf | 1 hour
Continued Learning | Self Directed Labs | Azure SQL Database, HBase, Hadoop, HDInsight, Azure PowerShell, Mahout, Spark, Live Twitter Sentiment Analysis | The world of data science and data engineering is larger than we have time to cover in the bootcamp. We want you to be as well equipped to tackle this world as possible, so we have written a 350+ page textbook filled with step-by-step tutorials introducing you to many different tools. You will get a copy of this book at the bootcamp, allowing you to learn this additional material at your own pace. | Post-Bootcamp | Azure, Amazon, Hadoop, Spark | https://datasciencedojo.com/wp-content/uploads/2016/03/Self-Directed-Labs-sample.pdf | 2 - 4 weeks
Continued Learning | Live Practice Webinars | Numerous data science topics, from Time Series Forecasting to Churn Prediction to Resume Preparation, and more. | Your learning does not stop after the bootcamp. Every two weeks you’ll be able to tune into a live webinar and keep practicing your skills with a walk-through example or exercise on a new topic. Master your art and strengthen your skills with regular practice. The webinars are also recorded so you can view them at a more convenient time. | Post-Bootcamp | R, Python | NA | 1-1.5 hours every 2 weeks
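
The Logistic Regression session above lists three topics: the logit function, cost functions, and decision boundaries. As a rough illustration of what those terms mean, here is a minimal sketch in plain Python. It is not taken from the bootcamp materials; the function names (`sigmoid`, `logit`, `log_loss`, `predict`) and the example weights are assumptions chosen for illustration only.

```python
import math


def sigmoid(z: float) -> float:
    """Inverse of the logit function: maps a real-valued score to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative scores, use the equivalent form that avoids overflow in exp().
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p: float) -> float:
    """Log-odds of a probability p, which must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def log_loss(y_true, y_prob):
    """Average cross-entropy cost for binary labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def predict(weights, bias, features, threshold=0.5):
    """Classify one example by thresholding the predicted probability.

    With threshold=0.5 the decision boundary is the set of points
    where the linear score (bias + weights . features) equals 0.
    """
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(score) >= threshold else 0


if __name__ == "__main__":
    # Hypothetical model with two features and hand-picked weights.
    print(predict([1.0, -1.0], 0.0, [2.0, 1.0]))  # score is positive
    print(predict([1.0, -1.0], 0.0, [1.0, 2.0]))  # score is negative
```

In practice the weights would be fitted by minimizing the cost function over a training set rather than chosen by hand; that fitting step is left out here to keep the sketch short.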

Our Data Science Bootcamp In The Press

Best Data Science Bootcamp – SwitchUp (2020)

Learn Data Science at These 20 Bootcamps – Course Report (2020)

17 best bootcamps for boosting your career – CIO (2020)

Where some of our 5,000+ alumni work

More than 1,500 companies and 5,000 attendees.
Facebook, Google, Microsoft, YouTube, IBM, Intel, Amazon, Apple