A Hands-on Introduction to Data Science

Learn Predictive Analytics, Data Exploration, Visualization, Predictive Modeling, Decision Trees, Kaggle, Azure & much more in 3 days

  • Location: Albuquerque, New Mexico
  • Start Date: 09 March 2020  | End Date: 11 March 2020  | Timings: 9 am – 6 pm
  • Language: English

Data Science Certificate

Get a data science certificate backed by an accredited university.
Eligible for special financing options including WIOA.

4.96 / 5

90+ Reviews

Course Description

Data literacy is a crucial but rare skill in any modern-day business. Our bootcamp curriculum teaches working professionals how to extract actionable insights from data, so that you can solve real-world problems in the shortest time possible. A series of live, in-person, instructor-led tutorials will teach you the fundamentals of data science and equip you with skills in R programming and Azure Machine Learning tools for data science. Our carefully crafted curriculum provides the right mix of theoretical concepts, hands-on practical exercises, and business interpretation of results.

Who should attend?

Working professionals who want to add data science to their current positions, or who want to learn more about this new field.

Program Requirements

You should have an interest in data science and data engineering as well as knowledge of at least one programming language. However, many of our attendees come to us with little to no programming experience. Our pre-bootcamp materials will get you where you need to be to hit the ground running.

What you will learn

  • Importance of ‘Data’ in Data Science
  • Data Exploration and Visualization
  • Feature Engineering
  • Storytelling with Data
  • Introduction to Predictive Modeling, Classification and Decision Trees
  • Evaluation of Classification Models
  • Deploying a model in Azure Machine Learning
  • Participation in a Kaggle competition

Curriculum

| Module | Lesson | Topics | Description | Timeline | Format | Sample | Duration |
|---|---|---|---|---|---|---|---|
| Bootcamp Preparation | Introduction to Big Data, Data Science and Predictive Analytics | Big Data, ETL Pipelines, Data Mining, Predictive Analytics | We introduce you to the wide world of Big Data, throwing back the curtain on the diversity and ubiquity of data science in the modern world. We also give you a bird's-eye view of the subfields of predictive analytics and the pieces of a big data pipeline. | Pre-Bootcamp | None | https://datasciencedojo.com/wp-content/uploads/2016/03/Introduction-to-Big-Data-Predictive-Analytics-and-Data-Science-sample.pdf | 2 hours |
| Bootcamp Preparation | Fundamentals of Data Mining | Dataset types, Data preprocessing, Similarity, Data exploration | All great learning opportunities are built on a solid foundation. This session is packed with the background information, technical terminology, and basic knowledge that you will need to hit the ground running on the first day of the bootcamp. | Pre-Bootcamp | None | https://datasciencedojo.com/wp-content/uploads/2016/03/Data-Mining-Fundamentals-sample.pdf | 1.5 hours |
| Bootcamp Preparation | Introduction to R Programming | R basics, R data types, R language features, R visualization | Here we introduce the basics of the R programming language. R is a free, open-source statistical programming platform. It is designed to make many of the most common data processing tasks as simple as possible. With this knowledge, you'll be able to engage fully with the hands-on exercises in the class. | Pre-Bootcamp | R | | 2 hours |
| Bootcamp Preparation | Introduction to Azure Machine Learning | Azure ML basics, Azure ML preprocessing, Azure ML visualization | Azure Machine Learning Studio is a fully featured graphical data science tool in the cloud. You will learn how to upload, analyze, visualize, manipulate, and clean data using the intuitive interface of Azure ML. | Pre-Bootcamp | Azure ML | | 1.5 hours |
| Data Science Fundamentals | Storytelling with Data | Communicating actionable insights. Various possible interpretations of plots. Storytelling with data. Bias in data acquisition, transformation, cleaning, modeling and interpretation | Experienced data professionals will tell you that storytelling is one of the most important skills for communicating insights. We will practice the skill of storytelling while presenting analysis. | Day 1 | Interactive discussion. R. Python | https://datasciencedojo.com/wp-content/uploads/data_exploration_visualization_slide_sample.pdf | 1 |
| Data Science Fundamentals | Importance of 'Data' in Data Science | Sampling. Quantity, quality, and variety of data. Privacy, access control, legal, ethical and security issues in data acquisition. | Beginners in data science often put too much emphasis on machine learning algorithms while ignoring the fact that garbage data will only produce garbage insights. Data quality is one of the most overlooked issues in data science. We discuss challenges and best practices in data acquisition, processing, transformation, cleaning and loading. | Day 1 | Interactive discussion | https://datasciencedojo.com/wp-content/uploads/data_exploration_visualization_slide_sample.pdf | 1 hour |
| Data Science Fundamentals | Data Exploration and Visualization | Various data visualization and exploration techniques and packages. Interpreting boxplots, histograms, density plots, scatterplots and more. Segmentation and Simpson's paradox. | Through a series of hands-on exercises and interactive discussions, we will learn how to dissect and explore data. We take different datasets and discuss the best way to explore and visualize each one. We form hypotheses and discuss their validity using various data exploration and visualization techniques. | Day 1 | Interactive discussion. R. Python | https://datasciencedojo.com/wp-content/uploads/data_exploration_visualization_slide_sample.pdf | 3 hours |
| Data Science Fundamentals | Feature Engineering | Calculating features from numeric features. Binning, grouping, quantizing, ratios and mathematical transforms for features in different applications | Feature engineering is one of the most important aspects of building machine learning models. We will practice engineering new features and cleaning data before reporting or modeling. | Day 1 | Interactive discussion. R. Python | https://datasciencedojo.com/wp-content/uploads/data_exploration_visualization_slide_sample.pdf | 1 |
| Predictive Analytics | Modeling a Real World Predictive Analytics Problem | Face detection. Adversarial machine learning. Spam detection. Translating a real world problem to a machine learning problem | Taking a real world business problem and translating it into a machine learning problem takes a lot of practice. We will take some common applications of predictive analytics around us and discuss the process of turning each one into a predictive analytics problem. | Day 1 | Interactive discussion | https://datasciencedojo.com/wp-content/uploads/predictive_classification_decision_slide_sample.pdf | 1 hour |
| Predictive Analytics | Supervised Learning and Classification | Supervised learning vs. Unsupervised learning. Features, predictors, labels, target values. Training, testing, evaluation. | Supervised learning is about learning from historical data. We will understand some of the key assumptions in predictive modeling. We will discuss scenarios in which the distribution of future data differs from that of the historical data. | Day 2 | Interactive discussion | None | 2 hours |
| Predictive Analytics | Decision Tree Classification | Decision tree learning. Impurity measures: Entropy and Gini index. Varying decision tree complexity by varying model parameters. | We will begin building predictive models by studying decision tree classification in depth. We will start with how nodes are split in a decision tree and with impurity measures such as entropy and the Gini index. We will also see how to vary the complexity of a decision tree by changing parameters such as maximum depth, number of observations on the leaf node, complexity parameter etc. | Day 2 | Interactive discussion. R. Python | None | 2 hours |
| Predictive Analytics | Building and evaluating a classification model | Train/test split. Training, prediction and evaluation. Varying model hyperparameters such as maximum depth, number of observations on leaf node, minimum number of observations for splitting etc. | We will build a classification model using decision tree learning. We will learn how to create train/test datasets, train the model, evaluate the model and vary model hyperparameters. | Day 2 | R. Python | None | 1 hour |
| Model Evaluation and Selection | Evaluation Metrics for Classification Models | Confusion matrix, false/true positives and false/true negatives. Accuracy, precision, recall, F1-score. ROC curve and area under the curve. | Once we have understood how to build a predictive model, we will discuss the importance of defining the correct evaluation metrics. We will use real-world anecdotes to discuss the circumstances under which one metric might be better than another. | Day 3 | Interactive discussion | None | 2 hours |
| Model Evaluation and Selection | Generalization and Overfitting | Generalization. Overfitting. Bias and variance. Repeatability. Bootstrap sampling | Building a model that generalizes well requires a solid understanding of the fundamentals. We will clarify what we mean by generalization and overfitting. We will also discuss the ideas of bias and variance and how the complexity of a model can affect both. | Day 3 | Interactive discussion | None | 2 hours |
| Model Evaluation and Selection | Tuning of Model Hyperparameters | Model complexity. Bias and variance. K-fold cross validation. Leave one out cross validation. Time series cross validation. | How do we build a model that generalizes well and is not overfit? The answer is to adjust the complexity of the machine learning model to the right level. This process, known as hyperparameter tuning, is one of the most important skills you will learn at the bootcamp. Using the decision tree learning parameters as an example, we will observe how a model is affected by growing a deeper or a shallower tree. We will do practical hyperparameter tuning exercises using cross validation. | Day 3 | Azure ML, R, Python | None | 1 hour |
| Continued Learning | Naive Bayes | Conditional Probability, Bayes' Rule, Independence, Naive Bayes | Naive Bayes is one of the most popular and widely used classification algorithms, particularly in text analysis. It is also a simple, fast, and small algorithm suitable for use on datasets of any size. We teach you how Naive Bayes works, why it works, and when it is likely to break down. | Post-Bootcamp | R, Python | https://datasciencedojo.com/wp-content/uploads/2016/03/Naive-Bayes-sample.pdf | 1 hour |
| Continued Learning | Logistic Regression | Cost Functions, Logit Function, Decision Boundaries | Logistic Regression is one of the oldest and best understood classification algorithms. While not suitable for every application, it is fast to run and cheap to store. We will teach you how logistic regression fits a dataset to make predictions, as well as when and why to use it. | Post-Bootcamp | R, Python, Amazon ML | https://datasciencedojo.com/wp-content/uploads/2016/03/Logistic-Regression-sample.pdf | |
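
For a taste of the Day 2 and Day 3 material, the impurity measures (entropy and Gini index) and the classification metrics (accuracy, precision, recall, F1-score) named in the curriculum can be sketched in a few lines of plain Python. This is an illustrative sketch, not bootcamp course material: the function names `gini`, `entropy`, and `classification_metrics` and the sample labels are assumptions made for this example, and the functions assume non-empty inputs.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index of a non-empty list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of a non-empty list: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    tn = len(pairs) - tp - fp - fn                               # true negatives
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical node with two classes split evenly: maximally impure.
    node = ["spam", "spam", "ham", "ham"]
    print(gini(node), entropy(node))

    # Hypothetical true labels and predictions for a binary classifier.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(classification_metrics(y_true, y_pred))
```

An evenly split two-class node gives the largest impurity under both measures, while a node containing a single class gives zero; a decision tree learner prefers splits that reduce impurity the most. In practice, R and Python libraries provide these calculations, so the sketch is only meant to show what the numbers mean.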