Duration: 10 weeks
A 10-week, live, instructor-led data science bootcamp, delivered fully online
What You’ll Learn
Understanding the Data Science Landscape
This session will get the audience excited about data science by discussing some of its day-to-day applications. We will discuss how leading companies are leveraging data science effectively and walk through the end-to-end architecture of a data science pipeline. We will close with some of the common data mining tasks and the challenges of doing data science at scale.
Duration: 60-90 minutes
How are trailblazing technology companies using data science and machine learning?
Architecture of a big data pipeline
Quick overview of classification, regression, clustering, anomaly detection, association analysis, and predictive modeling. Business applications for each.
Challenges in doing data science with real data at scale
Understanding the Importance of ‘Data’ in Data Science
There is a lot of hype around machine learning techniques such as deep learning. A good machine learning model, however, starts with good data. We will discuss best practices for handling data before we even start building a machine learning model.
Duration: 60-90 minutes
- The size, quality, and variety of data matter more than the machine learning technique used
- What data to gather and use from your users, processes, products, and events. Examples include behavioral, demographic, longitudinal, incident, monetization, session, pageview, click, and graphical data.
- Some interesting predictive modeling examples in retail, healthcare, online services, finance, banking, gambling, dating websites, social networks, and law enforcement, with a discussion of what data each would need.
- Brainstorming on data acquisition, processing, and feature engineering for some actual business scenarios.
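As one concrete instance of the feature-engineering brainstorm, here is a minimal Python sketch that turns raw behavioral events into per-user features. It assumes a hypothetical event log of `(user_id, session_id, event_type, revenue)` tuples; the field names, the `build_user_features` helper, and the chosen features are illustrative only, not a prescribed schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical raw event: (user_id, session_id, event_type, revenue)
Event = Tuple[str, str, str, float]


def build_user_features(events: List[Event]) -> Dict[str, Dict[str, float]]:
    """Aggregate raw session/pageview/click/monetization events per user."""
    sessions = defaultdict(set)   # user -> distinct session ids
    pageviews = defaultdict(int)  # user -> number of pageview events
    clicks = defaultdict(int)     # user -> number of click events
    revenue = defaultdict(float)  # user -> total revenue (monetization)

    for user, session, event_type, amount in events:
        sessions[user].add(session)
        if event_type == "pageview":
            pageviews[user] += 1
        elif event_type == "click":
            clicks[user] += 1
        revenue[user] += amount

    features = {}
    for user in sessions:
        n_sessions = len(sessions[user])
        features[user] = {
            "sessions": float(n_sessions),
            "pageviews_per_session": pageviews[user] / n_sessions,
            # Clicks per pageview; guard against users with no pageviews.
            "click_rate": clicks[user] / pageviews[user] if pageviews[user] else 0.0,
            "total_revenue": revenue[user],
        }
    return features


events = [
    ("u1", "s1", "pageview", 0.0),
    ("u1", "s1", "click", 0.0),
    ("u1", "s2", "pageview", 0.0),
    ("u1", "s2", "purchase", 19.99),
    ("u2", "s3", "pageview", 0.0),
]
print(build_user_features(events))
```

The resulting per-user rows can then feed any of the predictive models discussed later; which aggregates are worth computing depends on the business question being asked.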
Measure Twice, Cut Once: The Art of Extracting Actionable Insights
Metrics are the lifeblood of a data-driven organization. Amid all the hype around building a data-driven culture, many organizations define too many irrelevant metrics that are not actionable. We will discuss the fundamentals of evaluation and measurement in both online and offline settings, along with common pitfalls in evaluation metrics and how to avoid them.
Duration: 120-150 minutes
- Why are metrics important? What can go wrong with metrics?
- Bias/variance trade-off, cross-validation, blind holdout datasets, and overfitting
- Setting up an evaluation and metrics pipeline
- Types of metrics:
  - Short-, medium-, and long-term metrics
  - Vanity vs. actionable metrics
  - Leading vs. lagging metrics
  - Conflicting metrics
- Offline vs. online metrics
- Offline evaluation metrics for classification, regression, and recommender systems
- Online experimentation and A/B testing, online evaluation metrics, and the art of interpreting metrics
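To ground the offline/online distinction, here is a minimal Python sketch of one metric computation from each side: precision, recall, and F1 for a binary classifier (offline), and a two-sided two-proportion z-test comparing conversion rates of variants A and B (online). The function names and the choice of a pooled-variance z-test are assumptions for illustration; real experimentation platforms also handle sample-size planning, multiple comparisons, and other details not shown here.

```python
import math
from typing import Dict, List, Tuple


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Offline metrics for a binary classifier (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided z-test for a difference in conversion rates (B minus A).

    Returns (z statistic, p-value) using the pooled-proportion standard error.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function, doubled for a two-sided test.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(classification_metrics(y_true, y_pred))
print(two_proportion_z_test(200, 10_000, 260, 10_000))
```

A statistically significant z-test says the difference is unlikely to be noise; whether the lift matters for the business is the interpretation step the session focuses on.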
Fundamentals of Data Mining
This session introduces basic data mining concepts to absolute beginners. These concepts come in handy when we later discuss specific tools and techniques.
Duration: 120-180 minutes
Terminology: Features, variables, labels, target values, predictors, supervised learning, unsupervised learning
Data: Sampling, data types, data quality, noise, outliers, missing values, duplicates, preprocessing, aggregation, dimensionality reduction, feature selection, similarity and dissimilarity metrics
Data exploration and visualization: Summary statistics, percentiles, histograms, boxplots, and scatter plots
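The summary statistics behind a boxplot can be computed directly. This minimal Python sketch uses the standard library `statistics` module to report count, mean, standard deviation, quartiles, and the points a conventional boxplot would flag as outliers (beyond 1.5 × IQR from the quartiles). The `summarize` helper and the "inclusive" quartile method are choices made for this illustration; other tools may interpolate percentiles slightly differently.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> Dict[str, object]:
    """Summary statistics plus boxplot-style (1.5 * IQR) outlier flags."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        # Points beyond the whiskers a boxplot would draw.
        "outliers": [v for v in values if v < lower or v > upper],
    }


data = [12, 15, 14, 10, 13, 11, 14, 95]
print(summarize(data))
```

Note how the single extreme value inflates the mean and standard deviation while the median and quartiles barely move, which is why percentiles and boxplots are useful first looks at noisy data.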
A Quick Introduction to Common Machine Learning Techniques
This is a quick overview of some common machine learning techniques. Attendees will learn which business problems fall into each category and which popular machine learning techniques are used to solve them.
Duration: 240 minutes
Classification: What is classification? Classification using decision tree learning
Regression: Generalized linear models for regression
Unstructured data and unsupervised learning: Clustering using K-Means, text analytics
Ensemble methods: Bagging, boosting, random forests
Ranking: Recommender systems
AI and Deep Learning: Quick overview
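As a taste of what sits behind one of these techniques, here is a minimal pure-Python sketch of K-means clustering (Lloyd's algorithm), which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. Initializing centroids by sampling data points with a fixed seed is an assumption made to keep the example short and reproducible; library implementations often use smarter seeding such as k-means++.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # assignments and centroids stopped changing
        labels, centroids = new_labels, new_centroids
    return centroids, labels


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

The cluster ids themselves are arbitrary; what matters is which points end up grouped together.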
Building Data Science Products? Think Business First!
Modern machine learning libraries are both a blessing and a curse. Because the libraries are so easy to use, most users (newbies and practitioners alike) focus too much on tools and techniques. We will discuss the high-level thinking process of arriving at a machine learning approach by asking a business question before even thinking about tools or technologies.
Duration: 90-120 minutes
- Why is starting with technology the wrong approach? Ask a business question and work your way backward
- Choosing the right machine learning algorithm for your business problem
- High-level thinking process in conceiving, implementing, deploying and maintaining a machine learning system
Ethical Dimensions of Data Science: Data Science without the Creep
Data science is an emerging discipline, and the wide and easy availability of data from a variety of sources has raised all sorts of ethical questions. We will discuss these questions and ways of doing data science ethically.
Duration: 60 minutes
- Ethical dilemmas in data science and some real examples from industry.
- Security, privacy, PII. Is it possible to extract actionable insights while being ethical?
- Discussing some common ethical and regulatory issues faced by industry.
Understanding Different Pieces of the Big Data Puzzle
The options available for big data and data science (IaaS, PaaS, SaaS) can be mind-numbing. This session will demystify these options for the average user and discuss how to choose among them in an informed manner.
Duration: 60 minutes
- Understanding the jargon: big data, Hadoop, MapReduce, Spark, IoT, and real-time analytics
- Understanding the Apache Hadoop project and how various vendors, such as Microsoft Azure, Amazon Web Services, Cloudera, MapR, and Hortonworks, differ
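To make the MapReduce jargon concrete, here is a single-process Python sketch of the classic word-count job, split into the map, shuffle, and reduce phases the programming model is built on. It only simulates those phases sequentially on one machine; it does not use Hadoop's or Spark's actual APIs, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # On a cluster, mappers and reducers run in parallel on many machines;
    # here the three phases simply run one after another.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


docs = ["big data needs big pipelines", "Spark and Hadoop process big data"]
print(word_count(docs))
```

Because each mapper touches only its own split and each reducer only its own key, both phases parallelize naturally, which is the core idea that Hadoop MapReduce and, in a more general form, Spark build on.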