Whether you are a beginner or just a bit rusty on your Python fundamentals, our carefully designed pre-training tutorials will get you ready for live learning. These tutorials give you a quick review of the fundamentals of Python and Jupyter notebooks. Topics in this module include:
- Variables, expressions, comments, and constants
- Conditional execution
- Functions and libraries
- Loops and iteration
Delivery format: Self-paced learning through online tutorials and provided Jupyter notebooks
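The fundamentals above can all be seen in a few lines. The sketch below is illustrative only (the values and names are made up) and touches each topic in the list: a constant, a function, a loop, and a conditional.

```python
# Warm-up covering the module's topics: variables, expressions,
# comments, constants, conditionals, functions, and loops.

TAX_RATE = 0.13  # a constant, written in UPPER_CASE by convention

def total_price(prices):
    """Sum a list of prices and apply tax."""
    subtotal = 0
    for p in prices:          # loop and iteration
        subtotal += p         # expression updating a variable
    return subtotal * (1 + TAX_RATE)

cart = [10.0, 25.5, 4.5]
total = total_price(cart)

if total > 40:                # conditional execution
    print("Large order:", round(total, 2))
else:
    print("Small order:", round(total, 2))
```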
NumPy and Pandas
Introduction to NumPy and Pandas
Obtaining, processing, and storing data are the necessary early steps in a data pipeline. The purpose of this module is to develop a good understanding of the data structures Python offers for structured and unstructured data. Topics include:
- Pandas and NumPy data structures: Lists, dictionaries, data frames
- Read/write operations from/to different file formats: txt, json, xml, html
- Indexing, slicing and subsetting data frames – selecting rows and columns
- Chaining conditions
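The selection and chaining topics above can be sketched in a few lines of Pandas. The data frame below is a made-up example for illustration:

```python
import pandas as pd

# Toy data frame (names and values are invented for illustration)
df = pd.DataFrame({
    "name":   ["Ana", "Ben", "Cal", "Dee"],
    "dept":   ["ops", "eng", "eng", "ops"],
    "salary": [50, 70, 65, 52],
})

# Selecting columns and rows
salaries  = df["salary"]                 # a single column (a Series)
first_two = df.iloc[:2]                  # positional slicing of rows
eng_rows  = df.loc[df["dept"] == "eng"]  # boolean row selection

# Chaining conditions: wrap each condition in parentheses,
# then combine with & (and) or | (or)
well_paid_eng = df[(df["dept"] == "eng") & (df["salary"] > 66)]
print(well_paid_eng["name"].tolist())  # ['Ben']
```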
Delivery format: Instructor-led, live learning. Attendees will be given access to Jupyter notebooks and many code samples for in-class and homework exercises.
Data Wrangling
Data wrangling is often the stage where a data scientist spends the most time.
In this module, we will discuss the importance of data wrangling in an analytics pipeline and cover best practices.
We will look at data-wrangling techniques including aggregation, merging, and transformation using the Pandas library. We will also cover string manipulation, a key part of any text analytics task.
- Indexing, slicing, and subsetting data
- Simple data cleaning
- Data transformation
- Pandas merging, groupby, and reshaping
- One-hot encoding and pivot tables
- String manipulation
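The core wrangling operations above (merging, groupby aggregation, pivot tables, one-hot encoding, and string manipulation) can be sketched on a small, hypothetical dataset:

```python
import pandas as pd

# Hypothetical sales data, invented for illustration
sales = pd.DataFrame({"store":   ["A", "A", "B", "B"],
                      "month":   ["Jan", "Feb", "Jan", "Feb"],
                      "revenue": [100, 120, 80, 90]})
stores = pd.DataFrame({"store":  ["A", "B"],
                       "region": ["East", "West"]})

# Merging two data frames on a shared key
merged = sales.merge(stores, on="store", how="left")

# Aggregation with groupby
by_region = merged.groupby("region")["revenue"].sum()

# Reshaping with a pivot table (stores as rows, months as columns)
wide = merged.pivot_table(index="store", columns="month", values="revenue")

# One-hot encoding a categorical column
encoded = pd.get_dummies(merged["region"], prefix="region")

# String manipulation via the .str accessor
merged["month"] = merged["month"].str.upper()
```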
Data Exploration and Visualization
Data Exploration and Visualization
Being able to tell a story with your data is a key skill for a data scientist, and a big part of that is being able to make good visualizations.
In this module, we will look at the Seaborn and matplotlib libraries, and use them to build visuals.
We will build visualizations such as heatmaps, scatter plots, and density curves, and explore real-world test-score data to see how visualizations can surface insights.
- Figures, plots, and axes in matplotlib
- Building faceted visuals on the same plot
- Exploratory data analysis
- Using Seaborn to visualize data
- Seaborn code samples
- Plotly code samples
- ggplot code samples
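The figure/plot/axes distinction and the idea of faceted visuals can be sketched with matplotlib alone (Seaborn builds on the same objects). The data here is synthetic, and the Agg backend is used so the example runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One figure containing two axes side by side: a simple faceted layout
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.plot(x, np.sin(x))                  # line plot on the first axes
ax1.set_title("sin(x)")

ax2.scatter(x[::10], np.cos(x[::10]))   # scatter plot on the second axes
ax2.set_title("cos(x), sampled")

fig.tight_layout()
fig.savefig("facets.png")  # write the whole figure to disk
```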
REST APIs and Data Pipelines
Building Data Pipelines
Often when working on real-world problems, data pipelines are used to make sure the end-to-end process works as intended and scales.
In this module, we will look at REST APIs, learn how to use them in a Python script, learn about web scraping, and call a deployed machine learning model using a REST API endpoint.
We will look at the requests library in Python, learn about the different types of HTTP requests, and use the BeautifulSoup library for web scraping. We will also call a model deployed on Azure, manipulate the data we receive, and finally upload it to Azure to complete the data-pipeline experience.
- Introduction to REST APIs
- Using the requests library in Python
- API request structure, methods, endpoints
- Web scraping using BeautifulSoup
- Calling a deployed machine learning model using Python
- Building a basic data pipeline in Python
- Calling a deployed unsupervised model
- Calling a deployed regression model
- Face mask detection using OpenCV
- Web scraping with Python and BeautifulSoup
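The structure of an API request (method, endpoint, query parameters, headers) can be shown with the requests library without touching the network: a `Request` can be built and prepared but never sent. The URL below is hypothetical.

```python
import requests

# Build (but do not send) a GET request, to show how method, endpoint,
# query parameters, and headers fit together. The URL is made up.
req = requests.Request(
    "GET",
    "https://api.example.com/v1/items",
    params={"page": 2, "per_page": 50},
    headers={"Accept": "application/json"},
)
prepared = req.prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://api.example.com/v1/items?page=2&per_page=50

# To actually send it you would use a session:
#   response = requests.Session().send(prepared)
# or, more commonly, the one-liner form:
#   response = requests.get(url, params={...})
```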
Machine Learning with scikit-learn
Machine Learning with Python
Machine learning is often the most engaging part of a data scientist's job because of the variety of tasks it makes possible.
In this module, we will introduce the concept of machine learning and explore the scikit-learn library and its vast collection of machine learning tools.
We will look at a variety of estimators in scikit-learn, including linear regression and random forests. We will also examine a churn prediction model built with scikit-learn, cover grid search for hyperparameter tuning, and evaluate models using techniques such as ROC curves.
- Introduction to scikit-learn
- Estimators and transformers
- Linear regression using scikit-learn
- Building a customer churn prediction model using scikit-learn
- Hyperparameter tuning, grid search, and cross-validation
- Resume analysis
- Naive Bayes for spam classification
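The fit/predict pattern that every scikit-learn estimator follows can be sketched with linear regression on synthetic data. The data below is invented (y = 3x + 5 plus a little noise) so the recovered coefficients are known in advance:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3*x + 5 plus small Gaussian noise (for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 5 + rng.normal(0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every scikit-learn estimator follows the same pattern:
# construct, fit on training data, then predict/score on held-out data
model = LinearRegression()
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)

print(round(model.coef_[0], 1), round(model.intercept_, 1))  # close to 3.0 and 5.0
```

Swapping `LinearRegression` for, say, `RandomForestRegressor` changes only the constructor line; the fit/score calls stay the same, which is what makes grid search over estimators and hyperparameters practical.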
Once you have completed the five-day course, it is time to apply what you have learned in a project of your choice. To solidify your learning, we offer all attendees the opportunity to propose a project they would like to pursue using Python and to receive guidance from our team of instructors on making it successful. The project can come from any field of your choosing; popular choices include:
- Computer vision: Object detection. PPE compliance. Social distancing detection. Face recognition. Vehicle counting.
- Text mining and NLP: Sentiment analysis. Email spam classifier. Fake news detection. Building a chat bot.
- Sales and marketing: Survival analysis. Customer churn prediction. Customer segmentation.
- Healthcare: COVID-19/pneumonia detection from lung scans. Breast cancer detection. Patient readmission rate prediction.
- Manufacturing: Product defect detection. Predictive maintenance.
*Mentoring is only available to attendees with the Sensei package.