Introduction to Python for Data Science
Learn to use Python effectively for data analysis, analytics, machine learning and data engineering
Upcoming Session
Aug 21 – Aug 25
Live instructor-led training: 9am – noon PDT
Live office hours and support daily.
Trusted by Leading Companies
HANDS-ON LEARNING
Hands-On Python for Data Science and Data Engineering
Python fundamentals, data wrangling, exploration, visualization, machine learning, data pipelines, REST APIs and more. The most comprehensive short-duration Python curriculum for data scientists, data engineers, analysts and researchers.
START FROM GROUND UP
Designed for Both Practitioners and Beginners
Before you arrive for the in-person learning experience, our carefully designed self-paced learning modules will help you get up to speed with Python fundamentals.
Beginner in Python or a little rusty with syntax? No problem! Our pre-training learning modules will get you ramped up before the live training.
COMPLETE LEARNING ECOSYSTEM
Instructor-Led Training, Office Hours, Mentoring, and More
- Instructor-led, live training
- Daily, live office hours for homework support
- Hundreds of code samples for practice
- Cloud-based compute tools, inline code editor, and code repositories
- Hundreds of Pandas, NumPy, Seaborn, matplotlib and scikit-learn code samples
- Mentoring for class project
- One-year access to all supplementary learning material
- Verified Certificate from The University of New Mexico Continuing Education
Curriculum
Python Fundamentals
Whether you are a beginner or just a bit rusty on your Python fundamentals, our carefully designed pre-training tutorials will get you ready for live learning. These tutorials give you a quick review of the fundamentals of Python and Jupyter notebooks. Topics in this module include:
- Variables, expressions, comments, and constants
- Conditional execution
- Functions and libraries
- Loops and iteration
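As a taste of these topics, here is a minimal sketch that combines a constant, a function, conditional execution, a loop, and a standard-library import. The scores, the cut-offs, and the name letter_grade are made up for illustration and are not taken from the course material.

```python
from statistics import mean  # a function from the standard library

# Constants are written in upper case by convention; Python does not enforce them.
PASSING_SCORE = 60


def letter_grade(score):
    """Return a letter grade for a numeric score (hypothetical cut-offs)."""
    if score >= 90:
        return "A"
    elif score >= PASSING_SCORE:
        return "B"
    else:
        return "F"


# Variables and a list of made-up scores
scores = [95, 72, 48]

# Loop over the scores and collect one grade per score
grades = []
for score in scores:
    grades.append(letter_grade(score))

average = mean(scores)
print(grades, round(average, 2))
```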
Introduction to NumPy and Pandas
Obtaining, processing and storing data are the necessary early steps in a data pipeline. The purpose of this module is to develop a good understanding of the data structures available in Python for structured and unstructured data. Topics include:
- Python, Pandas and NumPy data structures: Lists, dictionaries, data frames
- Read/write operations from/to different file formats: txt, json, xml, html
- Indexing, slicing and subsetting data frames – selecting rows and columns
- Chaining conditions
Delivery format: Instructor-led, live learning. Attendees will be given access to Jupyter notebooks and many code samples for in-class and homework exercises.
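To give a feel for the topics in this module, the following sketch builds a small data frame from a dictionary, slices it, chains two conditions, and round-trips it through JSON. The data and all variable names are invented for illustration; the in-class notebooks may look different.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data, built from a plain Python dictionary of lists
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cal", "Dee"],
    "age": [23, 35, 41, 29],
    "city": ["Austin", "Boston", "Austin", "Denver"],
})

# A data frame column can be viewed as a NumPy array
ages = df["age"].to_numpy()
mean_age = np.mean(ages)

# Slicing and subsetting: rows by position, columns by label
first_two = df.iloc[:2]
name_city = df.loc[:, ["name", "city"]]

# Chaining conditions: wrap each condition in parentheses, combine with & (and) or | (or)
austin_over_30 = df[(df["city"] == "Austin") & (df["age"] > 30)]

# Write to JSON text and read it back into a data frame
json_text = df.to_json(orient="records")
restored = pd.read_json(io.StringIO(json_text))
```

The parentheses around each condition matter: & binds more tightly than == and >, so leaving them out raises an error or gives the wrong mask.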
Data Wrangling
Data wrangling is often the stage where a data scientist spends the most time.
In this module we will talk about the importance of data wrangling in an analytics pipeline, and discuss best practices.
We will look at techniques for data wrangling that include data aggregation, merging, and transformation using the Pandas library. We will also talk about string manipulation, a key part of any text analytics task.
Topics:
- Indexing, slicing, and subsetting data.
- Simple data cleaning.
- Data transformation.
- Pandas merging, groupby, and reshaping.
- One-hot encoding and pivot tables.
- String manipulation.
Homework:
- Data cleaning
- Data transformation
- Data aggregation
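The wrangling steps covered in this module can be sketched roughly as follows. The two tables, their column names, and the values are hypothetical; the sketch only shows how string cleaning, merging, groupby, pivot tables, and one-hot encoding fit together in Pandas.

```python
import pandas as pd

# Hypothetical order data with inconsistent region labels
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "region": [" west", "East ", "west", "EAST"],
    "amount": [20.0, 35.0, 15.0, 50.0],
})

# Hypothetical customer lookup table
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "plan": ["basic", "pro", "basic"],
})

# String manipulation: trim whitespace and normalize case
orders["region"] = orders["region"].str.strip().str.lower()

# Merging: attach each customer's plan to their orders
merged = orders.merge(customers, on="customer_id", how="left")

# Aggregation with groupby: total amount per region
total_by_region = merged.groupby("region")["amount"].sum()

# Pivot table: regions as rows, plans as columns, summed amounts as values
pivot = merged.pivot_table(
    index="region", columns="plan", values="amount", aggfunc="sum", fill_value=0
)

# One-hot encoding: one indicator column per plan
encoded = pd.get_dummies(merged, columns=["plan"])
```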
Data Exploration and Visualization
Being able to tell a story with your data is a key skill for a data scientist, and a big part of that is being able to make good visualizations.
In this module, we will look at the Seaborn and matplotlib libraries, and use them to build visuals.
We will look at how to build visualizations such as heatmaps, scatter plots, and density curves. We will also look at some real-world test score data and see how visualizations can surface insights.
Topics:
- Figures, plots, and axes in matplotlib
- Building faceted visuals on the same plot
- Exploratory data analysis
- Using Seaborn to visualize data
Homework:
- Seaborn code samples
- Plotly code samples
- ggplot code samples
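For readers new to matplotlib, here is a small sketch of the figure and axes model discussed in this module: one figure holding two axes, a scatter plot and a histogram. The test scores are randomly generated stand-ins, not the course dataset, and the Agg backend is chosen only so the script runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for test score data
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=50)
scores = 50 + 4 * hours + rng.normal(0, 5, size=50)

# One figure containing two axes side by side
fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

ax_scatter.scatter(hours, scores)
ax_scatter.set_xlabel("Hours studied")
ax_scatter.set_ylabel("Test score")

ax_hist.hist(scores, bins=10)
ax_hist.set_xlabel("Test score")

fig.tight_layout()

# Render to an in-memory PNG instead of writing a file
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In a Jupyter notebook the figure would normally be displayed inline, so the backend line and the in-memory buffer are not needed there.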
Building Data Pipelines
When working on real-world problems, data pipelines are often used to make sure the end-to-end process works as intended and scales.
In this module, we will look at REST APIs, learn how to use them in a Python script, learn about web scraping, and call a deployed machine learning model using a REST API endpoint.
We will look at the requests library in Python, learn about the different types of HTTP requests, and look at the BeautifulSoup library for web scraping. We will also call a model deployed on Azure, manipulate the data we receive, and finally upload it to Azure to complete the data pipeline experience.
Topics:
- Introduction to REST APIs
- Using the requests library in Python
- API request structure, methods, endpoints
- Web scraping using BeautifulSoup
- Calling a deployed machine learning model using Python
- Building a basic data pipeline in Python
Homework:
- Calling a deployed unsupervised model
- Calling a deployed regression model
- Face mask detection using OpenCV
- Web scraping with Python and BeautifulSoup
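The request structure and scraping ideas from this module can be illustrated without touching the network. In the sketch below, the endpoint URL, the query parameter, the JSON payload, and the HTML snippet are all hypothetical; a real deployed model will define its own URL, authentication, and input schema.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical endpoint; example.com is a reserved placeholder domain
API_URL = "https://api.example.com/v1/predict"

# Build a POST request: method, endpoint, query parameters, JSON body
request = requests.Request(
    method="POST",
    url=API_URL,
    params={"version": "1"},
    json={"features": [5.1, 3.5, 1.4, 0.2]},
)
prepared = request.prepare()  # nothing is sent yet

# To send it for real, something like:
#   response = requests.Session().send(prepared, timeout=10)
#   result = response.json()

# Web scraping: parse a made-up HTML snippet instead of a live page
html = """
<html><body>
  <ul id="courses">
    <li><a href="/python">Python for Data Science</a></li>
    <li><a href="/ml">Machine Learning</a></li>
  </ul>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")
course_list = soup.find("ul", id="courses")
titles = [a.get_text(strip=True) for a in course_list.find_all("a")]
links = [a["href"] for a in course_list.find_all("a")]
```

Preparing the request separately makes it easy to inspect the final URL, headers, and body before anything is sent.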
Machine Learning with Python
Machine learning is often the most interesting part of a data scientist’s job because of the wide variety of tasks it can be applied to.
In this module, we will introduce the concept of machine learning, look at the scikit-learn library and the vast collection of machine learning tools available within scikit-learn.
We will look at a variety of estimators in scikit-learn, including linear regression and random forests. We will also look at a churn prediction model built using scikit-learn, the concept of grid search for hyperparameter tuning, and how we can evaluate models using techniques such as ROC curves.
Topics:
- Introduction to scikit-learn
- Estimators and transformers
- Linear regression using scikit-learn
- Building a customer churn prediction model using scikit-learn
- Hyperparameter tuning, grid search and cross validation
Homework:
- Resume analysis
- Naive Bayes for Spam classification
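The estimator, transformer, and grid search ideas from this module can be sketched on synthetic data. This is not the course's churn model: the dataset comes from make_classification, logistic regression stands in for the classifier, and the parameter grid is an arbitrary example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for a churn dataset
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A transformer (scaler) followed by an estimator (classifier)
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Grid search with 5-fold cross validation over the regularization strength C
param_grid = {"model__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

# Evaluate the best model on held-out data using the area under the ROC curve
probabilities = search.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, probabilities)
print(search.best_params_, round(test_auc, 3))
```

Putting the scaler inside the pipeline means it is refit on each training fold, which keeps information from the validation fold out of the preprocessing step.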
Python Project*
Once you have completed the 5-day course, it is time to apply what you have learned in a project of your choice. To solidify your learning, all attendees have the opportunity to propose a project they would like to pursue using Python and receive guidance from our team of instructors on how to make it successful. The project can come from any field of your choosing. A few popular ones include:
- Computer vision: Object detection. PPE compliance. Social distancing detection. Face recognition. Vehicle counting.
- Text mining and NLP: Sentiment analysis. Email spam classifier. Fake news detection. Building a chat bot.
- Sales and marketing: Survival analysis. Customer churn prediction. Customer segmentation.
- Healthcare: COVID-19/Pneumonia detection from lung scans. Breast cancer detection. Patient readmission rate detection.
- Manufacturing: Product defect detection. Predictive maintenance.
*Mentoring is only available to attendees with the Sensei package.
Earn a Verified Certificate of Completion
In association with The University of New Mexico Continuing Education
Recommended by Practitioners
At the end of the fifth day I think all of us are at the same place, so that’s the beauty of this program. You could come from any background because we are covering some diverse topics here, and making sure it’s a level playing field and again, going back to the motto of, hey, this is for everyone. Kapil Pandey, Analytics Manager at Samsung
It was a great experience for increasing the expertise on data science. The abstract concepts were explained well and always focused on real applications and business cases. The pace was adjusted as needed to let everyone follow the topics. Week was intense as there are many topics to cover but schedule was well managed to optimize people attention. Harris Thamby, Manager at Microsoft
What I enjoyed most about the Data Science Dojo bootcamp was the enthusiasm for data science from the instructors. Eldon Prince, Senior Principal Data Scientist at DELL
Highly valuable course condensed into a single week. Enough background is given to allow one to continue their learning and training on their own. Good energy from the instructors. It is clear that they have real industry experience working on problems. Ben Gawiser, Software Engineer at Amazon
I’m really impressed by the quality of the bootcamp, I came with high expectation and Data Science Dojo exceeded it. I highly recommend the bootcamp to anyone interested in Data Science! Marcello Azambuja, Engineering Manager at Uber
With the knowledge I’ve gained from this bootcamp I can further add value to my clients. Data Science Dojo is the only training which provides a lot of useful content and now I can confidently make a predictive model in few minutes. Iyinola Abosede-Brown, Senior Technology Consultant at KPMG
Taught by Practitioners
Our instructors are dedicated to helping you steer your career. With years of experience in the field, our instructors are professional data scientists and practitioners. They bring real-world stories and anecdotes to the class, adding immense value to your learning.
Learning Plans and Schedule
Only 1 seat remaining at this price.
40% OFF
Dojo
$599
$999
- Pre-training material
- 15 hours of live instruction
- In-class learning material
- Online Python Jupyter notebooks
40% OFF
Guru
$779
$1299
- Everything in Dojo plan
- Bonus Python Jupyter notebooks during training period
- Learning platform access during training period
- Collaboration tools access during training period
- Recordings of live sessions for later review during training period
- Verified certificate from The University of New Mexico