Data Science Dojo
Angela Baltes
| March 17

Angela Baltes completed Data Science Dojo’s bootcamp program at the University of New Mexico. Here’s her reflection on the course.

Deciding to participate in Data Science Dojo’s bootcamp, A Hands-on Introduction to Data Science, was simple: I have been a consumer of bootcamps for several years, with varying success.

In my prior self-paced learning, I found that there were concepts that I simply did not understand well, or perhaps were not explicitly stated in whatever course I was taking.

I wanted to experience an immersive in-person bootcamp with the hopes that practical examples and in-person interactions would be helpful in understanding and retaining the material. Not to mention, I was able to network with others who are interested in this field.

The bootcamp is taught by Raja Iqbal, CEO and Chief Data Scientist. He is a talented presenter, and I appreciated his style of teaching the material. He was accompanied by Arham Akheel, who assisted Raja in helping students and provided us with machine learning demonstrations.

The two complemented one another well. Please check out Data Science Dojo’s website and their schedule, as they may be coming to a city near you!

5-day Data Science Bootcamp

The bootcamp was offered in Albuquerque, New Mexico for 3 days instead of the prior 5-day bootcamp. From what I understand, we were the first cohort to try this format.

Day 1

On this first day, we spent some time looking into data exploration and how to approach data problems. We discussed things as a group, and I enjoyed the energy from the class.

We discussed that a model is only as good as the data provided to it: garbage in, garbage out. Data is the new oil and is the most valuable asset a company can have. However, we, as data scientists, need to tap into that resource by refining it and getting the most value from it.

One thing I have personally struggled with is asking the right questions and evaluating business impact, and this course was extremely helpful for learning both. It is our job to ask questions.

Many times, in the past, I was given a task, and I simply began to hammer away without questions asked. In data science, feature engineering and data exploration are the most important tasks, as these activities help to further define and evaluate if this is a worthy endeavor for a company.

Day 2

On this day, we began to delve into machine learning algorithms, specifically supervised learning. I found this valuable, as supervised learning is where I have the most experience and understanding. We stressed again that before building a model we should ask, “What is the intended use of this model?”, as the answer is pertinent to determining which features to include and in what format to provide the model to the stakeholders who will use it.

We analyzed the Titanic dataset in detail and discussed what features to include in our decision tree model. We also discussed entropy, stopping criteria, and splitting. Our homework assignment was to submit our Titanic model to our leaderboard. I did not place very high, lol.
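
For readers who have not seen entropy-based splitting before, here is a minimal sketch of the idea. It uses made-up survival labels rather than the actual Titanic data, and the function names `entropy` and `information_gain` are hypothetical, not taken from the bootcamp materials:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into `left` and `right`."""
    total = len(parent)
    weighted = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    return entropy(parent) - weighted


# Made-up labels for illustration: 1 = survived, 0 = did not survive.
parent = [1, 1, 1, 1, 0, 0, 0, 0]
left = [1, 1, 1, 0]    # e.g. passengers on one side of a candidate split
right = [1, 0, 0, 0]   # passengers on the other side

gain = information_gain(parent, left, right)
print(f"parent entropy: {entropy(parent):.3f} bits, gain: {gain:.3f} bits")
```

A decision tree learner of this kind picks the candidate split with the highest gain, and a stopping criterion (for example, a minimum gain or a maximum depth) decides when to stop splitting further.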

Day 3

On the last day of the bootcamp, we discussed the pitfalls in machine learning, such as overfitting and underfitting, and understood the bias/variance tradeoff. I have read about this topic to the point of nausea in other settings, but this truly helped me to understand it. Seeing practical examples helped me put this in context.

What was interesting and new to me was discussing how to properly evaluate a model, as it is not always about accuracy; sometimes (depending upon the problem and domain), it is about precision or recall! We then spent a great deal of time on hyperparameter tuning, and then on how to deploy our machine learning model as a web service, which was way too cool.
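
To see why accuracy alone can mislead, consider a small sketch with made-up predictions on an imbalanced problem. The data and the `evaluate` helper are hypothetical, chosen only to illustrate the point:

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up data: 5 positive cases out of 100, and a model that finds only one.
y_true = [1] * 5 + [0] * 95
y_pred = [1] + [0] * 4 + [0] * 95

accuracy, precision, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here the model scores 96% accuracy while missing four of the five positive cases (recall of 0.2). Whether that is acceptable depends on the domain: missing positives may be far more costly than raising a few false alarms, or the reverse.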

What I’ve Learned

I did not completely understand how to tune hyperparameters and how to properly evaluate the performance of a model before the bootcamp. Now I understand why this is necessary and how to carry out this task. We bridged the gap between data science and business value in this course, and that was the foundation going forward.

I learned that it is not always about the accuracy of a model, and that business needs should be aligned with precision or recall, depending upon the domain and the problem one is looking to solve.

I have learned why it is important for the data scientist to ask questions, and not just questions in general, but the right questions, and how the most important tasks before building a model are data exploration, data discovery, and feature engineering. We need to understand the business impact and how this model will add value.

For me, this was paramount. Too often we focus on wanting cool models so we can say we are involved in machine learning, rather than focusing on the business need.

I have learned how to use Microsoft tools to build and deploy a model as a web service. I found the ease and simplicity of this to be amazing and something I would like to continue to explore.

The Pros

  • The in-person class setting was helpful in order to understand and connect to the topics at hand. For those who have taken online bootcamps with varying success, you may also appreciate being able to interact with the instructor and other students.
  • The breadth of material covered was impressive. I appreciated that we covered the most important topics in machine learning and addressed common mistakes. We dedicated some of the day to hyperparameter tuning for when a model is not performing optimally.
  • We addressed the proper mindset to have for data analysis: how to ask the right questions, and not to be afraid to ask questions!
  • Raja and Arham have great chemistry as team members and are fantastic instructors.

The Cons

  • The condensed format was rather overwhelming. This material isn’t truly suited for a 3-day setting. We really only scratched the surface. This cannot be truly helped, but it was worth mentioning.
  • This course is not for those who are new to programming and/or data science. Although we did use Microsoft Azure for machine learning, there is an assumption that the student has some familiarity with programming and data science concepts. You will likely get more out of this course if you have some prior knowledge.

Conclusion

I highly recommend this bootcamp for those who would like to increase their knowledge in data science. This experience was valuable for me because it helped me bridge the gap between theory and implementation. From this point on, more learning will be required, but this gave me a boost in the right direction. Cheers!


This review was originally published on Angela Baltes’ personal blog.
