
Experiment Management for Machine Learning

Agenda

The average data scientist (or ML practitioner, or AI expert) spends a significant amount of time designing and running machine learning experiments, and waiting for them to complete. This involves trying out various training algorithms, doing feature engineering, changing preprocessing steps to get more homogeneous data, tuning hyperparameters, and testing models on different datasets.
A lot goes into creating and running experiments. Yet the only record we usually have of how they performed is the source code of the best-performing experiments. That is why we hear the following phrases so often:

“It was working yesterday” – highlighting how common reproducibility problems are.

“I don’t remember what the actual scores are but using feature X didn’t help” – documentation issue.

“I fixed a bug but I ran so many previous experiments with that bug” – code dependency issue.

“I am using the same parameters as experiment 4, why is it not working” – reproducibility and documentation issue.
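As a rough illustration of what would address the complaints above, here is a minimal sketch of an experiment log that uses only the Python standard library. It is not taken from the talk: the function names `log_experiment` and `load_experiments`, the JSON-lines file layout, and the record fields are all assumptions made for this example. Each run appends its parameters, its scores, and a hash of the source files it used, so "experiment 4" can be looked up later and runs made before a bug fix can be told apart from runs made after it.

```python
import hashlib
import json
import time
from pathlib import Path


def log_experiment(log_path, params, metrics, code_files=()):
    """Append one experiment record to a JSON-lines file and return it.

    All names here are hypothetical; this is a sketch, not a real tool's API.
    """
    code_files = sorted(str(name) for name in code_files)
    code_hash = None
    if code_files:
        # Hash the source files so a later bug fix shows up as a new code version.
        digest = hashlib.sha256()
        for name in code_files:
            digest.update(Path(name).read_bytes())
        code_hash = digest.hexdigest()

    record = {
        "timestamp": time.time(),  # when the run was logged
        "params": params,          # hyperparameters, seed, dataset name, ...
        "metrics": metrics,        # the actual scores, so nobody has to remember them
        "code_hash": code_hash,    # None when no source files were given
    }
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_experiments(log_path):
    """Read back every logged run, oldest first."""
    with open(log_path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A training script would call `log_experiment("runs.jsonl", {"seed": 0, "C": 1.0}, {"accuracy": score}, code_files=["train.py"])` once per run, where the file names and parameter names are placeholders. An append-only text file is the simplest possible choice here; dedicated experiment-management tools cover far more ground.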

What you’ll learn

  • The process that machine learning practitioners and data scientists follow, using Python and scikit-learn as a use case, and the recurring issues that we are starting to see with this process
  • Best practices that help reproducibility
  • Tools that startups are building to address the gaps in machine learning experiment management
Data Science Dojo
Dr. Rutu Mulkar
Rutu Mulkar is the founder of Hunchera, and previously the founder of Ticary Solutions (acquired by Sigmoidal). She received her Ph.D. in Natural Language Processing from USC and has contributed to IBM’s Watson system that defeated human contestants on Jeopardy! She is interested in solving problems related to Natural Language Processing, specifically Topic Modeling, Recommender Systems, Information Extraction, Semantics, and Search, to name a few, and in applying them to domains such as SEO and healthcare.


Resources

Slides on Experiment Management for Machine Learning can be found here.