An average data scientist (ML practitioner, AI expert) spends a significant amount of time designing and running experiments (and waiting for them to complete).
This involves one or more of the following:
– trying out various training algorithms
– doing some feature engineering
– changing preprocessing steps to get more homogeneous data
– trying out different types of hyperparameters
– testing models on different datasets
Creating and running experiments involves many moving parts, yet the only thing we seem equipped to keep track of is the source code of the best-performing experiments. None of the other configuration parameters that actually constitute an experiment get recorded.
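As a rough illustration of what recording those other parameters could look like, here is a minimal sketch that appends one JSON record per experiment to a log file. The function names (`fingerprint`, `log_experiment`), the record fields, and the JSON-lines layout are assumptions made for this example, not a method from the talk; the `params` dictionary could come from scikit-learn's `estimator.get_params()`.

```python
import hashlib
import json
import platform
import time
from pathlib import Path


def fingerprint(path):
    """Return a short SHA-256 digest of a data file so the exact dataset can be identified later."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()[:12]


def log_experiment(log_path, algorithm, params, preprocessing, features, dataset_path, scores):
    """Append one experiment record as a JSON line and return the record (hypothetical helper)."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "python": platform.python_version(),
        "algorithm": algorithm,
        "params": params,  # e.g. estimator.get_params() in scikit-learn
        "preprocessing": preprocessing,
        "features": features,
        "dataset": fingerprint(dataset_path),
        "scores": scores,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        # default=str keeps the log writable even if a parameter is not JSON-serializable
        f.write(json.dumps(record, sort_keys=True, default=str) + "\n")
    return record
```

Because every run appends a line, the scores for "feature X" or "experiment 4" can be looked up later instead of being remembered.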
Because of this, we frequently hear phrases like:
“It was working yesterday” – a common experiment reproducibility issue
“I don’t remember what the actual scores are, but using feature X didn’t help” – documentation issue
“I fixed a bug, but I ran so many previous experiments with that bug” – code dependency issue
“I am using the same parameters as experiment 4, why is it not working” – reproducibility and documentation issue
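The last complaint becomes easier to answer when each experiment's full settings are saved as a dictionary. The sketch below compares two such dictionaries and reports only the settings that differ. The helper names (`flatten`, `diff_experiments`) and all the example values are made up for illustration.

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {"params": {"C": 1}} -> {"params.C": 1}."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def diff_experiments(a, b):
    """Return {setting: (value_in_a, value_in_b)} for every setting that differs."""
    flat_a, flat_b = flatten(a), flatten(b)
    missing = "<missing>"
    return {
        key: (flat_a.get(key, missing), flat_b.get(key, missing))
        for key in sorted(set(flat_a) | set(flat_b))
        if flat_a.get(key, missing) != flat_b.get(key, missing)
    }


# Hypothetical records: the hyperparameters match, but the dataset fingerprint does not.
experiment_4 = {"params": {"C": 1.0, "max_iter": 200}, "dataset": "3f2a9c", "features": ["age", "income"]}
today = {"params": {"C": 1.0, "max_iter": 200}, "dataset": "b71e04", "features": ["age", "income"]}

print(diff_experiments(experiment_4, today))
# {'dataset': ('3f2a9c', 'b71e04')}
```

Here "the same parameters" turns out to be true only for the hyperparameters; the data changed underneath them.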
In this talk, I will walk through the typical process that ML practitioners and data scientists follow, using Python and scikit-learn as an example, and the recurring issues we are starting to see with it.
I will describe best practices for documenting experiments so that they can be reproduced, along with the tools and startups working in this space to close the gaps in experiment management.
Dr. Rutu Mulkar is the founder of Hunchera, and previously the founder of Ticary Solutions (acquired by Sigmoidal). She received her Ph.D. in Natural Language Processing from USC and has contributed to IBM’s Watson system that defeated humans in Jeopardy!
She is interested in solving problems in Natural Language Processing, specifically Topic Modeling, Recommender Systems, Information Extraction, Semantics, and Search, to name a few, and in applying them to domains such as SEO and healthcare.
6:30-7:45: Follow along with the presentation
When you attend this Meetup, you enter an area where photography, audio, and video recording may occur. By attending this event, you consent to photography, audio recording, video recording and its/their release, publication, exhibition, or reproduction to be used for promotional purposes, advertising, inclusion on websites, social media, or any other purpose by Data Science Dojo (DSD) and its affiliates and representatives.
Can’t Make the Meetup? Watch the Live Stream:
North America, Online, Seattle, United States
Dr. Rutu Mulkar