
Building Robust Machine Learning Models from Noisy Labeled Data

Agenda

Many machine learning (ML) models in enterprises large and small are trained on noisy, crowdsourced, or user-generated data. Since the annotators are not experts in the application domain or in data labeling, ML specialists have to account for this property when training and operating their models.

The talk is intended for ML engineers and researchers and will show them how to take into account the specifics of crowdsourced annotations when building or improving their own ML systems. We will look at three important issues:
• How to properly account for noisy labeled data when training a model
• How to take into account the subjective responses of annotators
• How to track distribution bias using model monitoring
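The abstract does not describe the monitoring procedure itself. One simple way to put crowdsourcing to work here, sketched below under our own assumptions, is to send a random sample of production inputs to annotators, compare the model's predictions with the aggregated crowd labels, and raise an alarm when agreement falls significantly below the launch baseline. The function names `agreement_rate` and `quality_drop_detected`, the one-sided two-proportion z-test, and the 2.33 threshold are illustrative choices, not the speaker's method. Note that this detects a drop in quality that a distribution shift may cause, not the shift itself.

```python
import math


def agreement_rate(model_preds, crowd_labels):
    """Fraction of sampled items where the model matches the aggregated crowd label."""
    if len(model_preds) != len(crowd_labels) or not crowd_labels:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == y for p, y in zip(model_preds, crowd_labels)) / len(crowd_labels)


def quality_drop_detected(base_rate, base_n, cur_rate, cur_n, z_crit=2.33):
    """One-sided two-proportion z-test: is current agreement significantly lower?

    Illustrative assumption: z_crit=2.33 is roughly a 1% false-alarm rate
    under the normal approximation.
    """
    pooled = (base_rate * base_n + cur_rate * cur_n) / (base_n + cur_n)
    # Clamp at 0 to guard against tiny negative values from float rounding.
    var = max(0.0, pooled * (1 - pooled))
    se = math.sqrt(var * (1 / base_n + 1 / cur_n))
    if se == 0:  # degenerate case: pooled agreement is exactly 0 or 1
        return cur_rate < base_rate
    return (base_rate - cur_rate) / se > z_crit


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 crowd-labeled items at launch, 500 this week.
    print(quality_drop_detected(0.90, 1000, 0.80, 500))  # True
    print(quality_drop_detected(0.90, 1000, 0.89, 500))  # False
```

In practice the crowd labels for each sampled item would themselves be aggregated (for example, by majority vote over several annotators) before being compared with the model's output.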

Ideas for further research and development on building ML solutions powered by crowdsourced data will also be presented.

Attendees will learn:
• How to handle noisy training data using ML methods such as CrowdLayer and CoNAL
• How to reliably gather subjective human opinions on complex cases using pairwise comparisons
• How to use crowdsourcing to quickly notice a distributional shift in an already deployed model
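The abstract does not say which model is used to aggregate pairwise comparisons. A common choice is the Bradley-Terry model, in which item i beats item j with probability p_i / (p_i + p_j). The sketch below fits it with the standard minorization-maximization (MM) update in pure Python; it is an illustration of the general technique, not the speaker's implementation. The function name `bradley_terry` and the toy votes are hypothetical, and the sketch assumes no item is compared with itself and every item both wins and loses at least once (otherwise the maximum-likelihood scores do not exist and some scores drift toward 0).

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=100, tol=1e-9):
    """Fit Bradley-Terry scores from (winner, loser) pairs with the MM algorithm."""
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)   # W_i: how many comparisons item i won
    games = defaultdict(int)  # n_ij: how many times items i and j were compared
    for winner, loser in comparisons:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    scores = {i: 1.0 for i in items}  # start from equal strengths
    for _ in range(n_iter):
        new = {}
        for i in items:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                n_ij = games.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (scores[i] + scores[j])
            new[i] = wins[i] / denom if denom > 0 else scores[i]
        total = sum(new.values())
        new = {i: s / total for i, s in new.items()}  # scores are only defined up to scale
        delta = max(abs(new[i] - scores[i]) for i in items)
        scores = new
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # Toy data: each tuple is one annotator's judgment (winner, loser).
    votes = [("a", "b"), ("a", "b"), ("b", "a"),
             ("b", "c"), ("b", "c"), ("c", "b"),
             ("a", "c"), ("a", "c"), ("c", "a")]
    scores = bradley_terry(votes)
    print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b', 'c']
```

Asking annotators "which of these two is better?" tends to be easier and more consistent than asking for an absolute score, which is one reason pairwise setups are used for subjective judgments.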

Dr. Dmitry Ustalov

Head of the Ecosystem Development Unit at Toloka

Dmitry leads the research, education, and open-source teams at the Toloka data labeling platform. He is an expert in human-in-the-loop systems and the evaluation of machine learning algorithms, holds a Ph.D. in natural language processing, and has more than 14 years of experience. He has a strong publication record at scientific conferences such as ACL, EMNLP, and NeurIPS.
