Evaluating Large Language Models


Large language models (LLMs) power generative AI applications, yet evaluating them properly is challenging. LLM-based applications suffer from hallucinations, toxicity, poor context awareness, prompt injection, data leaks, and more. Evaluating the usefulness of LLM applications is essential before putting them into production. In this live session, we will discuss LLM and generative AI evaluation techniques, including the setup, metrics, data, and more. We will also discuss tools that can automate LLM evaluations for RAG and non-RAG applications.

Sonali Pattnaik

Lead AI Scientist at Progressive Insurance

Sonali is a Lead Data Scientist at Progressive Inc., one of the largest insurance companies in the USA. She has 6+ years of experience in AI, ML, and data science, and has developed and deployed AI models in production in high-risk, regulated industries. Her expertise lies in LLMs and computer vision. Sonali received her MS from the University of Washington and her undergraduate degree from IIT Kharagpur, India.

