Master Production-Ready AI Agents: Evaluate & Ship With Confidence


Data Science Dojo Staff

Evaluating & Shipping Production-Ready AI Agents

Shipping an AI agent is the easy part. Trusting it in production is where most teams struggle. In this session, Lead AI Architect Kwasi Ankomah breaks down the evaluation discipline that separates demos from production-ready AI agents – built live on a real system, run on traces and telemetry.

About This Webinar

Demos are deceptive. An AI agent that performs flawlessly in a controlled environment can quietly break in production the moment a tool errors, a prompt shifts, or a subagent delegates incorrectly. Most teams still ship on intuition – and pay the price.

This session is the finale of our multi-session agentic AI series. In Session 5, we scaled agents using a supervisor-subagent architecture. In this final session, Kwasi Ankomah – Lead AI Architect at SambaNova Systems with 15 years of cross-industry experience – walks you through the end-to-end evaluation discipline that state-of-the-art teams use to build production-ready AI agents, run on traces and telemetry.

What You Will Learn

Multi-step, non-deterministic systems cannot be unit-tested like a function, and the evaluation mindset has to shift accordingly. This session covers why traditional software tests fail for agents, and what to use instead.

You will walk through the four evaluator types every production-ready AI agent needs – rule-based, LLM-as-a-judge, trajectory, and recovery-from-failure – and learn how to combine them into a scorecard and regression gate that runs automatically in CI on every prompt, model, tool, or architecture change.
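As a rough illustration of how these pieces might fit together, the sketch below combines one evaluator of each type into a scorecard and compares it against a baseline. It is not the session's implementation: the trace shape, the tool names (`lookup_order`, `issue_refund`), the tolerance value, and the keyword-based stand-in for an LLM judge are all assumptions made for the example.

```python
from typing import Callable

# Hypothetical trace shape (an assumption for this sketch): a dict with
# "output" (str) and "steps", a list of {"tool": str, "status": "ok" | "error"}.
Trace = dict
Evaluator = Callable[[Trace], float]  # each evaluator returns a score in [0, 1]


def rule_based(trace: Trace) -> float:
    """Deterministic check: the final answer must be non-empty."""
    return 1.0 if trace["output"].strip() else 0.0


def trajectory(expected_tools: list[str]) -> Evaluator:
    """Score 1.0 when the expected tools appear, in order, among the calls."""
    def evaluate(trace: Trace) -> float:
        called = iter(step["tool"] for step in trace["steps"])
        # `tool in called` consumes the iterator, so order is enforced.
        return 1.0 if all(tool in called for tool in expected_tools) else 0.0
    return evaluate


def recovery(trace: Trace) -> float:
    """Every failed tool call must be followed by a later success of that tool."""
    steps = trace["steps"]
    for i, step in enumerate(steps):
        if step["status"] == "error":
            retried = any(
                later["tool"] == step["tool"] and later["status"] == "ok"
                for later in steps[i + 1:]
            )
            if not retried:
                return 0.0
    return 1.0


def judge_stub(trace: Trace) -> float:
    """Stand-in for LLM-as-a-judge; a real judge would prompt a model here."""
    return 1.0 if "refund" in trace["output"].lower() else 0.0


def scorecard(traces: list[Trace], evaluators: dict[str, Evaluator]) -> dict[str, float]:
    """Mean score per evaluator across all traces."""
    return {
        name: sum(evaluate(t) for t in traces) / len(traces)
        for name, evaluate in evaluators.items()
    }


def regression_gate(baseline: dict[str, float], candidate: dict[str, float],
                    tolerance: float = 0.02) -> list[str]:
    """Names of metrics where the candidate fell more than `tolerance` below baseline."""
    return [
        name for name, base in baseline.items()
        if candidate.get(name, 0.0) < base - tolerance
    ]


traces = [
    {"output": "Refund issued.", "steps": [
        {"tool": "lookup_order", "status": "ok"},
        {"tool": "issue_refund", "status": "error"},
        {"tool": "issue_refund", "status": "ok"},
    ]},
    {"output": "", "steps": [{"tool": "lookup_order", "status": "error"}]},
]
evaluators = {
    "rule_based": rule_based,
    "trajectory": trajectory(["lookup_order", "issue_refund"]),
    "recovery": recovery,
    "judge": judge_stub,
}
candidate = scorecard(traces, evaluators)
baseline = {"rule_based": 1.0, "trajectory": 0.5, "recovery": 0.5, "judge": 0.5}
failed = regression_gate(baseline, candidate)
# In CI, a non-empty `failed` list would translate into a non-zero exit code.
```

Keeping each evaluator as a plain function from trace to score means a new check can be added to the scorecard without touching the gate logic.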

The session also covers the state-of-the-art workflow on LangSmith, including datasets, experiments, and trace-based evaluation that surfaces failure-and-retry sequences inside subagents. For teams on the open-source path, Kwasi walks through LangFuse and OpenTelemetry as a full observability alternative. The session closes with online evaluation and pass^k reliability: how to score live traffic and quantify how consistently your agent succeeds across repeated runs of the same task, rather than on a single lucky attempt.
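To make pass^k concrete, here is a minimal sketch under one common definition, which is an assumption since the page does not spell it out: pass^k is the probability that all k independent attempts at a task succeed. Given n recorded trials of a task with c successes, an unbiased estimate is C(c, k) / C(n, k), averaged over tasks. The function names and the sample results below are hypothetical.

```python
from math import comb


def pass_hat_k(n_trials: int, n_successes: int, k: int) -> float:
    """Estimate the chance that k attempts drawn from the trials all succeed."""
    if not 0 <= n_successes <= n_trials:
        raise ValueError("n_successes must be between 0 and n_trials")
    if not 1 <= k <= n_trials:
        raise ValueError("k must be between 1 and n_trials")
    # comb(c, k) is 0 when c < k, so tasks with too few successes score 0.
    return comb(n_successes, k) / comb(n_trials, k)


def mean_pass_hat_k(results: list[list[bool]], k: int) -> float:
    """Average the per-task estimate; `results` holds one list of outcomes per task."""
    return sum(pass_hat_k(len(r), sum(r), k) for r in results) / len(results)


# Hypothetical data: two tasks, eight trials each.
results = [
    [True] * 8,                 # succeeds every time
    [True] * 6 + [False] * 2,   # succeeds 6 times out of 8
]
single_attempt = mean_pass_hat_k(results, k=1)
four_in_a_row = mean_pass_hat_k(results, k=4)
```

In this example the second task passes 75% of single attempts, but its estimated chance of passing four attempts in a row is far lower, which is the gap between an average success rate and reliability that pass^k is meant to expose.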

Why Agent Evaluation Matters

Multi-agent systems are non-deterministic by nature. The same input can produce different intermediate steps, different tool calls, and different outputs across runs. That variability is what makes agents powerful – and what makes traditional QA completely inadequate for them.

The gap between a working demo and a reliable system in production is wider than most teams expect. Without a structured evaluation framework, there is no way to know whether a change to a prompt, a model swap, or a new tool integration has quietly degraded performance somewhere in the pipeline.

The teams shipping production-ready AI agents are not guessing. They run structured evaluations across datasets, catch regressions before they reach users, and monitor live traffic with observability tooling. Without this discipline, every deployment is a risk. You can explore related reading on the Data Science Dojo blog for primers on LangGraph, LangSmith, and building robust AI pipelines, and the LangChain blog for further context on multi-agent orchestration patterns.

Who Should Attend?

This webinar is built for practitioners actively building or preparing to deploy production-ready AI agents – AI and ML engineers working with LangGraph, CrewAI, or similar frameworks, data scientists and architects responsible for production LLM systems, and technical leads evaluating agent observability and CI tooling. If you have ever pushed an agent to production and wondered whether it would hold up under real conditions, this session is for you. Prior exposure to supervisor-subagent patterns is helpful but not required.

Kwasi Ankomah is the Lead AI Architect at SambaNova Systems, specializing in deep agent architectures, multi-agent orchestration, and context engineering, with 15 years of experience building production AI systems across financial services, consulting, government, and tech.
