
Part 3: Can Agents Evaluate Themselves? | Evaluating AI Agents with Arize AI

Agenda

Master Advanced AI Agent Evaluations and Agent-as-Judge Methods

AI agent evaluation is evolving: it's no longer just about what an agent outputs, but how it gets there. In this Part 3 webinar of the community series with Arize AI, we will dive into advanced agent evaluation techniques, including path-based reasoning, convergence analysis, and even using agents to evaluate other agents.

Explore how to measure the efficiency and structure of agent reasoning paths, assess collaboration in multi-agent systems, and evaluate the quality of planning in complex setups like hierarchical or crew-based frameworks. You will also get a look at emerging techniques like self-evaluation, peer review, and agent-as-judge models — where agents critique and improve each other in real time.

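To make the agent-as-judge idea concrete ahead of the session, here is a minimal sketch in which one model grades another agent's reasoning trace rather than just its final answer. It uses the OpenAI Python SDK only for brevity; the prompt, scoring scale, and helper names are illustrative assumptions on our part, not the webinar's material or an Arize Phoenix API.

```python
# Minimal agent-as-judge sketch (illustrative; not Arize Phoenix code).
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical judge prompt: evaluates the reasoning path, not just the output.
JUDGE_PROMPT = """You are evaluating another agent's work.
Task: {task}
Reasoning trace:
{trace}
Final answer: {answer}

Rate the reasoning path from 1-5 for efficiency and correctness.
Respond as: SCORE: <n> | REASON: <one sentence>"""

def judge_trace(task: str, trace: str, answer: str, model: str = "gpt-4o-mini") -> str:
    """Ask a judge model to critique another agent's reasoning path."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            task=task, trace=trace, answer=answer)}],
    )
    return response.choices[0].message.content
```

The same pattern extends to peer review: each agent in a group can run the judge prompt over its peers' traces and feed the critiques back as revision instructions.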
What We Will Cover:

  • Understand how to evaluate not just what an AI agent does, but how it arrived at its output.
  • Measure convergence and reasoning paths to assess execution quality and efficiency (see the convergence sketch after this list).
  • Learn how to evaluate collaboration and role effectiveness in multi-agent systems.
  • Explore methods for assessing planning quality in hierarchical and crew-based agents.
  • Dive into agents-as-judges: enable self-evaluation and peer-review mechanisms, and build critique tools and internal feedback loops to improve agent performance.
  • Discuss real-world applications of these techniques in large-scale, agentic AI systems.
  • Interactive Element: Watch a live example of an agent acting as a judge — or participate in a multi-agent evaluation demo using Arize Phoenix.

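As a preview of the convergence measurement mentioned above, here is a minimal sketch under one common framing: score repeated runs of the same task by the ratio of the shortest observed path to the average path length, so 1.0 means every run took the optimal route. The function name and formula are our own illustration, not a Phoenix API.

```python
# Minimal convergence sketch (illustrative framing, not Arize Phoenix code).

def convergence_score(steps_per_run: list[int]) -> float:
    """Ratio of the shortest observed run to the average run length.

    1.0 means every run took the minimal number of steps;
    lower values indicate wasted or divergent steps.
    """
    if not steps_per_run:
        raise ValueError("need at least one run")
    return min(steps_per_run) / (sum(steps_per_run) / len(steps_per_run))

# Example: five runs of the same task took these step counts.
print(convergence_score([4, 4, 6, 5, 4]))  # ~0.87 -> mostly efficient, slight drift
```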

Missed the earlier parts? Catch up on Part 1 and Part 2 of the series!

John Gilhuly

Head of Developer Relations at Arize AI

John is the Head of Developer Relations at Arize AI, focused on open-source LLM observability and evaluation tooling. He holds an MBA from Stanford, where he specialized in the ethical, social, and business implications of AI development, and a B.S. in C.S. from Duke. Prior to joining Arize, John led GTM activities at Slingshot AI, and served as a venture fellow at Omega Venture Partners. In his pre-AI life, John built out and ran technical go-to-market teams at Branch Metrics.

RSVP