Agents that evaluate agents

Agents that evaluate agents are agent systems used to inspect, score, or debug the behavior of other agent systems. Unlike a single LLM-as-a-judge call, an agent evaluator can retrieve source material, inspect traces, verify tool calls, run code, compare outputs, and reason across a full trajectory.

This pattern is useful for complex workflows where judging the final answer alone is not enough. A coding agent might produce code that passes its tests but ignores repository conventions. A support agent might give the right answer but call the wrong internal API. Agent evaluators can examine the path, not just the destination.
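To make the "path, not just the destination" idea concrete, here is a minimal sketch of one check an agent evaluator might run over a recorded trajectory. It is an illustration under stated assumptions, not a reference implementation: the `Step` trace format, the `evaluate_trajectory` function, and the tool names (`lookup_order`, `refund_api`, `legacy_refund_api`) are all hypothetical, and the check is a simple deterministic rule. A real agent evaluator would typically combine checks like this with model-based reasoning over the trace.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded step in an agent trajectory (hypothetical trace format)."""
    tool: str
    args: dict = field(default_factory=dict)


@dataclass
class Verdict:
    """Separate judgments for the final answer and for the path taken."""
    answer_correct: bool
    path_ok: bool
    issues: list


def evaluate_trajectory(steps, final_answer, expected_answer, allowed_tools):
    """Score both the destination (final answer) and the path (tool calls)."""
    # Flag every step whose tool is not on the allow-list for this task.
    issues = [
        f"step {i}: unexpected tool '{s.tool}'"
        for i, s in enumerate(steps)
        if s.tool not in allowed_tools
    ]
    return Verdict(
        answer_correct=(final_answer.strip() == expected_answer.strip()),
        path_ok=not issues,
        issues=issues,
    )


# Hypothetical support-agent trace: the answer is right, the API call is not.
trace = [
    Step("lookup_order", {"order_id": "A1"}),
    Step("legacy_refund_api", {"order_id": "A1"}),
]
verdict = evaluate_trajectory(
    trace,
    final_answer="Refund issued.",
    expected_answer="Refund issued.",
    allowed_tools={"lookup_order", "refund_api"},
)
print(verdict)
```

In this example the final answer matches, so an answer-only judge would pass the run, while the trajectory check flags the call to the disallowed tool. Keeping the two judgments in separate fields lets a report distinguish "wrong answer" from "right answer, wrong path".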

What Are Agents That Evaluate Agents?