Agents that evaluate agents are agent systems that inspect, score, or debug the behavior of other agent systems. Unlike a single LLM-as-a-judge call, an agent evaluator can retrieve source material, inspect traces, verify tool calls, run code, compare outputs, and reason across a full trajectory.
This pattern is useful for complex workflows where the final answer alone does not show whether the agent behaved correctly. A coding agent might produce code that passes its tests but ignores repository conventions. A support agent might answer correctly but call the wrong internal API. An agent evaluator can examine the path, not just the destination.
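To make the path-versus-destination idea concrete, here is a minimal sketch of a trajectory check. It is deliberately rule-based and contains no LLM call; a real agent evaluator would layer retrieval, code execution, and model reasoning on top of checks like this. The `Step` and `Verdict` types, the `evaluate_trajectory` function, and the tool names (`lookup_order`, `refund_api`, `legacy_refund_api`) are all hypothetical, invented for illustration rather than taken from any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded step of a hypothetical agent trace."""
    tool: str   # name of the tool the agent called
    args: dict  # arguments passed to that tool


@dataclass
class Verdict:
    """Result of an evaluation: overall pass/fail plus the reasons."""
    passed: bool
    issues: list = field(default_factory=list)


def evaluate_trajectory(steps, final_answer, expected_answer, allowed_tools):
    """Check both the final answer and the path taken to reach it."""
    issues = []

    # Destination check: does the final answer match the expected one?
    if final_answer != expected_answer:
        issues.append("final answer does not match expected answer")

    # Path check: was every tool call made to an approved tool?
    for i, step in enumerate(steps):
        if step.tool not in allowed_tools:
            issues.append(f"step {i}: disallowed tool '{step.tool}'")

    return Verdict(passed=not issues, issues=issues)


# A support agent that answers correctly but calls the wrong internal API.
trace = [
    Step(tool="lookup_order", args={"order_id": "A1"}),
    Step(tool="legacy_refund_api", args={"order_id": "A1"}),
]
verdict = evaluate_trajectory(
    steps=trace,
    final_answer="Refund issued",
    expected_answer="Refund issued",
    allowed_tools={"lookup_order", "refund_api"},
)
print(verdict.passed)  # False
print(verdict.issues)  # ["step 1: disallowed tool 'legacy_refund_api'"]
```

A judge that saw only the final answer would pass this run. The trajectory check fails it because the second step used a tool outside the allowed set.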