Agent-to-agent evaluation is the use of one agent to evaluate another agent's behavior. The evaluator agent may inspect traces, run tools, retrieve references, compare outputs, and produce structured feedback.
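As a rough illustration of what "structured feedback" over a trace might look like, the sketch below uses a rule-based stand-in for the evaluator agent. The `Step` and `Feedback` schemas, the `evaluate_trace` function, and the scoring rule are all hypothetical names and choices made for this example. A real evaluator would likely call a model or run tools instead of applying fixed checks.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded action in the evaluated agent's trace (hypothetical schema)."""
    tool: str
    ok: bool


@dataclass
class Feedback:
    """Structured feedback emitted by the evaluator (hypothetical schema)."""
    score: float
    issues: list[str] = field(default_factory=list)


def evaluate_trace(steps: list[Step], output: str, reference: str) -> Feedback:
    """Rule-based stand-in for an evaluator agent.

    The checks below are illustrative assumptions, not a prescribed rubric.
    """
    issues = []
    # Inspect the trace: flag every tool call that failed.
    for i, step in enumerate(steps):
        if not step.ok:
            issues.append(f"step {i}: tool '{step.tool}' failed")
    # Compare the final output against a retrieved reference answer.
    if reference.lower() not in output.lower():
        issues.append("final output does not contain the reference answer")
    # Assumed scoring rule: start at 1.0, subtract 0.25 per issue, floor at 0.
    score = max(0.0, 1.0 - 0.25 * len(issues))
    return Feedback(score=score, issues=issues)


trace = [Step("search", True), Step("calculator", False)]
feedback = evaluate_trace(trace, output="The answer is 42.", reference="42")
print(feedback)
```

Returning a typed record rather than free text keeps the evaluator's findings machine-readable, so they can be logged and reviewed later.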
This pattern extends LLM-as-a-judge from single-output scoring to workflow-aware evaluation. It is promising for complex, multi-step tasks where the final output alone does not show whether the agent behaved correctly. The evaluator still needs calibration, auditability, and human oversight so that it does not become an unchecked source of errors.
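One possible way to approach calibration is to compare the evaluator's verdicts with human labels on a shared sample and measure chance-corrected agreement. The sketch below computes Cohen's kappa by hand. The labels, the `KAPPA_THRESHOLD` value of 0.8, and the `needs_human_review` flag are assumptions made for illustration, not recommended settings.

```python
from collections import Counter


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Chance-corrected agreement between two raters over the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items where both raters agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if each rater labeled independently at their own rates.
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label everywhere; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical calibration sample: human labels vs. evaluator-agent verdicts.
human = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "fail"]
evaluator = ["pass", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]

KAPPA_THRESHOLD = 0.8  # assumed cutoff; choose per task and risk tolerance

kappa = cohens_kappa(human, evaluator)
# Below the cutoff, keep a human in the loop instead of trusting the evaluator.
needs_human_review = kappa < KAPPA_THRESHOLD
print(f"kappa={kappa:.2f}, needs_human_review={needs_human_review}")
```

Here the raters agree on 7 of 8 items, expected agreement is 0.5, and kappa is 0.75, so under the assumed threshold the evaluator's verdicts would still be routed to human review.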