Agent-run evaluation is the practice of having agents execute evaluation workflows. The agent may select test cases, run an eval suite, inspect failures, summarize regressions, or propose fixes based on evaluator output.
The key distinction is control. In a conventional automated eval, a pipeline runs a predefined set of tests. In agent-run evaluation, the agent makes some of the decisions inside the workflow, such as which cases to run or how to interpret a failure. Because those decisions can vary from run to run, policy, permissions, audit logs, and reproducibility matter more than they do in a fixed pipeline.
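To make the control distinction concrete, here is a minimal Python sketch of an agent-run eval loop. It is an illustration under stated assumptions, not a reference implementation: `Policy`, `AuditLog`, `attempt`, `run_agent_eval`, and `sample_two` are hypothetical names, and the "agent" is stood in for by a plain function that picks test cases. The sketch shows three ideas from the paragraph above: an allow-list that limits what the agent may do, a log entry for every attempted action, and a fixed seed so the agent's selection can be replayed.

```python
import random
from dataclasses import dataclass, field


# All names below are hypothetical and exist only for this illustration.
@dataclass(frozen=True)
class Policy:
    allowed_actions: frozenset  # actions the agent is permitted to take
    max_cases: int              # upper bound on cases the agent may run


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, action: str, allowed: bool, detail: dict) -> None:
        # Every attempted action is logged, whether or not it was permitted.
        self.entries.append({"step": len(self.entries), "action": action,
                             "allowed": allowed, "detail": detail})


def attempt(action: str, policy: Policy, log: AuditLog, detail: dict) -> bool:
    """Check an action against the policy and record the decision."""
    allowed = action in policy.allowed_actions
    log.record(action, allowed, detail)
    return allowed


def run_agent_eval(cases, choose, policy, log, seed=0):
    """Let `choose` (the stand-in agent) pick cases, within policy limits."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    results = {}
    if not attempt("select_cases", policy, log, {"seed": seed}):
        return results
    # Sort names first so the agent always sees the same input order.
    selected = choose(sorted(cases), rng)[: policy.max_cases]
    for name in selected:
        if attempt("run_case", policy, log, {"case": name}):
            results[name] = bool(cases[name]())
    return results


def sample_two(names, rng):
    # Stand-in for agent decision-making: sample up to two cases.
    return rng.sample(names, k=min(2, len(names)))


CASES = {
    "adds": lambda: 1 + 1 == 2,
    "upper": lambda: "a".upper() == "A",
    "broken": lambda: sum([1, 2]) == 4,  # deliberately failing case
}

POLICY = Policy(allowed_actions=frozenset({"select_cases", "run_case"}),
                max_cases=2)

if __name__ == "__main__":
    log = AuditLog()
    print(run_agent_eval(CASES, sample_two, POLICY, log, seed=7))
    for entry in log.entries:
        print(entry)
```

Running the function twice with the same seed should yield the same selection and the same log, which is one simple way to approximate reproducibility. An action outside the allow-list, such as a hypothetical `apply_fix`, is refused by `attempt` but still recorded, so the audit trail covers denied requests as well as permitted ones.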