Glossary of AI Terminology

What Is Agent-Run Evaluation?

Agent-run evaluation

Agent-run evaluation is the practice of having an AI agent execute evaluation workflows. The agent may select test cases, run an eval suite, inspect failures, summarize regressions, or propose fixes based on evaluator output.
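The steps listed above can be sketched as plain functions that an agent would call in sequence. This is a minimal illustration, not a reference implementation: `EvalCase`, `select_cases`, `run_suite`, `summarize_regressions`, and `toy_system` are hypothetical names, and exact-match scoring stands in for whatever evaluator a real suite would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    # Hypothetical test-case record; field names are assumptions.
    name: str
    prompt: str
    expected: str
    tags: tuple = ()


def select_cases(cases, changed_tags):
    """Agent step: pick the cases whose tags overlap the changed areas."""
    wanted = set(changed_tags)
    return [c for c in cases if wanted & set(c.tags)]


def run_suite(cases, system):
    """Run each case and record pass/fail by exact match (a simplification)."""
    return {c.name: system(c.prompt) == c.expected for c in cases}


def summarize_regressions(results, baseline):
    """A regression is a case that passed in the baseline and fails now."""
    return sorted(
        name for name, ok in results.items()
        if not ok and baseline.get(name, False)
    )


def toy_system(prompt):
    # Stand-in for the system under test: it only upper-cases its input.
    return prompt.upper()


CASES = [
    EvalCase("greet", "hi", "HI", ("text",)),
    EvalCase("math", "2+2", "4", ("math",)),
    EvalCase("shout", "ok", "OK", ("text",)),
]
BASELINE = {"greet": True, "math": True, "shout": True}

selected = select_cases(CASES, changed_tags=["math", "text"])
results = run_suite(selected, toy_system)
regressions = summarize_regressions(results, BASELINE)
print(regressions)  # prints ['math']
```

In a real deployment the selection and summary steps would be driven by the agent's own reasoning rather than a fixed tag filter; the functions here only mark where those decisions would plug in.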

The key distinction is control. In a conventional automated eval, a pipeline runs a predefined set of tests. In agent-run evaluation, the agent makes some of the decisions within the workflow, such as which tests to run or how to respond to a failure. Because those decisions are no longer fixed in advance, policy, permissions, audit logs, and reproducibility matter more than they do in a fixed pipeline.
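One way to picture these controls is a thin wrapper that checks every agent action against an allow-list, records it in an audit log, and makes test selection repeatable with a fixed seed. The sketch below assumes hypothetical names throughout (`EvalPolicy`, `AuditedRun`, and the action strings `select_cases`, `run_suite`, `apply_fix`); it shows the idea, not any particular framework's API.

```python
import json
import random
from dataclasses import dataclass, field


@dataclass
class EvalPolicy:
    # Hypothetical policy: which actions the agent may take, and how many
    # cases it may select in one run.
    allowed_actions: frozenset
    max_cases: int


@dataclass
class AuditedRun:
    policy: EvalPolicy
    seed: int
    log: list = field(default_factory=list)

    def act(self, action, **details):
        """Record the attempted action, then refuse it if policy forbids it."""
        allowed = action in self.policy.allowed_actions
        self.log.append({"action": action, "allowed": allowed, "details": details})
        if not allowed:
            raise PermissionError(f"action not permitted: {action}")

    def sample_cases(self, case_names):
        """Pick cases with a seeded RNG so the same seed gives the same selection."""
        self.act("select_cases", requested=len(case_names))
        rng = random.Random(self.seed)
        k = min(self.policy.max_cases, len(case_names))
        return rng.sample(sorted(case_names), k)


policy = EvalPolicy(
    allowed_actions=frozenset({"select_cases", "run_suite"}),
    max_cases=2,
)
run = AuditedRun(policy=policy, seed=7)
picked = run.sample_cases(["a", "b", "c", "d"])

try:
    # Not on the allow-list, so it is logged and then refused.
    run.act("apply_fix", target="model.py")
except PermissionError:
    pass

print(json.dumps(run.log, indent=2))
```

Logging the attempt before enforcing the policy means denied actions also appear in the audit trail, and storing the seed alongside the log lets a reviewer replay the same selection later.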
