An evaluation rubric is a structured set of criteria used to score AI outputs or agent behavior. It tells a human reviewer, LLM judge, or scoring function what good and bad look like.
Rubrics are where subjective quality becomes operational. A good rubric defines pass/fail conditions, severity levels, examples, edge cases, and what evidence the evaluator should consider. A weak rubric leaves these undefined, so different evaluators score the same output differently. The resulting evals are noisy, and teams stop trusting them.
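One way to make these parts concrete is to write the rubric down as data. The sketch below is illustrative only: the names `Severity`, `Criterion`, `Rubric`, and `score` are hypothetical, and the rule that an output passes when it has no failure worse than minor is an assumption, not a standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    """How bad it is when a criterion fails (hypothetical three-level scale)."""
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


@dataclass
class Criterion:
    name: str
    pass_condition: str   # what must be true for this criterion to pass
    severity: Severity    # severity assigned when the criterion fails
    evidence: str         # what the evaluator should look at
    examples: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)


@dataclass
class Rubric:
    name: str
    criteria: list[Criterion]

    def score(self, verdicts: dict[str, bool]) -> dict:
        """Combine per-criterion pass/fail verdicts into one result.

        Assumed rule: the output passes unless some failed criterion
        is MAJOR or CRITICAL.
        """
        missing = [c.name for c in self.criteria if c.name not in verdicts]
        if missing:
            raise ValueError(f"missing verdicts for: {missing}")

        failures = [c for c in self.criteria if not verdicts[c.name]]
        worst = max(
            (c.severity for c in failures),
            key=lambda s: s.value,
            default=None,
        )
        return {
            "passed": worst is None or worst is Severity.MINOR,
            "worst_severity": worst.name if worst else None,
            "failed_criteria": [c.name for c in failures],
        }


# Hypothetical rubric for a customer-support reply.
rubric = Rubric(
    name="support_reply",
    criteria=[
        Criterion(
            name="answers_question",
            pass_condition="The reply addresses the user's actual question.",
            severity=Severity.CRITICAL,
            evidence="User message and final reply.",
            edge_cases=["User asks two questions in one message."],
        ),
        Criterion(
            name="polite_tone",
            pass_condition="The reply contains no dismissive or rude language.",
            severity=Severity.MINOR,
            evidence="Final reply only.",
        ),
    ],
)

# Verdicts could come from a human reviewer, an LLM judge, or a scoring function.
result = rubric.score({"answers_question": True, "polite_tone": False})
print(result)
```

Keeping severity on the criterion rather than on the verdict means every evaluator applies the same weighting, which removes one source of disagreement between reviewers.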