Glossary of AI Terminology

What Are Evaluation Metrics?

Evaluation metrics are the numerical or categorical measures used to judge AI system behavior. Examples include task success rate, correctness, faithfulness, relevance, toxicity, latency, cost, tool-call accuracy, Recall@K, and human preference.
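As one concrete case, Recall@K can be computed in a few lines. The sketch below assumes the common definition (the fraction of relevant items that appear among the top K retrieved results); the function name `recall_at_k`, the document IDs, and the choice to return 0.0 when there are no relevant items are illustrative assumptions, not part of any particular library.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k retrieved results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        # Assumed convention: score 0.0 when nothing is relevant.
        # Some teams skip such queries instead.
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical ranked results and ground-truth labels for one query.
retrieved = ["doc3", "doc7", "doc1", "doc9"]
relevant = {"doc1", "doc2"}

print(recall_at_k(retrieved, relevant, k=3))  # 0.5
```

Only one of the two relevant documents shows up in the top three results, so the score is 0.5.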

The best metric depends on the job the system is supposed to do, and no single metric covers every concern. A retrieval metric will not tell you whether an answer is safe. A correctness score will not tell you whether the agent took an expensive or risky path. For that reason, production evaluation suites usually track several metrics side by side.
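A minimal sketch of reporting several metrics together is shown below. The `EvalRecord` fields, the `summarize` helper, and the sample values are all hypothetical, chosen only to illustrate how a run can look fine on correctness while still being risky or costly.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class EvalRecord:
    """One evaluated run; every field here is an illustrative assumption."""
    correct: bool
    flagged_unsafe: bool
    latency_s: float
    cost_usd: float


def summarize(records: List[EvalRecord]) -> Dict[str, float]:
    """Aggregate several metrics so no single score is read in isolation."""
    if not records:
        raise ValueError("no records to summarize")
    return {
        "correctness": mean(1.0 if r.correct else 0.0 for r in records),
        "unsafe_rate": mean(1.0 if r.flagged_unsafe else 0.0 for r in records),
        "mean_latency_s": mean(r.latency_s for r in records),
        "mean_cost_usd": mean(r.cost_usd for r in records),
    }


# Made-up sample data: the second run is correct but flagged, slow, and costly.
records = [
    EvalRecord(correct=True, flagged_unsafe=False, latency_s=1.2, cost_usd=0.004),
    EvalRecord(correct=True, flagged_unsafe=True, latency_s=6.8, cost_usd=0.031),
    EvalRecord(correct=False, flagged_unsafe=False, latency_s=0.9, cost_usd=0.003),
    EvalRecord(correct=True, flagged_unsafe=False, latency_s=1.1, cost_usd=0.004),
]

report = summarize(records)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample, three of four runs are correct, yet one of the correct runs is also the flagged, slowest, and most expensive one, which a correctness score alone would not reveal.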
