Glossary of AI Terminology

What Is LLM Evaluation?

LLM evaluation

LLM evaluation is the practice of measuring whether a large language model or LLM-powered application behaves as intended. It can score correctness, relevance, faithfulness, safety, tone, tool use, latency, cost, and task success.

For production systems, LLM evaluation should measure the application, not just the base model. Users experience prompts, retrieval, tools, memory, orchestration, and policies together, so the evaluation should be drawn at that same system boundary: score the end-to-end output a user would see rather than a single model call in isolation.
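To make the system-boundary idea concrete, here is a minimal sketch of an evaluation loop that treats the whole application as a black box and scores two of the dimensions named above: correctness and latency. Everything in it is an illustrative assumption rather than a real framework API: `toy_app` is a hypothetical stand-in for a full pipeline (prompt, retrieval, and model call), and the substring check is a deliberately crude correctness scorer.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    question: str
    expected: str  # substring the answer is expected to contain (assumed scoring rule)


@dataclass
class EvalResult:
    question: str
    correct: bool
    latency_s: float


def toy_app(question: str) -> str:
    """Hypothetical stand-in for the full application, not a real LLM call."""
    knowledge = {"capital of france": "Paris", "2 + 2": "4"}
    for key, value in knowledge.items():
        if key in question.lower():
            return f"The answer is {value}."
    return "I don't know."


def evaluate(app: Callable[[str], str], cases: List[EvalCase]) -> List[EvalResult]:
    """Run each case through the application end to end and score the output."""
    results = []
    for case in cases:
        start = time.perf_counter()
        answer = app(case.question)  # the call crosses the whole system boundary
        latency = time.perf_counter() - start
        correct = case.expected.lower() in answer.lower()
        results.append(EvalResult(case.question, correct, latency))
    return results


def pass_rate(results: List[EvalResult]) -> float:
    """Fraction of cases scored correct; 0.0 for an empty result list."""
    return sum(r.correct for r in results) / len(results) if results else 0.0


cases = [
    EvalCase("What is the capital of France?", "Paris"),
    EvalCase("What is 2 + 2?", "4"),
    EvalCase("Who wrote Hamlet?", "Shakespeare"),
]
results = evaluate(toy_app, cases)
print(f"pass rate: {pass_rate(results):.2f}")
```

Because `evaluate` accepts any callable that maps a question to an answer, the same cases could in principle be run against a bare model call and against the full application, which is one way to see how much of the measured behavior comes from the surrounding system. A real evaluation would likely replace the substring check with scorers suited to the task, such as faithfulness or safety checks.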
