Glossary of AI Terminology

What Is LLM Evaluation?

LLM evaluation

LLM evaluation is the practice of measuring whether a large language model or LLM-powered application behaves as intended. It can score correctness, relevance, faithfulness, safety, tone, tool use, latency, cost, and task success.

For production systems, LLM evaluation should measure the application, not just the base model. Users experience prompts, retrieval, tools, memory, orchestration, and policies together, so the eval should cover that same system boundary rather than the model in isolation.
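As a rough illustration of evaluating at the application boundary, the sketch below treats the whole system as a single callable and scores its final output. Everything named here is hypothetical and chosen for illustration: `EvalCase`, `run_eval`, `pass_rate`, the substring-based success check, and `stub_app`, which stands in for a real pipeline of prompts, retrieval, and tools. A real eval would likely use richer scorers (faithfulness, safety, cost) than this single pass/fail check.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # hypothetical success criterion: expected substring


@dataclass
class EvalResult:
    prompt: str
    passed: bool
    latency_s: float


def run_eval(app: Callable[[str], str], cases: list[EvalCase]) -> list[EvalResult]:
    """Call the full application on each case and record success and latency."""
    results = []
    for case in cases:
        start = time.perf_counter()
        output = app(case.prompt)  # the whole system, not just the base model
        latency = time.perf_counter() - start
        passed = case.must_contain.lower() in output.lower()
        results.append(EvalResult(case.prompt, passed, latency))
    return results


def pass_rate(results: list[EvalResult]) -> float:
    """Fraction of cases that met their success criterion."""
    return sum(r.passed for r in results) / len(results) if results else 0.0


def stub_app(prompt: str) -> str:
    """Stand-in for a real application (prompt template + retrieval + model)."""
    knowledge = {"capital of france": "Paris"}  # toy "retrieval" store
    for key, answer in knowledge.items():
        if key in prompt.lower():
            return f"The answer is {answer}."
    return "I don't know."


cases = [
    EvalCase("What is the capital of France?", "Paris"),
    EvalCase("What is the capital of Peru?", "Lima"),
]
results = run_eval(stub_app, cases)
print(f"pass rate: {pass_rate(results):.2f}")
```

Because `run_eval` only sees the `app` callable, swapping the stub for the deployed pipeline changes what is measured without changing the harness, which is one way to keep the eval aligned with what users actually experience.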
