Evaluators
Definition and types of Evaluators, and the Score abstraction.
At its core, an Evaluator is anything that returns a Score. Evaluators can be split into two broad categories (see the sketch after this list):
LLM-based: evaluators that use an LLM to perform the judgment.
Examples: hallucination, document relevance
Heuristic: evaluators that use a deterministic process or calculation.
Examples: exact match, BLEU, precision
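Both categories can be pictured as callables that produce a Score-like result. Below is a minimal sketch in plain Python, not the phoenix-evals API itself; the llm client and its generate method are hypothetical stand-ins:

```python
def exact_match(output: str, expected: str) -> dict:
    """Heuristic evaluator: a deterministic comparison, no model involved."""
    matched = output.strip() == expected.strip()
    return {"name": "exact_match", "source": "heuristic", "score": float(matched)}


def hallucination(output: str, context: str, llm) -> dict:
    """LLM-based evaluator: the judgment is delegated to a model."""
    prompt = (
        "Answer 'factual' or 'hallucinated': is the response supported "
        f"by the context?\n\nContext: {context}\n\nResponse: {output}"
    )
    # `llm.generate` is a hypothetical client call, not a real library API.
    return {"name": "hallucination", "source": "llm", "label": llm.generate(prompt)}
```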
Scores
Every score has the following properties:
name: The human-readable name of the score/evaluator.
source: The origin of the evaluation signal (llm, heuristic, or human).
direction: The optimization direction, i.e., whether a higher score is better or worse.
Scores may also have some of the following optional properties (see the example after this list):
score: The numeric value of the result.
label: The categorical outcome (e.g., "good", "bad", or other label).
explanation: A brief rationale or justification for the result.
metadata: Arbitrary extra context such as model details, intermediate scores, or run info.
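Here is a minimal sketch of the Score abstraction as a plain Python dataclass; the field names mirror this page, but this is not the library's actual class definition:

```python
from dataclasses import dataclass, field
from typing import Any, Literal, Optional


@dataclass
class Score:
    # Required properties
    name: str  # human-readable name, e.g. "exact_match"
    source: Literal["llm", "heuristic", "human"]  # origin of the evaluation signal
    direction: Literal["maximize", "minimize"]  # whether higher is better or worse

    # Optional properties
    score: Optional[float] = None  # numeric result
    label: Optional[str] = None  # categorical outcome, e.g. "good"/"bad"
    explanation: Optional[str] = None  # brief rationale for the result
    metadata: dict[str, Any] = field(default_factory=dict)  # model details, run info, etc.


# An exact-match result can carry both a numeric score and a label.
match = Score(
    name="exact_match",
    source="heuristic",
    direction="maximize",
    score=1.0,
    label="match",
)
```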
Properties of Evaluators
All phoenix-evals Evaluators have the following properties (see the sketch after this list):
Sync and async evaluate methods for evaluating a single record or example.
Single-record evals return a list of Score objects. Oftentimes this is a list of length 1 (e.g. exact_match), but some evaluators return multiple scores (e.g. precision-recall).
A discoverable input_schema that describes what inputs the evaluator requires to run.
Evaluators accept an arbitrary eval_input payload and an optional input_mapping that maps/transforms the input into the shape they require.
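Putting the pieces together, here is a hedged sketch of what this interface can look like, reusing the illustrative Score dataclass from the previous example; the class and method names are assumptions, not the library's real API:

```python
from typing import Any, Mapping, Optional


class ExactMatchEvaluator:
    # Discoverable schema: the inputs this evaluator needs to run.
    input_schema = {"output": str, "expected": str}

    def evaluate(
        self,
        eval_input: Mapping[str, Any],
        input_mapping: Optional[Mapping[str, str]] = None,
    ) -> list[Score]:
        # Reshape an arbitrary payload into the fields named by input_schema,
        # e.g. {"output": "response"} pulls "response" out of the payload
        # and feeds it to the evaluator as "output".
        if input_mapping:
            eval_input = {dst: eval_input[src] for dst, src in input_mapping.items()}
        matched = eval_input["output"].strip() == eval_input["expected"].strip()
        # A single-score evaluator: returns a list of length 1.
        return [
            Score(
                name="exact_match",
                source="heuristic",
                direction="maximize",
                score=float(matched),
                label="match" if matched else "no_match",
            )
        ]

    async def async_evaluate(
        self,
        eval_input: Mapping[str, Any],
        input_mapping: Optional[Mapping[str, str]] = None,
    ) -> list[Score]:
        # Async twin of evaluate; this heuristic is cheap, so it delegates.
        return self.evaluate(eval_input, input_mapping)


# The raw record uses different field names; input_mapping bridges the gap.
record = {"response": "42", "reference": "42"}
scores = ExactMatchEvaluator().evaluate(
    record, input_mapping={"output": "response", "expected": "reference"}
)
```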