This page documents every return shape a code evaluator can produce and how Phoenix maps each one to an EvaluationResult with label, score, and explanation fields.
This page covers the server-side code evaluators that run in the Phoenix UI (Sandbox evaluators). For client-side create_evaluator / createEvaluator SDK evaluators, see Code Evaluators.
The Triple-Collapse Model
Every return value from a code evaluator is normalized to a triple: (label, score, explanation). Phoenix applies this in two stages:
- Stage 1, Extract: The raw return value is mapped to a Triple based on its shape (bare scalar or dict-by-key).
- Stage 2, Validate: The triple is checked against the evaluator’s output config (categorical, continuous, or none).
A return value that does not match an accepted shape raises a ValueError whose message enumerates all accepted shapes for the configured output type.
Accepted Shapes by Output Config
Categorical Output Config
A categorical config defines a fixed set of {label, score} pairs. The evaluator must return one of the configured labels; Phoenix looks up the associated score automatically.
The accepted shapes are a bare string (recommended) or a dict with a label key, optionally alongside an explanation.
- The label must exactly match one of the configured values; unrecognized labels raise ValueError.
- Including a score key in the dict that conflicts with the config’s lookup value raises ValueError.
- Free-form explanation strings are always accepted and passed through to EvaluationResult.explanation.
- Tuple shorthand (return ("pass", 1.0)) is not accepted; use the dict form if you need to supply additional fields.
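The categorical rules above can be sketched as a small validator. This is an illustration only, not Phoenix's implementation: the helper name `validate_categorical`, the `scores_by_label` mapping, and the error message wording are all assumptions made for the example.

```python
from typing import Any, Dict, Optional, Tuple

Triple = Tuple[Optional[str], Optional[float], Optional[str]]


def validate_categorical(raw: Any, scores_by_label: Dict[str, float]) -> Triple:
    """Hypothetical helper: map a raw return value to (label, score, explanation)."""
    if isinstance(raw, str):
        label, given_score, explanation = raw, None, None
    elif isinstance(raw, dict):
        label = raw.get("label")
        given_score = raw.get("score")
        explanation = raw.get("explanation")
    else:
        # Tuples such as ("pass", 1.0) and any other shape are rejected.
        raise ValueError(f"unsupported return shape: {type(raw).__name__}")

    if not isinstance(label, str) or label not in scores_by_label:
        raise ValueError(f"label {label!r} is not one of {sorted(scores_by_label)}")

    # The score always comes from the config lookup, never from the evaluator.
    score = scores_by_label[label]
    if given_score is not None and given_score != score:
        raise ValueError(f"score {given_score!r} conflicts with configured {score!r}")
    return label, score, explanation


config = {"pass": 1.0, "fail": 0.0}
print(validate_categorical("pass", config))
print(validate_categorical({"label": "fail", "explanation": "missing citation"}, config))
```

In this sketch a matching `score` in the dict is tolerated and only a conflicting one is rejected, which is one reading of the rule above.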
Continuous Output Config
A continuous config validates that the returned value is a finite number within optional lower_bound / upper_bound bounds. Labels are optional and free-form.
The accepted shapes are a bare number (recommended) or a dict with a numeric score key, optionally alongside a free-form label and an explanation.
- bool values are not treated as numeric and raise ValueError.
- NaN and Infinity are rejected.
- Free-form string labels are allowed in the dict form alongside a numeric score.
- Tuple shorthand is not accepted.
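A minimal sketch of the continuous rules follows. The helper name `validate_continuous` and the error messages are hypothetical, and treating the bounds as inclusive is an assumption, since the text above does not say whether they are inclusive or exclusive.

```python
import math
from typing import Any, Optional, Tuple

Triple = Tuple[Optional[str], Optional[float], Optional[str]]


def validate_continuous(
    raw: Any,
    lower_bound: Optional[float] = None,
    upper_bound: Optional[float] = None,
) -> Triple:
    """Hypothetical helper: map a raw return value to (label, score, explanation)."""
    if isinstance(raw, dict):
        score, label, explanation = raw.get("score"), raw.get("label"), raw.get("explanation")
    else:
        score, label, explanation = raw, None, None

    # bool is a subclass of int in Python, so it has to be rejected explicitly.
    # Tuples and other non-numeric shapes fail the same check.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError(f"score must be a number, got {type(score).__name__}")
    if not math.isfinite(score):
        raise ValueError("score must be finite (NaN and Infinity are rejected)")
    # Assumption: bounds are inclusive.
    if lower_bound is not None and score < lower_bound:
        raise ValueError(f"score {score} is below lower_bound {lower_bound}")
    if upper_bound is not None and score > upper_bound:
        raise ValueError(f"score {score} is above upper_bound {upper_bound}")
    return label, float(score), explanation


print(validate_continuous(0.75, lower_bound=0.0, upper_bound=1.0))
print(validate_continuous({"score": 1, "label": "high"}, 0.0, 1.0))
```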
No Output Config
When no output config is specified, Phoenix applies a permissive bare passthrough:
| Return value | Result |
|---|---|
| str | label=<value> |
| int or float | score=<value> |
| bool | label="True" or label="False" (not numeric) |
| None | (label=None, score=None) |
| {"label": ..., "score": ..., "explanation": ...} | triple by key |
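The passthrough table can be read as a single function over the raw value. The sketch below is an illustration under assumptions: the `Triple` type and `passthrough` name are hypothetical, and raising ValueError for shapes not listed in the table is a guess, since the table does not say what happens to them.

```python
from typing import Any, NamedTuple, Optional


class Triple(NamedTuple):
    label: Optional[str] = None
    score: Optional[float] = None
    explanation: Optional[str] = None


def passthrough(raw: Any) -> Triple:
    """Hypothetical helper mirroring the no-output-config table row by row."""
    if raw is None:
        return Triple()
    # Check bool before int/float: bool is a subclass of int in Python.
    if isinstance(raw, bool):
        return Triple(label=str(raw))
    if isinstance(raw, str):
        return Triple(label=raw)
    if isinstance(raw, (int, float)):
        return Triple(score=raw)
    if isinstance(raw, dict):
        return Triple(raw.get("label"), raw.get("score"), raw.get("explanation"))
    # Assumption: shapes outside the table are rejected.
    raise ValueError(f"unsupported return shape: {type(raw).__name__}")


for value in ("relevant", 0.9, True, None, {"label": "ok", "score": 1.0}):
    print(repr(value), "->", passthrough(value))
```

The order of the checks matters: testing for bool after int would silently turn True into score=1, which the table rules out.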
The explanation Field
Any accepted shape may include an explanation string. Phoenix passes it through to EvaluationResult.explanation unchanged.
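As an illustration, an evaluator body that returns an explanation alongside its label might look like the following. The function name `evaluate`, its `output` parameter, and the bracket-matching heuristic are assumptions for the example; it presumes a categorical config with the labels "pass" and "fail".

```python
def evaluate(output: str) -> dict:
    """Hypothetical evaluator body: label the output and explain the decision."""
    has_citation = "[" in output and "]" in output
    if has_citation:
        return {"label": "pass", "explanation": "Found a bracketed citation."}
    return {"label": "fail", "explanation": "No bracketed citation found."}


print(evaluate("Paris is the capital of France [1]."))
print(evaluate("Paris is the capital of France."))
```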
Multi-Output Evaluators
When an evaluator has multiple output configs (e.g., one for toxicity and one for safety), Phoenix supports two routing modes:
Shared value (default)
Return a single value; Phoenix applies the same return value to each output config independently.
Per-config routing dict
Return a dict whose keys match every output config name. Phoenix routes each value to the corresponding config.
- The dict must contain a key for every output config name; a partial match is treated as a shared value, not a routing dict.
- A top-level "explanation" key acts as a shared fallback: if a per-config sub-value omits explanation, the top-level value fills it in.
- Per-config sub-values may themselves be dicts with their own "explanation" key; per-config explanation takes precedence over the shared fallback.
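The two routing modes and the explanation fallback can be sketched as follows. This is a simplified model, not Phoenix's code: the helper names `route` and `own_explanation` are hypothetical, and the function returns, for each config name, the value that config would receive together with the explanation that would apply to it.

```python
from typing import Any, Dict, List, Optional, Tuple


def own_explanation(value: Any) -> Optional[str]:
    """Return the value's own "explanation" entry when it is a dict, else None."""
    return value.get("explanation") if isinstance(value, dict) else None


def route(raw: Any, config_names: List[str]) -> Dict[str, Tuple[Any, Optional[str]]]:
    """Hypothetical helper: map each config name to (value, effective explanation)."""
    if isinstance(raw, dict) and all(name in raw for name in config_names):
        # Routing dict: every config name is present as a key.
        shared = raw.get("explanation")
        routed = {}
        for name in config_names:
            own = own_explanation(raw[name])
            # A per-config explanation takes precedence over the shared fallback.
            routed[name] = (raw[name], own if own is not None else shared)
        return routed
    # Shared value, including partial matches: each config sees the same value.
    return {name: (raw, own_explanation(raw)) for name in config_names}


names = ["toxicity", "safety"]
print(route("pass", names))
print(route(
    {
        "toxicity": {"label": "toxic", "explanation": "slur detected"},
        "safety": "safe",
        "explanation": "reviewed full response",
    },
    names,
))
```

In the second call, toxicity keeps its own explanation while safety picks up the top-level one.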
Multi-output naming convention
Each output config produces a separate EvaluationResult named {evaluator_name}.{config_name}. For example, an evaluator named content-check with configs toxicity and safety produces two results: content-check.toxicity and content-check.safety.
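The naming rule amounts to joining the two names with a dot. A tiny sketch, where `result_names` is a hypothetical helper rather than a Phoenix function:

```python
from typing import List


def result_names(evaluator_name: str, config_names: List[str]) -> List[str]:
    """Hypothetical helper: build one result name per output config."""
    return [f"{evaluator_name}.{config_name}" for config_name in config_names]


print(result_names("content-check", ["toxicity", "safety"]))
# ['content-check.toxicity', 'content-check.safety']
```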
Error Messages
When a return value does not match the accepted shapes, the ValueError message enumerates all valid shapes for the configured output type in the evaluator’s language. For example, a categorical config with values ["pass", "fail"] in Python produces a message that lists the accepted categorical shapes in Python syntax.

