
Overview

The exact_match evaluator is a simple code-based evaluator that checks if the output exactly equals the expected value. It performs a strict string comparison with no normalization.
This evaluator is only available as a built-in for Python. For TypeScript, see the usage example below, which shows how to build an equivalent evaluator with createEvaluator.

When to Use

Use the exact_match evaluator when you need to:
  • Validate exact outputs - Check that responses match expected values character-for-character
  • Evaluate classification tasks - Verify categorical outputs match expected labels
  • Test deterministic outputs - Validate outputs that should be identical every time
  • Run quick sanity checks - Get fast evaluation without LLM costs
This is a code-based evaluator that performs direct string comparison. For semantic similarity or fuzzy matching, consider using an LLM-based evaluator instead.

Supported Levels

| Level | Supported | Notes |
| --- | --- | --- |
| Span | Yes | Evaluate any span output against expected values. |

Input Requirements

The exact_match evaluator requires two inputs:
| Field | Type | Description |
| --- | --- | --- |
| `output` | string | The actual output to evaluate |
| `expected` | string | The expected value to match against |

Important Notes

  • No normalization: The comparison is case-sensitive and whitespace-sensitive
  • String comparison: Both inputs are compared as strings
  • No partial matching: The entire string must match exactly
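The notes above amount to plain string equality. A minimal sketch of the behavior (illustrative only, not the library's actual implementation):

```python
def strict_match(output: str, expected: str) -> float:
    """Mirror exact_match semantics: plain string equality.

    No strip(), lower(), or other normalization is applied, so any
    difference in case or whitespace yields 0.0.
    """
    return 1.0 if output == expected else 0.0

print(strict_match("Paris", "Paris"))   # 1.0
print(strict_match("paris", "Paris"))   # 0.0 (case differs)
print(strict_match("Paris ", "Paris"))  # 0.0 (trailing whitespace)
```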

Output Interpretation

The evaluator returns a Score object with the following properties:
| Property | Value | Description |
| --- | --- | --- |
| `label` | `True` or `False` | Whether the strings match |
| `score` | `1.0` or `0.0` | Numeric score (1.0 = match, 0.0 = no match) |
| `kind` | `"code"` | Indicates this is a code-based evaluator |
| `direction` | `"maximize"` | Higher scores are better |

Usage Examples

```python
from phoenix.evals.metrics import exact_match

# Basic usage with matching field names
eval_input = {
    "output": "Paris",
    "expected": "Paris"
}
scores = exact_match.evaluate(eval_input)
print(scores[0])
# Score(name='exact_match', score=1.0, label=True, kind='code', ...)

# Non-matching example
eval_input = {
    "output": "paris",  # lowercase
    "expected": "Paris"  # capitalized
}
scores = exact_match.evaluate(eval_input)
print(scores[0].score)  # 0.0 (case-sensitive comparison)
```

Implementing Case-Insensitive Matching

If you need case-insensitive matching, normalize your inputs first:

```python
from phoenix.evals.metrics import exact_match

eval_input = {
    "output": "PARIS".lower(),
    "expected": "paris"
}
scores = exact_match.evaluate(eval_input)
print(scores[0].score)  # 1.0
```
Or create a custom evaluator with normalization:

```python
from phoenix.evals.evaluators import Score, create_evaluator

@create_evaluator(name="exact_match_normalized", kind="code")
def exact_match_normalized(output: str, expected: str) -> Score:
    """Case-insensitive exact match with whitespace normalization."""
    normalized_output = output.strip().lower()
    normalized_expected = expected.strip().lower()
    correct = normalized_output == normalized_expected
    return Score(score=float(correct))
```

Using with Phoenix

Evaluating Traces

Run evaluations on traces collected in Phoenix and log results as annotations:
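A rough sketch of the shape this takes: score a batch of span outputs with exact-match comparison and collect the results as annotation records to send back to Phoenix. The span records and the annotation schema here are illustrative assumptions; the client API for pulling spans and logging annotations varies by Phoenix version, so consult the client docs for the actual calls.

```python
# Hypothetical span records pulled from Phoenix (shape is illustrative).
spans = [
    {"span_id": "a1", "output": "Paris", "expected": "Paris"},
    {"span_id": "b2", "output": "Lyon", "expected": "Paris"},
]

# Score each span the way exact_match does: strict string equality.
annotations = [
    {
        "span_id": span["span_id"],
        "name": "exact_match",
        "score": 1.0 if span["output"] == span["expected"] else 0.0,
    }
    for span in spans
]

# Each record can then be logged back to Phoenix as a span annotation
# via the Phoenix client for your installed version.
print(annotations)
```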

Running Experiments

Use the exact_match evaluator in Phoenix experiments:
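A sketch of the wiring, assuming the common Phoenix experiments pattern of a task function plus a list of evaluator callables (the `run_experiment` call and the task/example shapes are assumptions; check the experiments docs for your installed version):

```python
def task(example: dict) -> str:
    # Your application logic; here a stand-in that echoes the input.
    return example["input"]

def exact_match_evaluator(output: str, expected: str) -> float:
    # Strict comparison, mirroring the built-in exact_match.
    return 1.0 if output == expected else 0.0

# Hypothetical experiment invocation (requires a Phoenix dataset):
# run_experiment(dataset, task, evaluators=[exact_match_evaluator])

print(exact_match_evaluator("Paris", "Paris"))  # 1.0
```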

API Reference