Annotations logged through the @arizeai/phoenix-client/vitest and
@arizeai/phoenix-client/jest submodules record scored, labeled, or
explained results on an experiment run. They map directly to Phoenix’s
experiment_evaluations REST surface — each annotation becomes one
ExperimentEvaluation with an ExperimentEvaluationResult body.
The Annotation Shape
An annotation has the fields of ExperimentEvaluationResult, plus the name
and annotator_kind carried on the surrounding evaluation body.
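As a rough illustration of that shape, the sketch below models an annotation as a flat object and splits it into the evaluation envelope and its result body. The type names (AnnotationSketch, AnnotatorKind), the helper toEvaluationBody, the exact field set (score, label, explanation), and the "CODE" default are assumptions made for illustration, not the library's exported types.

```typescript
// Hypothetical types: the field names follow the prose above, but the real
// exported types live in src/testing/types.ts and may differ.
type AnnotatorKind = "LLM" | "CODE" | "HUMAN";

interface AnnotationSketch {
  name: string;
  annotator_kind?: AnnotatorKind;
  score?: number | null;
  label?: string | null;
  explanation?: string | null;
}

// Split a flat annotation into the surrounding evaluation fields
// (name, annotator_kind) and the result body (everything else).
// Defaulting annotator_kind to "CODE" is an assumption of this sketch.
function toEvaluationBody(a: AnnotationSketch) {
  const { name, annotator_kind = "CODE", ...result } = a;
  return { name, annotator_kind, result };
}

const body = toEvaluationBody({
  name: "correctness",
  score: 0.9,
  label: "correct",
  explanation: "Matches the expected answer.",
});
console.log(JSON.stringify(body, null, 2));
```

The split mirrors the description above: name and annotator_kind sit on the evaluation, while score, label, and explanation form the result.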
logAnnotation(annotation)
Records a single annotation against the current run. Must be called
inside a test() body.
evaluate(evaluator, params?)
Runs an evaluator object and records its result as an annotation on the
current run. An evaluator is any object with a name and an evaluate
function, including evaluators created with
@arizeai/phoenix-evals.createEvaluator() and
@arizeai/phoenix-client/experiments.asExperimentEvaluator().
The evaluator call is traced as an OpenInference EVALUATOR span, and the
annotation is linked back to that evaluator trace.
If params is omitted, Phoenix supplies the current test’s input,
recorded output, expected, metadata, and task traceId. If params
is supplied, it is merged on top of those defaults.
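The merge behavior can be pictured with a small sketch. It assumes a shallow, key-by-key merge in which supplied values win. The EvaluatorParams type and the resolveParams helper are hypothetical names, and the library may merge differently (for example, deeply).

```typescript
// Hypothetical parameter bag; the keys mirror the defaults listed above.
interface EvaluatorParams {
  input?: unknown;
  output?: unknown;
  expected?: unknown;
  metadata?: Record<string, unknown>;
  traceId?: string;
}

// Assumed semantics: shallow merge, supplied keys override the defaults,
// omitted keys keep the values taken from the current test.
function resolveParams(
  defaults: EvaluatorParams,
  params?: Partial<EvaluatorParams>,
): EvaluatorParams {
  return { ...defaults, ...params };
}

const testDefaults: EvaluatorParams = {
  input: { question: "What is 2 + 2?" },
  output: "4",
  expected: "4",
  metadata: {},
  traceId: "trace-123",
};

// Override only the output; everything else comes from the test context.
const merged = resolveParams(testDefaults, { output: "four" });
console.log(merged.output, merged.expected, merged.traceId);
```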
The annotation takes its name from evaluator.name. The evaluator result can be
a number, boolean, string label, null, or an object with score, label,
explanation, and metadata.
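To make the accepted result types concrete, here is a minimal normalization sketch. The mapping it uses (numbers become a score, booleans become a score of 1 or 0, strings become a label, null records nothing beyond the name) is an assumption for illustration. normalizeResult and the two types are hypothetical names, and the library's actual conversion rules may differ.

```typescript
// Result types listed in the prose above.
type EvaluatorResult =
  | number
  | boolean
  | string
  | null
  | {
      score?: number | null;
      label?: string | null;
      explanation?: string | null;
      metadata?: Record<string, unknown>;
    };

interface NormalizedAnnotation {
  name: string;
  score?: number | null;
  label?: string | null;
  explanation?: string | null;
  metadata?: Record<string, unknown>;
}

// Assumed mapping from a raw evaluator result to an annotation.
function normalizeResult(name: string, result: EvaluatorResult): NormalizedAnnotation {
  if (result === null) return { name };
  if (typeof result === "number") return { name, score: result };
  if (typeof result === "boolean") return { name, score: result ? 1 : 0 };
  if (typeof result === "string") return { name, label: result };
  // Object results already carry score / label / explanation / metadata.
  return { name, ...result };
}

const fromNumber = normalizeResult("relevance", 0.5);
const fromBoolean = normalizeResult("valid_sql", true);
const fromLabel = normalizeResult("tone", "friendly");
const fromObject = normalizeResult("correctness", {
  score: 1,
  explanation: "Exact match.",
});
console.log(fromNumber, fromBoolean, fromLabel, fromObject);
```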
Using @arizeai/phoenix-evals
createEvaluator() gives you a reusable evaluator object. px.evaluate()
runs that evaluator in the test context, traces the call, and records its
result on the experiment run. If the evaluator itself uses OpenInference
telemetry, those implementation spans appear under the evaluator trace.
Use createEvaluator() or OpenInference decorators from
@arizeai/phoenix-otel when the evaluator implementation should emit child
spans of its own. The older traceEvaluator(fn) helper remains available for
raw function wrapping, but evaluator objects are the preferred interface.
Built-In pass Annotation
Every test automatically records a pass boolean annotation based on
whether the test body threw. You don’t have to log it yourself; it’s
included in the reporter summary and on the run in Phoenix.
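The idea behind the automatic pass annotation can be sketched as a wrapper around the test body. This is a simplified, synchronous model, not the library's implementation: runWithPassAnnotation, Recorded, and the record callback are hypothetical names, and a real runner would also await asynchronous test bodies.

```typescript
// Hypothetical record of a logged annotation.
type Recorded = { name: string; value: boolean };

// Run the body, record pass = true if it returns and pass = false if it
// throws, then rethrow so the test runner still reports the failure.
function runWithPassAnnotation(
  body: () => void,
  record: (annotation: Recorded) => void,
): void {
  try {
    body();
  } catch (err) {
    record({ name: "pass", value: false });
    throw err;
  }
  record({ name: "pass", value: true });
}

const recorded: Recorded[] = [];

// A body that completes normally.
runWithPassAnnotation(() => {}, (a) => recorded.push(a));

// A body that throws, as a failed assertion would.
try {
  runWithPassAnnotation(
    () => {
      throw new Error("assertion failed");
    },
    (a) => recorded.push(a),
  );
} catch {
  // The test runner would surface this error as a failed test.
}

console.log(recorded);
```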
Aggregating Annotations In CI
Suite acceptanceCriteria aggregate annotation scores after all cases run.
Use them when a metric should clear a threshold across the dataset — either an
average bar (e.g. mean correctness >= 0.8) or a passRate rule requiring a
minimum fraction of runs to satisfy a per-run passFn predicate (e.g. 100% of
runs must have valid_sql === true). See
CI Eval Tests: Vitest for the full
configuration shape.
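The two kinds of rule can be illustrated with a self-contained sketch. The Criterion shape below (kind, annotation, min) is hypothetical and is not the real configuration shape. Only the average and passRate ideas and the passFn predicate come from the description above. Skipping runs that lack the annotation and failing when no run has it are also assumptions.

```typescript
// Annotation values for one run, keyed by annotation name.
type RunAnnotations = Record<string, number | boolean | undefined>;

// Hypothetical criterion shape, for illustration only.
type Criterion =
  | { kind: "average"; annotation: string; min: number }
  | {
      kind: "passRate";
      annotation: string;
      min: number;
      passFn: (value: number | boolean) => boolean;
    };

function meetsCriterion(runs: RunAnnotations[], c: Criterion): boolean {
  // Assumption: runs that never logged the annotation are skipped.
  const values = runs
    .map((run) => run[c.annotation])
    .filter((v): v is number | boolean => v !== undefined);
  if (values.length === 0) return false;

  if (c.kind === "average") {
    // Mean of the scores must reach the threshold.
    const total = values.reduce<number>((sum, v) => sum + Number(v), 0);
    return total / values.length >= c.min;
  }
  // Fraction of runs satisfying the per-run predicate must reach the threshold.
  const passing = values.filter((v) => c.passFn(v)).length;
  return passing / values.length >= c.min;
}

const runs: RunAnnotations[] = [
  { correctness: 1, valid_sql: true },
  { correctness: 0.75, valid_sql: true },
  { correctness: 0.75, valid_sql: false },
];

// Mean correctness is 2.5 / 3, which clears a 0.8 bar.
const averageOk = meetsCriterion(runs, {
  kind: "average",
  annotation: "correctness",
  min: 0.8,
});

// Only 2 of 3 runs have valid_sql === true, so a 100% rule fails.
const passRateOk = meetsCriterion(runs, {
  kind: "passRate",
  annotation: "valid_sql",
  min: 1,
  passFn: (value) => value === true,
});

console.log({ averageOk, passRateOk });
```

An average rule tolerates a few weak runs as long as the mean holds, while a passRate rule with a minimum of 1 fails the suite as soon as a single run violates the predicate.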
Source Map
src/testing/helpers.ts
src/testing/acceptance.ts
src/testing/phoenix-test-tracking.ts
src/testing/types.ts

