
Evals + Traces

Run evals on traces and log the results back to Phoenix as annotations.
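The flow can be sketched in plain Python: export spans, score each one, and collect annotation-shaped records keyed by span ID. Everything here is illustrative, not the Phoenix API; `judge` is a hypothetical stand-in for a real LLM-based evaluator, and the span and record shapes are simplified examples.

```python
def judge(output: str) -> tuple[str, float]:
    """Hypothetical stand-in for an LLM judge: returns (label, score)."""
    label = "helpful" if output.strip() else "unhelpful"
    return label, 1.0 if label == "helpful" else 0.0

# Spans exported from Phoenix (simplified): one record per trace span.
spans = [
    {"span_id": "a1", "output": "Paris is the capital of France."},
    {"span_id": "b2", "output": ""},
]

# Evaluate each span and build the annotation records that would be
# logged back to Phoenix, keyed by span_id.
annotations = [
    {"span_id": s["span_id"], "label": label, "score": score}
    for s in spans
    for label, score in [judge(s["output"])]
]
print(annotations)
```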

Evals + Experiments

How to run experiment evaluators.
All arize-phoenix-evals Evaluators are drop-in compatible with experiments.
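At its core, an experiment evaluator is a callable that scores a task's output against an example. A minimal pure-Python sketch of that contract; the hand-rolled loop, `task`, and `exact_match` are illustrative names, not the Phoenix API, which wires this up for you:

```python
def exact_match(output: str, expected: str) -> float:
    """Hypothetical evaluator: 1.0 if the task output matches exactly."""
    return 1.0 if output == expected else 0.0

def task(example: dict) -> str:
    """Hypothetical task under test."""
    return example["input"].upper()

# A tiny illustrative dataset of (input, expected) examples.
dataset = [
    {"input": "hi", "expected": "HI"},
    {"input": "bye", "expected": "BYE!"},
]

# Run the task over the dataset and score each output with the evaluator.
scores = [exact_match(task(ex), ex["expected"]) for ex in dataset]
print(scores)
```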

Evals + Prompt Management (Python)

If your evaluation prompt is versioned in Phoenix Prompt Management, you can fetch it with phoenix-client and convert it into an eval-ready PromptTemplate.
```python
from phoenix.client import Client
from phoenix.evals import (
    ClassificationEvaluator,
    LLM,
    phoenix_prompt_to_prompt_template,
)

client = Client(base_url="http://localhost:6006")
prompt_version = client.prompts.get(prompt_identifier="test-prompt")

prompt_template = phoenix_prompt_to_prompt_template(prompt_version)

evaluator = ClassificationEvaluator(
    name="response_quality",
    llm=LLM(provider="anthropic", model="claude-sonnet-4-6"),
    prompt_template=prompt_template,
    choices=["good", "bad"],
)
```
This keeps your eval logic aligned with prompt versions managed in Phoenix while still using the standard arize-phoenix-evals API.
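An eval-ready template exposes named variables that get bound per row at evaluation time. A stdlib sketch of pulling f-string-style variables out of a prompt's text; the template string is a made-up example, and Phoenix's own template syntax and parsing may differ:

```python
from string import Formatter

# Hypothetical prompt text, as it might come back from Prompt Management.
template = "Given the question:\n{input}\n\nIs this answer good or bad?\n{output}"

# Collect the {named} fields an eval harness would need to bind per row.
variables = [field for _, field, _, _ in Formatter().parse(template) if field]
print(variables)
```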