Server evals let you define evaluation criteria once on the Phoenix server and reuse them across experiments and datasets. Instead of writing and running evaluation code locally, you configure evaluators in the UI and attach them to datasets. When an experiment runs through the prompt playground, the attached evaluators label and score every output automatically.

How It Works

  1. Define an evaluator — Create an evaluator on the server: either a built-in heuristic (exact match, regex, JSON distance, etc.) or an LLM-as-a-judge evaluator backed by a Phoenix-managed prompt.
  2. Attach it to a dataset — Configure which evaluators apply to a given dataset along with input mappings that tell each evaluator where to find its inputs (model output, reference data, metadata).
  3. Run experiments — When you run an experiment against that dataset, the attached evaluators execute server-side and label or score each output. Results appear as annotations on experiment runs.
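The steps above can be sketched in a few lines. This is a hypothetical illustration of how an input mapping connects an evaluator to an experiment run; the field names (`run.output`, `example.expected`) and the `resolve` helper are assumptions for intuition, not the Phoenix schema:

```python
# Hypothetical sketch of the experiment-time flow: an input mapping
# tells an evaluator where to find its inputs in each experiment run,
# then the evaluator labels or scores the output. The dotted paths
# below are illustrative, not the actual Phoenix record layout.

def resolve(mapping: dict, record: dict) -> dict:
    """Follow dotted paths like "example.expected" into a nested record."""
    def get(path: str):
        value = record
        for part in path.split("."):
            value = value[part]
        return value
    return {arg: get(path) for arg, path in mapping.items()}

def exact_match(output: str, reference: str) -> float:
    """A built-in heuristic: 1.0 when the output equals the reference."""
    return 1.0 if output == reference else 0.0

# One experiment run: the model's output plus the dataset example it ran on.
record = {
    "run": {"output": "Paris"},
    "example": {"expected": "Paris"},
}
mapping = {"output": "run.output", "reference": "example.expected"}

score = exact_match(**resolve(mapping, record))
print(score)  # 1.0
```

The mapping is what makes one evaluator reusable across datasets with different column names: only the paths change, not the evaluator itself.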

Evaluator Types

  • Built-in heuristics — deterministic checks such as exact match, regex, and JSON distance that run entirely on the server and compare each output against reference data.
  • LLM-as-a-judge — an evaluator backed by a Phoenix-managed prompt that uses the model configuration on your Phoenix instance to label or score each output.

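For intuition, the built-in heuristics can be sketched as small scoring functions. These are illustrative only — the real implementations run server-side inside Phoenix, and the JSON-distance definition below is one plausible choice, not necessarily the one Phoenix uses:

```python
import json
import re

def exact_match(output: str, reference: str) -> float:
    """Score 1.0 when the output equals the reference exactly."""
    return 1.0 if output == reference else 0.0

def regex_match(output: str, pattern: str) -> float:
    """Score 1.0 when the output contains a match for the pattern."""
    return 1.0 if re.search(pattern, output) else 0.0

def json_distance(output: str, reference: str) -> float:
    """One plausible definition: the fraction of top-level keys whose
    values differ between two JSON objects (0.0 means identical)."""
    a, b = json.loads(output), json.loads(reference)
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    differing = sum(1 for k in keys if a.get(k) != b.get(k))
    return differing / len(keys)

print(exact_match("42", "42"))                                # 1.0
print(regex_match("order #123", r"#\d+"))                     # 1.0
print(json_distance('{"a": 1, "b": 2}', '{"a": 1, "b": 3}'))  # 0.5
```

Because these checks are deterministic, the same dataset and outputs always produce the same scores, which is what makes them safe to run without any local setup.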
Why Use Server Evals

  • Consistent evaluation — Evaluators are defined once and applied uniformly. Every experiment against a dataset uses the same criteria, eliminating drift between ad-hoc evaluation scripts.
  • No local setup required — Built-in evaluators run entirely on the server with no SDK installation, API keys, or dependencies needed. LLM evaluators use the model configuration already set up on your Phoenix instance.
  • Tracing — LLM evaluators produce OpenTelemetry traces in a dedicated project, so you can audit, debug, and improve your evaluation prompts the same way you observe any other LLM workflow.

What’s Next

Server evals currently run during experiments triggered from the UI. Support for automatically evaluating incoming production traces — applying the same evaluator definitions to live traffic — is on the roadmap.