Available in arize-phoenix 16.0.0+

Write your own evaluation logic in the Phoenix UI and run it server-side on experiment results. Author a Python or TypeScript evaluate() function that returns a label, score, and explanation, attach it to a dataset, and Phoenix runs it in an isolated sandbox on every experiment run.
Writing a code evaluator
Open a dataset, go to the Evaluators tab, and click Add evaluator → Code evaluator. Pick a language, write evaluate(), map dataset fields to its parameters, and click Test to dry-run against a real example before saving.
- Field mapping: bind output, reference, input, and metadata to dataset columns or literal values
- Versioned: every save creates a new version, so historical runs always link back to the exact code that produced each score
- Traced: each evaluator execution appears as a span, so you can debug it like any other LLM call
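As a rough illustration of what such an evaluator might look like, here is a minimal Python sketch of a case-insensitive exact-match check. The parameter names follow the mappable fields listed above. The return shape (a dict with label, score, and explanation keys) and the label strings are assumptions made for illustration, so check the template the UI generates for the exact contract.

```python
from typing import Any, Optional


def evaluate(
    output: str,
    reference: str,
    input: Any = None,
    metadata: Optional[dict] = None,
) -> dict:
    """Compare the experiment output to the reference answer.

    Assumption: Phoenix accepts a dict carrying label, score, and
    explanation. The label values used here are hypothetical.
    """
    # Normalize whitespace and case so trivial differences do not count.
    matched = output.strip().lower() == reference.strip().lower()
    return {
        "label": "match" if matched else "mismatch",
        "score": 1.0 if matched else 0.0,
        "explanation": (
            "Output equals the reference after normalization."
            if matched
            else "Output differs from the reference after normalization."
        ),
    }
```

A check like this is self-contained and deterministic: it needs no network access or third-party packages.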
Sandboxes
Code evaluators run in isolated sandboxes, configured by admins under Settings → Sandboxes:
- Local (no credentials): WebAssembly for Python, Deno for TypeScript. Ship with Phoenix and are suitable for self-contained, deterministic checks.
- Hosted (credentials required): E2B, Daytona, Vercel, and Modal. Support environment variables, outbound network access, and third-party packages.
To restrict which providers are available, set PHOENIX_ALLOWED_SANDBOX_PROVIDERS to a comma-separated list of WASM, DENO, E2B, DAYTONA, VERCEL, and MODAL, or to NONE to disable all. When unset, all providers are available.
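The allow-list semantics described here (unset means everything, NONE means nothing, otherwise a comma-separated subset) can be sketched in Python as follows. This is not Phoenix's implementation: the function name allowed_providers is hypothetical, and the handling of whitespace, letter case, unknown names, and NONE mixed with other values is assumed for illustration.

```python
import os
from typing import Mapping, Set

# Provider names as listed in the documentation.
ALL_PROVIDERS = frozenset({"WASM", "DENO", "E2B", "DAYTONA", "VERCEL", "MODAL"})


def allowed_providers(env: Mapping[str, str] = os.environ) -> Set[str]:
    """Return the set of sandbox providers enabled by the environment."""
    raw = env.get("PHOENIX_ALLOWED_SANDBOX_PROVIDERS")
    if raw is None:
        # Documented behavior: when unset, all providers are available.
        return set(ALL_PROVIDERS)
    # Assumption: entries are trimmed and compared case-insensitively.
    names = {part.strip().upper() for part in raw.split(",") if part.strip()}
    if "NONE" in names:
        # Documented behavior: NONE disables all providers.
        # Assumption: NONE wins even if other names are also listed.
        return set()
    # Assumption: unrecognized names are silently ignored.
    return names & ALL_PROVIDERS
```

For example, passing {"PHOENIX_ALLOWED_SANDBOX_PROVIDERS": "WASM,DENO"} as env yields only the two local providers, which limits evaluators to the sandboxes that need no credentials.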

