View trace and span level results
Evaluation results attach directly to your spans. Open any trace in the Tracing view and use the evaluation panel on each span to inspect labels, scores, and explanations. Results also appear at trace and session scope where configured.

View results on experiments
For evals you run on a dataset experiment, open the experiment table on the dataset and use View Eval Trace from a row to open the evaluated trace. Use Compare Experiments to view examples, eval labels, and judge explanations side by side. The Playground also shows per-row annotation labels, model output, an aggregate Human v AI alignment score, and per-row Human v AI alignment tags. See Run offline evals on experiments for screenshots and full UI detail.

Configure dashboards
Aggregate eval trends alongside latency, errors, and usage in Dashboards. Add widgets that query evaluation labels and scores to monitor quality over time. Custom SQL metrics can also incorporate evaluation columns. See Custom metric examples.
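For illustration, a custom SQL metric over evaluation columns might look like the sketch below. The table and column names (`spans`, `eval_label`, `start_time`) are placeholder assumptions, not Arize's actual schema; see Custom metric examples for working patterns you can copy.

```sql
-- Illustrative sketch only: table and column names are assumptions.
-- Daily rate of spans the judge labeled "correct".
SELECT
  DATE_TRUNC('day', start_time) AS day,
  AVG(CASE WHEN eval_label = 'correct' THEN 1.0 ELSE 0.0 END) AS correctness_rate
FROM spans
WHERE eval_label IS NOT NULL
GROUP BY 1
ORDER BY 1
```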
Debug
Logs
The Task Logs page shows your task configuration, including which evaluators and datasource are attached, alongside a run history with timestamp, status, and trigger for each run. From any row you can view the evals or jump directly to the trace. To get there: open the Evaluators page, select the Running Tasks tab, and open any task.
View eval traces
From the task logs, click View Trace on a run to jump directly to the spans evaluated in that run, with the same date range and filters applied. If the task used a sampling rate below 100% or span caps, not every span will have evaluation results attached. On a dataset's Experiments tab, open the row menu on a run and choose View Eval Trace for the evaluated trace, or View Logs for that experiment's run output.
Track evaluation cost
Evaluations, especially LLM-as-a-judge runs at scale, consume tokens and model spend. Use Arize cost tracking and project metrics to reason about evaluation cost alongside application LLM cost.
Note: Playground runs display evaluation cost inline. For production tasks, configure cost tracking as described below.
Configure cost tracking
Arize AX tracks model usage from traces using token fields and your pricing configuration. Set this up before you rely on cost dashboards; cost is not retroactive. See Cost tracking for how lookup works, supported token types, and how to configure default or custom pricing.

Estimate evaluation cost with custom metrics
If you need a ballpark for judge spend, you can combine trace token counts with an estimate of your evaluation prompt size and judge output length. The custom metric examples page includes an Evaluation Cost Estimate SQL pattern you can adapt to your templates.

Reduce evaluation spend
- Lower the sampling rate on online evaluation tasks.
- Prefer code evaluators for objective checks when they are sufficient. See Create evaluators.
- Reuse a single well-versioned judge in the Evaluator Hub instead of duplicating prompts across tasks.
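The ballpark estimate described above reduces to simple token arithmetic. The sketch below is an illustrative Python version; the token counts and per-1K prices are made-up assumptions, and `estimate_eval_cost` is a hypothetical helper, not part of any Arize SDK.

```python
# Back-of-the-envelope judge spend. All figures below are illustrative
# assumptions; substitute your own model pricing and template sizes.
def estimate_eval_cost(
    num_spans: int,
    sampling_rate: float,         # fraction of spans evaluated (0.0-1.0)
    prompt_tokens_per_eval: int,  # eval template + span content
    output_tokens_per_eval: int,  # judge label + explanation
    input_price_per_1k: float,    # $ per 1K input tokens
    output_price_per_1k: float,   # $ per 1K output tokens
) -> float:
    evals = num_spans * sampling_rate
    return evals * (
        prompt_tokens_per_eval / 1000 * input_price_per_1k
        + output_tokens_per_eval / 1000 * output_price_per_1k
    )

# e.g. 100K spans at 10% sampling, ~1.5K prompt tokens and ~200 output
# tokens per eval, at assumed prices of $0.005/$0.015 per 1K tokens
print(f"${estimate_eval_cost(100_000, 0.10, 1500, 200, 0.005, 0.015):.2f}")
```

Lowering the sampling rate scales this linearly, which is why it is the first lever listed above.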