
# View eval results and costs

> Understand where to find evaluation results, how to monitor quality over time, and how to track and reduce evaluation spend.

## View trace and span level results

Evaluation results attach directly to your spans. Open any trace in the Tracing view and use the evaluation panel on each span to inspect labels, scores, and explanations. Results also appear at trace and session scope where configured.

<Frame caption="Evals appear on the traces table">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/evaluate/eval%20metrics%202.png" alt="Playground Traces with summary charts for traffic, span latency, tokens and cost, and a custom eval metric, plus a traces table showing Span Evaluations tags per row alongside latency and token columns" />
</Frame>

<Frame caption="Span evaluations">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/evaluate/span%20eval.png" alt="Trace detail with span tree and Evaluations tab open for a ChatCompletion span, showing span-scoped eval rows with name, label, score, and explanation for qa and hallucination" />
</Frame>

## View results on experiments

For evals you run on a dataset experiment, open the experiment table on the dataset and use View Eval Trace from a row to open the evaluated trace. Use Compare Experiments for side-by-side examples, eval labels, and judge explanations. The Playground also shows per-row annotation labels, model output, the aggregate average Human v AI alignment score, and per-row Human v AI alignment tags. See [Run offline evals on experiments](/ax/evaluate/run-evals-on-experiments) for screenshots and full UI detail.

<Frame caption="View evals on playground experiments">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/evaluate/aligned%20evals.png" alt="Playground with a tone evaluator prompt and results table showing annotation labels, model output, aggregate Avg Human v AI align score, and per-row Human v AI align tags for aligned or not aligned" />
</Frame>

<Frame caption="Compare Experiments">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/evaluate/compare%20exp%20eval%20result.png" alt="Compare Experiments table tab for a dataset showing example rows, experiment output column, Evals tags for correct and incorrect, and a popover with score, label, and explanation for an evaluator" />
</Frame>

## Configure dashboards

Aggregate eval trends alongside latency, errors, and usage in [Dashboards](/ax/observe/dashboards). Add widgets that query evaluation labels and scores to monitor quality over time. Custom SQL metrics can also incorporate evaluation columns. See [Custom metric examples](/ax/observe/projects/custom-metrics-api/custom-metric-examples).

<Frame caption="Eval Result widget">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/evaluate/eval%20dash.png" alt="Dashboard widget editor for an Eval Result bar chart counting eval labels from a tracing project, with Data settings for project, eval attribute, and filters" />
</Frame>

## Debug

### Logs

The Task Logs page shows your task configuration, including which evaluators and datasource are attached, alongside a run history with timestamp, status, and trigger for each run. From any row you can view the evals or jump directly to the trace.

To get there: open the Evaluators page, select the Running Tasks tab, and open any task.

<Frame caption="Task logs">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/evaluate/view%20task%20logs.png" alt="Evaluators Running Eval Tasks with Task Logs side panel showing evaluators and datasource, run history chart, and per-run status with View Evals and View Trace actions" />
</Frame>

### View eval traces

From the task logs, click View Trace on a run to jump directly to the spans evaluated in that run, with the same date range and filters applied. If the task used a sampling rate below 100% or a span cap, not every span will have evaluation results attached.

On a dataset's Experiments tab, open the row menu on a run and choose View Eval Trace for the evaluated trace, or View Logs for that experiment's run output.

<Frame caption="View Eval Trace from an experiment">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/evaluate/experiments%20-%3E%20eval%20logs.png" alt="Dataset Experiments tab with summary metrics and experiment rows, row menu open showing View Eval Trace and View Logs" />
</Frame>

<h2 id="track-evaluation-cost">
  Track evaluation cost
</h2>

Evaluations, especially LLM-as-a-judge runs at scale, consume tokens and incur model spend. Use Arize AX cost tracking and project metrics to reason about evaluation cost alongside application LLM cost.

<Info>
  Playground runs display evaluation cost inline. For production tasks, configure cost tracking as described below.
</Info>

<Frame caption="Evaluation cost in Playgrounds">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/evaluate/cost%20tracking.png" alt="Playground experiment results table with average cost per row and a Total Cost popover showing aggregate output spend for the eval run" />
</Frame>

### Configure cost tracking

Arize AX tracks model usage from traces using token fields and your pricing configuration. Configure this before you rely on cost dashboards, because cost is not computed retroactively. See [Cost tracking](/ax/security-and-settings/cost-tracking) for how lookup works, supported token types, and how to configure default or custom pricing.

### Estimate evaluation cost with custom metrics

If you need a ballpark for judge spend, you can combine trace token counts with an estimate of your evaluation prompt size and judge output length. The [custom metric examples](/ax/observe/projects/custom-metrics-api/custom-metric-examples) page includes an Evaluation Cost Estimate SQL pattern you can adapt to your templates.
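
The arithmetic behind such a ballpark can be sketched in a few lines of Python. This is not the SQL pattern from the custom metric examples page, only an illustration of the same idea. The `JudgePricing` class, the `estimate_eval_cost` function, the per-token prices, and the token counts are all hypothetical placeholders, and the sketch assumes each judge call reads the evaluation template plus the span's tokens and writes a fixed-length response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgePricing:
    """Hypothetical per-1k-token prices in USD; substitute your judge model's real rates."""
    input_per_1k: float
    output_per_1k: float


def estimate_eval_cost(
    span_tokens: list[int],
    template_tokens: int,
    judge_output_tokens: int,
    pricing: JudgePricing,
) -> float:
    """Ballpark judge spend in USD for evaluating every span once.

    Assumption: each judge call reads the evaluation template plus the
    span's own tokens, and writes a fixed-length label and explanation.
    """
    total = 0.0
    for tokens in span_tokens:
        input_tokens = template_tokens + tokens
        total += input_tokens / 1000 * pricing.input_per_1k
        total += judge_output_tokens / 1000 * pricing.output_per_1k
    return total


# Illustrative numbers only: two spans, a 500-token template, 100-token verdicts.
pricing = JudgePricing(input_per_1k=0.001, output_per_1k=0.002)
cost = estimate_eval_cost(
    [500, 1500], template_tokens=500, judge_output_tokens=100, pricing=pricing
)
print(f"Estimated judge spend: ${cost:.4f}")
```

Treat the result as an order-of-magnitude check. Actual spend depends on the real template size, the tokenizer, and the pricing you configure in cost tracking.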

### Reduce evaluation spend

* Lower the sampling rate on [online evaluation tasks](/ax/evaluate/run-evals-on-traces).
* Prefer code evaluators for objective checks when they are sufficient. See [Create evaluators](/ax/evaluate/create-evaluators).
* Reuse a single well-versioned judge in the Evaluator Hub instead of duplicating prompts across tasks.
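
To see how much the sampling rate alone can move spend, here is a minimal sketch. It assumes cost scales linearly with the number of spans evaluated and that every evaluated span costs roughly the same. The function name `expected_judge_spend`, the span volume, and the per-evaluation cost are hypothetical values chosen for illustration.

```python
def expected_judge_spend(
    spans_per_day: int, sampling_rate: float, cost_per_eval: float
) -> float:
    """Expected daily judge spend in USD under a simple linear model.

    Assumption: the task evaluates `sampling_rate` of all spans on average,
    and each evaluation costs about `cost_per_eval`.
    """
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    return spans_per_day * sampling_rate * cost_per_eval


# Illustrative numbers only: 100,000 spans per day at $0.002 per evaluation.
for rate in (1.0, 0.25, 0.05):
    spend = expected_judge_spend(100_000, rate, 0.002)
    print(f"sampling {rate:.0%}: about ${spend:.2f} per day")
```

Lower sampling trades coverage for cost, so pick a rate that still yields enough evaluated spans to make the quality trends on your dashboards meaningful.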
