Your automated evals say a response is “grounded” — but is it really? Sometimes you need a human to weigh in. Annotations let your team add ground-truth labels and scores directly on spans, building a feedback loop between humans and your AI.

How to do it

  1. Open a trace and click into any span
  2. Click the Annotate toggle in the span toolbar
  3. Select an annotation config (e.g., “Correctness”, “Helpfulness”) or create a new one
  4. Add your label or score; it saves automatically
[screenshot: annotation panel open next to a span with label selector]

Annotation configs

Configs define the schema for your labels. They are shared across the project, so everyone annotates against the same schema.
  • Categorical — fixed labels (e.g., “correct”, “incorrect”, “partially correct”)
  • Continuous — numeric scores on a range (e.g., 1–5)
Create new configs on the fly from the annotation panel: click + New Config, choose a type, add its options, and save.
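
As a rough mental model of the two config types, the sketch below represents them as plain Python dataclasses and checks a value against each. Every class and function name here is a hypothetical illustration, not part of the Arize SDK or UI; the example labels and the 1–5 range are taken from the descriptions above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class CategoricalConfig:
    """Hypothetical model of a categorical config: a fixed set of labels."""
    name: str
    labels: tuple[str, ...]


@dataclass(frozen=True)
class ContinuousConfig:
    """Hypothetical model of a continuous config: a numeric score range."""
    name: str
    min_score: float
    max_score: float


AnnotationConfig = Union[CategoricalConfig, ContinuousConfig]


def is_valid(config: AnnotationConfig, value: Union[str, float]) -> bool:
    """Return True if `value` fits the schema described by `config`."""
    if isinstance(config, CategoricalConfig):
        return isinstance(value, str) and value in config.labels
    # Continuous: accept ints and floats in the inclusive range, reject bools.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return config.min_score <= value <= config.max_score


correctness = CategoricalConfig(
    "Correctness", ("correct", "incorrect", "partially correct")
)
helpfulness = ContinuousConfig("Helpfulness", 1, 5)

print(is_valid(correctness, "correct"))  # label is in the fixed set
print(is_valid(correctness, "maybe"))    # label is not in the fixed set
print(is_valid(helpfulness, 4))          # score is inside 1-5
print(is_valid(helpfulness, 7))          # score is outside 1-5
```

Because one config is shared across the project, every annotator's label passes through the same check, which is what keeps labels comparable between people.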

Annotation notes

In addition to labels and scores, you can attach free-text notes to any annotation. Notes are useful for explaining edge cases, providing context for disagreements, or flagging spans for follow-up discussion.

Measure eval quality with annotations

Use annotations as ground truth to measure how well your automated evals perform:
SELECT
    PRECISION(
        predicted = "eval.Groundedness.label",
        actual = "annotation.Human Groundedness.label",
        pos_class = 'grounded'
    )
FROM model
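
To make the metric concrete: precision is the share of spans the eval labeled with the positive class that the human annotation also labeled with that class, i.e. TP / (TP + FP). The Python sketch below reproduces that arithmetic on a small made-up sample. The data, the 'ungrounded' label, and the function name are illustrative assumptions, not output from or an API of Arize.

```python
from typing import Optional, Sequence


def precision(
    predicted: Sequence[str], actual: Sequence[str], pos_class: str
) -> Optional[float]:
    """Precision of `predicted` against `actual` for `pos_class`.

    Returns None when the eval never predicted the positive class,
    since precision is undefined in that case.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    # True positive: eval and human both chose the positive class.
    tp = sum(1 for p, a in pairs if p == pos_class and a == pos_class)
    # False positive: eval chose the positive class, human did not.
    fp = sum(1 for p, a in pairs if p == pos_class and a != pos_class)
    if tp + fp == 0:
        return None
    return tp / (tp + fp)


# Hypothetical sample: one entry per annotated span.
eval_labels = ["grounded", "grounded", "ungrounded", "grounded", "ungrounded"]
human_labels = ["grounded", "ungrounded", "ungrounded", "grounded", "grounded"]

# The eval said 'grounded' 3 times; humans agreed on 2 of those.
print(precision(eval_labels, human_labels, "grounded"))
```

A low value here means the eval often says "grounded" where a human disagrees, which suggests the eval prompt or threshold is worth revisiting.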

Annotations vs. evals

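|          | Annotations               | Evals                            |
|----------|---------------------------|----------------------------------|
| Who      | Humans                    | Automated (LLM-as-judge or code) |
| Scale    | Small samples             | Every span                       |
| Best for | Ground truth, calibration | Production monitoring            |
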
Use both: evals for scale, annotations for accuracy.