Annotation Configs
Use human feedback to curate datasets for testing
Human feedback is often the most nuanced form of evaluation, capturing subtleties that automated methods miss. Even a small number of well-curated annotations can drive meaningful improvements.
Annotations are custom labels that can be added to represent this human feedback. They allow teams and subject matter experts to manually label data and curate high-quality datasets. Users can also log feedback directly using labeling queues or our annotations API.
Why are annotations critical?
Annotations enable deep error analysis, which is the first step toward writing meaningful evals and understanding where performance falls short.
A well-annotated dataset is essential for testing and refining eval templates.
Annotations also provide a structured way to capture human feedback that can be fed back into prompt optimization and fine-tuning.
By creating high-quality labeled data, annotations serve as a reliable ground truth.
For more on annotations, see Hamel’s Evals Blog.
What is an Annotation Config?
Annotation Configs allow you to define consistent annotation schemas that can be reused across your workspace, ensuring evaluations are structured and comparable over time.
To create a new annotation config, navigate to Annotation Configs in the sidebar and click New Annotation Config. You’ll then define four key elements:
Annotation Name: Provide a clear, descriptive name for your annotation. This helps others identify its purpose (e.g., Correctness or Response Helpfulness).
Annotation Config Type: Choose how you want to capture feedback:
Categorical Options – Assign predefined labels (e.g., Correct / Incorrect, Helpful / Unhelpful).
Continuous Score – Apply a numeric score or range to quantify performance (e.g., 0–1 for relevance).
Freeform Text – Enter open-ended feedback for qualitative evaluations.
Optimization Direction: Specify how the annotation is evaluated: Maximize when higher scores are better, or Minimize when lower scores are better.
Define Labels or Scores: Depending on your selected type, define the label categories or scoring range. For example, Correct (score = 1) and Incorrect (score = 0).
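To make these four elements concrete, an annotation config might look something like the following if expressed as data. This is a hypothetical sketch for illustration only; configs are defined through the Arize UI, and the field names here are assumptions rather than the stored schema.

# Hypothetical sketch of an annotation config; field names are illustrative
# assumptions, not the schema Arize stores internally.
correctness_config = {
    "name": "Correctness",                 # Annotation Name
    "type": "categorical",                 # or "continuous_score" / "freeform_text"
    "optimization_direction": "maximize",  # higher scores are better
    "labels": [
        {"label": "Correct", "score": 1},
        {"label": "Incorrect", "score": 0},
    ],
}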
Add Annotations in the UI
Traces
Annotations can be applied at a per-span level for LLM use cases. Within the span view, click the icon to annotate. From there, you can choose an existing annotation config or create a new one.
Experiments
You can annotate experiment results in Arize to capture human feedback. As you iterate and make system changes, this feedback serves as a strong signal for identifying improvements or regressions.
Create Annotations via API
Annotations can also be logged via our Python SDK using the log_annotations function to attach human feedback to spans.
Note: Annotations can be applied to spans up to 14 days prior to the current day. To apply annotations beyond this lookback window, please reach out to support@arize.com.
Logging the annotation
Import Packages and Set Up the Arize Client
import os
import pandas as pd
from arize.pandas.logger import Client

API_KEY = os.environ.get("ARIZE_API_KEY")  # You can get this from the UI
SPACE_ID = os.environ.get("ARIZE_SPACE_ID")  # You can get this from the UI
DEVELOPER_KEY = os.environ.get("ARIZE_DEVELOPER_KEY")  # Needed for sync functions
PROJECT_NAME = "YOUR_PROJECT_NAME"  # Replace with your project name

arize_client = Client(
    space_id=SPACE_ID,
    api_key=API_KEY,
    developer_key=DEVELOPER_KEY,
)
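Next, build the DataFrame of annotations you want to attach to spans. The column names below are an illustrative assumption; consult the Arize SDK reference for the exact schema log_annotations expects.

# Hypothetical annotations DataFrame; confirm the exact column names
# against the Arize SDK reference before logging.
annotations_dataframe = pd.DataFrame(
    {
        "context.span_id": ["SPAN_ID_1", "SPAN_ID_2"],  # spans to annotate
        "annotation.correctness.label": ["Correct", "Incorrect"],
        "annotation.correctness.score": [1, 0],
        "annotation.correctness.updated_by": ["jane.doe", "jane.doe"],
    }
)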
response = arize_client.log_annotations(
    dataframe=annotations_dataframe,
    project_name=PROJECT_NAME,
    validate=True,  # Keep validation enabled
    verbose=True,   # Enable detailed SDK logs, especially when first testing
)
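If log_annotations returns a requests-style Response object, as other arize.pandas.logger methods do (an assumption worth verifying against the SDK reference for your version), you can confirm the upload succeeded:

# Assumes the call returns a requests-style Response; adjust if your
# SDK version returns a different object.
if response.status_code == 200:
    print("Annotations logged successfully")
else:
    print(f"Logging failed with status {response.status_code}: {response.text}")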