What is the Evaluator Hub?

The Evaluator Hub is your centralized place for managing evaluators. Instead of rebuilding evaluation logic every time you start a new task, you can define an evaluator once in the Hub and reuse it across projects, datasets, and workflows.

Why Use the Eval Hub?

Without a centralized evaluator library, teams tend to recreate the same evaluation logic across tasks, lose track of what changed, and end up with inconsistent quality criteria across projects. The Eval Hub solves this by giving you a single source of truth for all your evaluators.
  • Reusable across tasks and projects. Create an evaluator once and attach it to any evaluation task — online monitoring, offline batch runs, or dataset experiments. No need to rewrite prompts, reconfigure models, or duplicate code logic.
  • Full version history. Every change to an evaluator is tracked with a commit message. You can see what changed, when, and why — making it easy to audit evaluation criteria over time.
  • Consistent quality standards. When the same evaluator is used across projects, your team applies the same definition of “good” everywhere. This eliminates drift between how different tasks measure performance.
  • Flexible column mappings. Template variables (for LLM-as-a-Judge) and data variables (for code evaluators) map to your datasource columns at the task level, so a single evaluator works across datasets and projects with different schemas.

Navigating the Evaluators Page

The Evaluators page in Arize AX contains two tabs:

Evaluator Hub Tab

This is where evaluators are defined, configured, and managed. Each evaluator card shows its name, type, model configuration, version count, and update history. From here you can:
  • Browse all available evaluators
  • Create new evaluators
  • Edit and version existing evaluators
  • Launch a new task directly from an evaluator

Running Tasks Tab

This is where evaluation tasks execute evaluators against your data. A task connects an evaluator to a data source (project traces or dataset) and runs it on a schedule or as a one-time batch. See Online Evals for more on creating and managing tasks.

Creating an Evaluator in the Eval Hub

Navigate to Evaluators in the left sidebar, then click New Evaluator in the upper right.
The Eval Hub currently supports LLM-as-a-Judge evaluators. Reusable Code evaluators are coming soon.

LLM-as-a-Judge Evaluators

LLM-as-a-Judge evaluators use an LLM to assess outputs based on a structured prompt. There are three ways to create one:

Option A: Use a Pre-Built Template

Arize provides pre-built evaluation templates tested against benchmarked datasets. These cover common evaluation scenarios so you can get started quickly.
  1. Click New Evaluator
  2. Select a template from the list (e.g., Hallucination, Relevance, Toxicity, User Frustration)
  3. Give your evaluator a name — this is how it appears in the Hub and in your results
  4. Configure the LLM settings: select a provider, model, and parameters
  5. Click Save to add it to the Eval Hub
Available pre-built templates include:
  • Hallucination: outputs containing information not supported by the reference
  • Relevance: whether responses address the input question
  • Toxicity: harmful or inappropriate content
  • Helpfulness: how useful the response is to the user
  • Q&A Correctness: answer accuracy given reference documents
  • Summarization: whether summaries capture the source material
  • User Frustration: signs of frustration in conversations
  • Code Generation: code correctness and readability
  • SQL Generation: SQL query correctness
  • Tool Calling: function call accuracy and parameter extraction

Option B: Create from Blank

Build a custom evaluator when pre-built templates don’t capture your application-specific criteria.
  1. Click New Evaluator, then select Create From Blank
  2. Name the evaluator descriptively (e.g., “Travel Plan Completeness”, “Regulatory Compliance Check”)
  3. Write your prompt template — describe the judge’s role, evaluation criteria, and include template variables (e.g., {input}, {output}, {context}) that will be populated with your data
  4. Define output labels — set the possible values the judge can return (e.g., correct/incorrect, or a 1–5 scale) along with their scores
  5. Configure the judge model — select the AI provider, model, and parameters
  6. Toggle Explanations to “On” if you want the judge to provide a rationale for each label
  7. Click Save
Categorical labels (e.g., correct/incorrect) tend to be more reliable and consistent than numeric scores for most evaluation tasks.
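To make the steps above concrete, here is what a judge prompt template with categorical labels might look like. This is an illustrative sketch, not a fixed schema: the variable names, labels, and wording are example choices, and in a real task the template variables are populated through column mappings rather than by calling `.format()` yourself.

```python
# Illustrative prompt template for a custom LLM-as-a-Judge evaluator.
# The variables ({input}, {output}, {context}), labels, and criteria below
# are example choices; a task's column mappings would fill the variables.
TEMPLATE = """You are judging whether a travel itinerary is complete.

[Question]: {input}
[Itinerary]: {output}
[Reference notes]: {context}

A complete itinerary covers transport, lodging, and a daily schedule.
Respond with exactly one label: "complete" or "incomplete".
"""

# Output labels with scores, as in step 4 above.
LABELS = {"complete": 1, "incomplete": 0}

# Simulate what the task does at runtime: substitute data into the template.
rendered = TEMPLATE.format(
    input="Plan a 3-day trip to Kyoto",
    output="Day 1: Fushimi Inari; Day 2: Arashiyama; Day 3: Gion.",
    context="Traveler prefers trains over flights.",
)
```

Keeping the criteria and the allowed labels explicit in the template is what makes the judge's responses easy to parse into the scores you defined.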

Option C: Use Alyx to Generate an Evaluator

Alyx can generate custom evaluators from plain language descriptions.
  1. Click the Alyx icon in the upper right corner
  2. Describe what you want to evaluate in plain language, for example:
    Write a custom evaluation that checks if customer support responses
    are empathetic, address the customer's concern, and provide actionable
    next steps. Score from 1-5.
    
  3. Review the generated evaluator — adjust the template, labels, or model as needed
  4. Save to the Eval Hub

Code Evaluators (⚠️ Coming Soon)

Code evaluators use deterministic logic — Python code — to score outputs. They’re ideal for objective checks like regex matching, JSON validation, keyword presence, or any custom heuristic.
Reusable Code Evaluators will be available in the Evaluator Hub soon. In the meantime, you can add code evaluators directly to your task.
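As a sketch of the kind of deterministic check a code evaluator performs, here is a standalone Python function combining JSON validation with a regex match. The function name and the label/score return shape are illustrative assumptions, not the task-level interface:

```python
import json
import re

def evaluate_output(output: str) -> dict:
    """Example deterministic evaluator: the output must be valid JSON and
    contain a 'price' field formatted like a dollar amount.
    The {label, score} return shape is illustrative, not a fixed contract.
    """
    try:
        payload = json.loads(output)
    except json.JSONDecodeError:
        return {"label": "invalid_json", "score": 0}
    price = str(payload.get("price", ""))
    # Accept e.g. "$19" or "$19.99"
    if re.fullmatch(r"\$\d+(\.\d{2})?", price):
        return {"label": "valid", "score": 1}
    return {"label": "missing_price", "score": 0}
```

Because checks like this are pure functions of the output, they run cheaply and give the same answer every time, unlike an LLM judge.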

Versioning Evaluators

As your understanding of “good” evolves, your evaluators should too. The Eval Hub tracks every change to an evaluator with a version history.

How Versioning Works

  • Each time you edit and save an evaluator, a new version is created
  • You’re prompted to add a commit message describing what changed (e.g., “Tightened criteria for budget accuracy”, “Added edge case for multi-city trips”)
  • The full version history is visible on the evaluator detail page

Best Practices for Versioning

  • Write descriptive commit messages. Future you (and your teammates) will thank you when reviewing why evaluation criteria shifted.
  • Version after testing. Use the Evaluator Playground to test changes before committing a new version.
  • Review version history before modifying. Check what the evaluator currently does and why recent changes were made before introducing new edits.

Reusing an Evaluator Across Tasks

The core value of the Eval Hub is reuse. Once an evaluator is saved, you can attach it to any evaluation task without recreating it.

Attaching an Evaluator to a Task

There are two ways to use an existing evaluator.

From the task creation flow:
  1. Click New Task on the Evaluators page
  2. Select the evaluator type (LLM-as-a-Judge or Code Evaluator)
  3. Click Add Evaluator, then choose your evaluator from the Eval Hub
From the Eval Hub directly:
  1. Navigate to the Eval Hub tab
  2. Find the evaluator you want to use
  3. Click Use Evaluator — this opens the task creation flow with that evaluator pre-selected

Configuring Column Mappings

When you attach an evaluator to a task, you may need to map its variables to your datasource columns. This is what makes evaluators truly portable — the same evaluator can work with different data schemas.
  1. After adding the evaluator to a task, the column mappings panel shows all variables — prompt template variables for LLM-as-a-Judge evaluators, or data variables for code evaluators
  2. For each variable (e.g., {input}, {output}, {context}), select the corresponding column from your datasource
  3. If the variable names match your datasource columns, mappings are configured automatically
Column mappings are configured at the task level, not the evaluator level. This means a single evaluator can be mapped differently for different projects or datasets.
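The idea can be sketched in a few lines of Python. This is a hypothetical illustration of the concept, not Arize's implementation: the same evaluator variables are bound to different datasource columns per task, so one evaluator serves both schemas.

```python
# Hypothetical illustration of task-level column mappings.
# One evaluator declares its variables once:
EVALUATOR_VARIABLES = ["input", "output", "context"]

# Two tasks bind those variables to different datasource schemas:
task_a_mapping = {"input": "question", "output": "response", "context": "docs"}
task_b_mapping = {"input": "user_msg", "output": "bot_reply", "context": "retrieved"}

def resolve(row: dict, mapping: dict) -> dict:
    """Pull the evaluator's variables out of a datasource row via the mapping."""
    return {var: row[col] for var, col in mapping.items()}

# A row from task B's datasource, with its own column names:
row_b = {"user_msg": "Hi", "bot_reply": "Hello!", "retrieved": "greeting docs"}
resolved = resolve(row_b, task_b_mapping)
# The evaluator always sees {"input": ..., "output": ..., "context": ...},
# regardless of the underlying column names.
```

Because the mapping lives on the task rather than the evaluator, editing the evaluator never breaks the bindings of other tasks that use it.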

Example Workflow: End-to-End

Here’s how a typical workflow looks using the Eval Hub:

Step 1: Create an Evaluator in the Hub

Navigate to Evaluators > New Evaluator. Choose an LLM-as-a-Judge evaluator (pre-built template, created from blank, or Alyx-generated). Configure the settings, define your labels, and save it.

Step 2: Test it in the Playground

Before running at scale, test your evaluator against sample data. For LLM-as-a-Judge evaluators, use the Playground to refine the prompt template until you’re confident in the results.

Step 3: Attach it to a Task

Create a task that runs the evaluator continuously on existing or production traces. Configure filters, sampling rate, and column mappings. See Setting Up Online Evals for the full setup guide.

Step 4: Reuse it on a different project

When you start a new project with similar quality criteria, go back to the Eval Hub and attach the same evaluator to a new task — just update the column mappings to match the new data schema.

Step 5: Version it as criteria evolve

As you learn more about what “good” looks like for your application, edit the evaluator and commit a new version with a descriptive message. All tasks using this evaluator will pick up the latest version.