# Evaluators

> Create and manage LLM-as-judge evaluators and their versions programmatically.

<Note>
  The `evaluators` client methods are currently in **BETA**. The API may change without notice. A one-time warning is emitted on first use.
</Note>

Template (LLM-as-judge) evaluators use prompt templates with `{variable}` placeholders that reference span or trace attributes to automatically score your LLM application's outputs. Code evaluators run built-in checks or user-supplied Python instead.

## Key Capabilities

* Create template-based LLM-as-judge evaluators within a space
* Create code evaluators from built-in checks or custom Python
* Version evaluators with commit messages (versions are immutable once created)
* Retrieve evaluators with their latest or a specific version
* List, update, and delete evaluators
* List and retrieve individual evaluator versions

## List Evaluators

List all evaluators you have access to, optionally filtered by space and by a substring of the evaluator name.

```python theme={null}
resp = client.evaluators.list(
    space="your-space-name-or-id",  # optional
    name="Relevance",               # optional substring filter
    limit=50,
)

for evaluator in resp.evaluators:
    print(evaluator.id, evaluator.name)
```

For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see [Response Objects](/api-clients/python/version-8/overview#response-objects).

## Create a Template (LLM-as-Judge) Evaluator

Create a new template evaluator with an initial version. Evaluator names must be unique within the target space.

```python theme={null}
from arize.evaluators.types import TemplateConfig, EvaluatorLlmConfig

evaluator = client.evaluators.create_template_evaluator(
    name="Relevance",
    space="your-space-name-or-id",
    commit_message="Initial version",
    description="Scores whether the response is relevant to the query",
    template_config=TemplateConfig(
        name="Relevance",
        template="Is the following response relevant to the query?\nQuery: {input.value}\nResponse: {output.value}",
        include_explanations=True,
        use_function_calling_if_available=True,
        classification_choices={"relevant": 1, "irrelevant": 0},
        direction="maximize",
        llm_config=EvaluatorLlmConfig(
            ai_integration_id="your-ai-integration-id",
            model_name="gpt-4o",
            invocation_parameters={"temperature": 0},
            provider_parameters={},
        ),
    ),
)

print(evaluator.id, evaluator.name)
```

## Create a Code Evaluator

Create a new code evaluator with an initial version. Use `ManagedCodeConfig` for built-in checks (`JSONParseable`, `Regex`, `KeywordMatch`, `ExactMatch`) or `CustomCodeConfig` for user-supplied Python.

```python theme={null}
from arize.evaluators.types import ManagedCodeConfig

evaluator = client.evaluators.create_code_evaluator(
    name="JSON Parseable",
    space="your-space-name-or-id",
    commit_message="Initial version",
    code_config=ManagedCodeConfig(
        type="managed",
        name="json_parseable",
        managed_evaluator="JSONParseable",
        variables=["output"],
    ),
)

print(evaluator.id, evaluator.name)
```

Evaluator `name` must match the regex `^[a-zA-Z0-9_\s\-&()]+$`.
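
To catch an invalid name before making a request, the pattern above can be checked locally. This is a minimal sketch using Python's `re` module; the `is_valid_evaluator_name` helper is hypothetical and not part of the client, and the service may apply additional rules (such as uniqueness within the space) that a local check cannot see.

```python
import re

# Pattern copied from the documentation above.
EVALUATOR_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_\s\-&()]+$")


def is_valid_evaluator_name(name: str) -> bool:
    """Return True if `name` matches the documented evaluator-name pattern."""
    return EVALUATOR_NAME_PATTERN.match(name) is not None


print(is_valid_evaluator_name("JSON Parseable"))  # letters and spaces are allowed
print(is_valid_evaluator_name("relevance/v2"))    # "/" is not in the allowed set
```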

### Template Variables

Template strings use `{variable}` placeholders (f-string format) that reference span or trace attributes (e.g., `{input.value}`, `{output.value}`, `{attributes.my_custom_attr}`).
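
When reviewing a template, it can help to list the placeholders it references. The sketch below uses the standard library's `string.Formatter` to do that on the client side; the `template_variables` helper is hypothetical, and the service's own placeholder parsing (for example, how it treats literal braces) may differ.

```python
from string import Formatter


def template_variables(template: str) -> list[str]:
    """Return placeholder names in order of first appearance, without duplicates."""
    names: list[str] = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(template):
        # field_name is None for trailing literal text with no placeholder.
        if field_name and field_name not in names:
            names.append(field_name)
    return names


template = (
    "Is the following response relevant to the query?\n"
    "Query: {input.value}\nResponse: {output.value}"
)
print(template_variables(template))
```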

### Classification vs. Freeform Output

* **Classification** — Provide `classification_choices` as a `dict[str, float]` mapping label → numeric score (e.g., `{"relevant": 1, "irrelevant": 0}`). The evaluator outputs one of these labels along with its score.
* **Freeform** — Omit `classification_choices`. The evaluator produces a numeric score without predefined labels.
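
To illustrate the label → score mapping for classification output, the sketch below converts a list of labels into scores and averages them. The `labels` list is made-up sample data and both helpers are hypothetical; how evaluation results are actually retrieved is not covered here.

```python
from statistics import mean

# Same mapping used in the classification examples on this page.
CHOICES = {"relevant": 1, "irrelevant": 0}


def score_for_label(choices: dict[str, float], label: str) -> float:
    """Look up the numeric score for a label; reject labels that were never declared."""
    if label not in choices:
        raise ValueError(f"unexpected label {label!r}; expected one of {sorted(choices)}")
    return float(choices[label])


def mean_score(choices: dict[str, float], labels: list[str]) -> float:
    """Average the scores of a batch of labels."""
    return mean(score_for_label(choices, label) for label in labels)


labels = ["relevant", "irrelevant", "relevant", "relevant"]  # sample data
print(mean_score(CHOICES, labels))
```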

## Get an Evaluator

Retrieve an evaluator by name or ID. The latest version is returned by default. When resolving by name, `space` is required so the evaluator can be identified unambiguously.

```python theme={null}
evaluator = client.evaluators.get(
    evaluator="your-evaluator-name-or-id",
    space="your-space-name-or-id",  # required when resolving by evaluator name
)

print(evaluator.id, evaluator.name)
print(evaluator.version)
```

### Get a Specific Version

Pass `version_id` to retrieve that version instead of the latest.

```python theme={null}
evaluator = client.evaluators.get(
    evaluator="your-evaluator-name-or-id",
    space="your-space-name-or-id",  # required when resolving by evaluator name
    version_id="specific-version-id",
)
```

## Update an Evaluator

Update an evaluator's metadata (name and/or description). To change the template or code configuration, create a new version instead.

```python theme={null}
evaluator = client.evaluators.update(
    evaluator="your-evaluator-name-or-id",
    space="your-space-name-or-id",  # required when resolving by evaluator name
    name="Relevance v2",
    description="Updated description",
)

print(evaluator)
```

## Delete an Evaluator

Delete an evaluator and all of its versions. This operation is irreversible, and the call does not return a response.

```python theme={null}
client.evaluators.delete(
    evaluator="your-evaluator-name-or-id",
    space="your-space-name-or-id",  # required when resolving by evaluator name
)

print("Evaluator deleted successfully")
```
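
Because deletion cannot be undone, it may be worth wrapping the call in an explicit confirmation guard. This is a minimal sketch that relies only on the `client.evaluators.delete` call shown above; the `delete_evaluator` helper and its `confirm` flag are hypothetical and not part of the client.

```python
def delete_evaluator(client, evaluator: str, space: str, *, confirm: bool = False) -> bool:
    """Issue the delete only when the caller explicitly confirms.

    Returns True if the delete call was made, False if it was skipped.
    """
    if not confirm:
        return False
    client.evaluators.delete(evaluator=evaluator, space=space)
    return True
```

Calling `delete_evaluator(client, "Relevance", "your-space-name-or-id")` without `confirm=True` leaves the evaluator untouched, which makes accidental deletions in scripts less likely.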

## Manage Versions

Evaluator versions are immutable once created. To change the template or code configuration, create a new version; it becomes the latest version immediately.

### List Versions

List all versions for an evaluator.

```python theme={null}
resp = client.evaluators.list_versions(
    evaluator="your-evaluator-name-or-id",
    space="your-space-name-or-id",  # required when resolving by evaluator name
    limit=50,
)

for version in resp.evaluator_versions:
    print(version.id, version.commit_message)
```

For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see [Response Objects](/api-clients/python/version-8/overview#response-objects).

### Get a Version

Retrieve a specific evaluator version by its ID.

```python theme={null}
version = client.evaluators.get_version(version_id="your-version-id")

print(version.id, version.commit_message)
```

### Create a New Template Version

Add a new template version to an existing template evaluator. The new version becomes the latest immediately.

```python theme={null}
from arize.evaluators.types import TemplateConfig, EvaluatorLlmConfig

version = client.evaluators.create_template_version(
    evaluator="your-evaluator-name-or-id",
    space="your-space-name-or-id",  # required when resolving by evaluator name
    commit_message="Improved prompt for edge cases",
    template_config=TemplateConfig(
        name="Relevance",
        template="Is the response relevant to the query? Answer 'relevant' or 'irrelevant'.\nQuery: {input.value}\nResponse: {output.value}",
        include_explanations=True,
        use_function_calling_if_available=True,
        classification_choices={"relevant": 1, "irrelevant": 0},
        direction="maximize",
        llm_config=EvaluatorLlmConfig(
            ai_integration_id="your-ai-integration-id",
            model_name="gpt-4o",
            invocation_parameters={"temperature": 0},
            provider_parameters={},
        ),
    ),
)

print(version.id)
```

### Create a New Code Version

Add a new code version to an existing code evaluator. As with template versions, the new version becomes the latest immediately.

```python theme={null}
from arize.evaluators.types import ManagedCodeConfig

version = client.evaluators.create_code_version(
    evaluator="your-evaluator-name-or-id",
    space="your-space-name-or-id",  # required when resolving by evaluator name
    commit_message="Updated managed evaluator",
    code_config=ManagedCodeConfig(
        type="managed",
        name="json_parseable",
        managed_evaluator="JSONParseable",
        variables=["output"],
    ),
)

print(version.id)
```

**Learn more:** [Online Evaluations Documentation](https://arize.com/docs/ax/evaluate/online-evals)