# Tasks

> Create and manage evaluation tasks that automatically score your LLM application on a schedule or on demand.

<Note>
  The `tasks` client methods are currently in **ALPHA**. The API may change without notice. A one-time warning is emitted on first use.
</Note>

Use evaluation tasks to score spans in a project, either continuously or on demand, or to evaluate examples in a dataset with your LLM-as-judge evaluators.

## Key Capabilities

* Create project-based tasks that run continuously against live spans
* Create dataset-based tasks that evaluate experiment results
* Create `run_experiment` tasks that drive LLM calls on the server
* Trigger on-demand task runs with custom data windows
* Poll task runs until completion with configurable timeout
* Cancel in-progress runs
* List and filter task runs by status

## List Tasks

List tasks you have access to, with optional filtering by space, project, dataset, or type.

```python theme={null}
resp = client.tasks.list(
    space="your-space-name-or-id",  # optional
    limit=50,
)

for task in resp.tasks:
    print(task.id, task.name)
```

Filter by task type:

```python theme={null}
resp = client.tasks.list(
    space="your-space-name-or-id",
    task_type="template_evaluation",
)
```

Valid values for `task_type` are `"template_evaluation"`, `"code_evaluation"`, and `"run_experiment"`.
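
Because `task_type` is a plain string, a typo only surfaces once the request is sent. A minimal client-side guard might look like the sketch below. It assumes the three values listed above are the complete set; `check_task_type` is a hypothetical helper, not part of the SDK.

```python
# Hypothetical helper, not part of the Arize SDK.
# Assumes the three documented values are the full set of task types.
VALID_TASK_TYPES = frozenset(
    {"template_evaluation", "code_evaluation", "run_experiment"}
)


def check_task_type(task_type: str) -> str:
    """Return task_type unchanged, or raise ValueError if it is not documented."""
    if task_type not in VALID_TASK_TYPES:
        raise ValueError(
            f"Unknown task_type {task_type!r}; "
            f"expected one of {sorted(VALID_TASK_TYPES)}"
        )
    return task_type
```

You could then write `client.tasks.list(task_type=check_task_type("code_evaluation"))` to fail fast on a misspelled value.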

For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see [Response Objects](/api-clients/python/version-8/overview#response-objects).

## Create an Evaluation Task

Create a new evaluation task. Evaluation tasks can target either a **project** (live spans) or a **dataset** (experiment results).

### Project-Based Task

A project-based task evaluates spans in a project. Set `is_continuous=True` to evaluate new spans as they arrive, or `False` to run the task only on demand. Use `sampling_rate` to evaluate only a fraction of spans.

```python theme={null}
from arize.tasks.types import BaseEvaluationTaskRequestEvaluatorsInner

task = client.tasks.create_evaluation_task(
    name="Relevance Monitor",
    task_type="template_evaluation",
    project="your-project-name-or-id",
    evaluators=[
        BaseEvaluationTaskRequestEvaluatorsInner(
            evaluator_id="your-evaluator-id",
        ),
    ],
    is_continuous=True,
    sampling_rate=0.1,  # Evaluate 10% of spans
)

print(task.id)
```

### Dataset-Based Task

A dataset-based task evaluates examples from one or more experiments. At least one `experiment_ids` entry is required.

```python theme={null}
from arize.tasks.types import BaseEvaluationTaskRequestEvaluatorsInner

task = client.tasks.create_evaluation_task(
    name="Experiment Evaluation",
    task_type="template_evaluation",
    dataset="your-dataset-name-or-id",
    experiment_ids=["experiment-id-1", "experiment-id-2"],
    evaluators=[
        BaseEvaluationTaskRequestEvaluatorsInner(
            evaluator_id="your-evaluator-id",
        ),
    ],
    is_continuous=False,
)

print(task.id)
```

### Column Mappings and Filters

Each evaluator in the task can have its own column mappings (to map template variables to span attribute names) and a per-evaluator query filter.

```python theme={null}
from arize.tasks.types import BaseEvaluationTaskRequestEvaluatorsInner

task = client.tasks.create_evaluation_task(
    name="Custom Relevance",
    task_type="template_evaluation",
    project="your-project-name-or-id",
    evaluators=[
        BaseEvaluationTaskRequestEvaluatorsInner(
            evaluator_id="your-evaluator-id",
            column_mappings={"user_query": "input.value"},
            query_filter="status_code = 'OK'",
        ),
    ],
    query_filter="latency_ms < 5000",  # Task-level filter (AND-ed with evaluator filter)
    is_continuous=True,
)
```

**Parameter reference:**

| Parameter        | Type        | Description                                                                              |
| ---------------- | ----------- | ---------------------------------------------------------------------------------------- |
| `name`           | `str`       | Task name. Must be unique within the space.                                              |
| `task_type`      | `str`       | `"template_evaluation"` or `"code_evaluation"`.                                          |
| `evaluators`     | `list`      | List of evaluators to attach. At least one required.                                     |
| `project`        | `str`       | Target project name or ID. Required when `dataset` is not provided.                      |
| `dataset`        | `str`       | Target dataset name or ID. Required when `project` is not provided.                      |
| `space`          | `str`       | Space name or ID used to disambiguate name-based resolution for `project` and `dataset`. |
| `experiment_ids` | `list[str]` | Required (at least one) when `dataset` is provided.                                      |
| `sampling_rate`  | `float`     | Fraction of spans to evaluate (0–1). Project-based tasks only.                           |
| `is_continuous`  | `bool`      | `True` to run on every new span; `False` for on-demand only.                             |
| `query_filter`   | `str`       | Task-level SQL-style filter applied to all evaluators.                                   |

## Create a Run-Experiment Task

A `run_experiment` task drives all LLM calls on the server using the AI integration specified in `run_configuration`, so no local callable is required.

```python theme={null}
from arize.tasks.types import LlmGenerationRunConfig

task = client.tasks.create_run_experiment_task(
    name="Nightly QA Run",
    dataset="your-dataset-name-or-id",
    space="your-space-name-or-id",  # required when dataset is a name
    run_configuration=LlmGenerationRunConfig(
        # provider/model/prompt configuration for the server-driven run
        # ...
    ),
)

print(task.id)
```

The method also accepts a `TemplateEvaluationRunConfig` instance or a plain `dict` matching one of those schemas; the SDK wraps it for you.

## Get a Task

Retrieve a task by name or ID. When using a name, provide `space` to disambiguate.

```python theme={null}
task = client.tasks.get(
    task="your-task-name-or-id",
    space="your-space-name-or-id",  # required when using a name
)

print(task.id, task.name)
```

## Update a Task

Update mutable fields on an existing task. At least one update field must be provided. Pass `query_filter=None` to clear the existing filter; omit any other argument to leave it unchanged.

```python theme={null}
task = client.tasks.update(
    task="your-task-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    name="Relevance Monitor v2",
    sampling_rate=0.25,  # project-based tasks only
)

print(task.id, task.name)
```

## Delete a Task

Delete a task and its associated configuration. This operation is irreversible.

```python theme={null}
client.tasks.delete(
    task="your-task-name-or-id",
    space="your-space-name-or-id",  # required when using a name
)

print("Task deleted successfully")
```

## Task Runs

### Trigger a Run

Trigger an on-demand run for a task. The run starts in `"pending"` status. The accepted parameters depend on the task's type.

**Evaluation tasks** (`template_evaluation` / `code_evaluation`):

```python theme={null}
from datetime import datetime

run = client.tasks.trigger_run(
    task="your-task-name-or-id",
    data_start_time=datetime(2024, 1, 1),
    data_end_time=datetime(2024, 2, 1),
)

print(run.id, run.status)  # e.g. "run-abc123", "pending"
```

| Parameter              | Type        | Default  | Description                                                                                |
| ---------------------- | ----------- | -------- | ------------------------------------------------------------------------------------------ |
| `task`                 | `str`       | required | Task name or ID to trigger.                                                                |
| `space`                | `str`       | None     | Space name or ID used to disambiguate the task lookup. Recommended when resolving by name. |
| `data_start_time`      | `datetime`  | None     | Start of data window to evaluate.                                                          |
| `data_end_time`        | `datetime`  | now      | End of data window. Defaults to the current time.                                          |
| `max_spans`            | `int`       | 10000    | Maximum number of spans to process.                                                        |
| `override_evaluations` | `bool`      | `False`  | Re-evaluate data that already has labels.                                                  |
| `experiment_ids`       | `list[str]` | None     | Experiment IDs to run against (dataset-based tasks only).                                  |

**`run_experiment` tasks**:

```python theme={null}
run = client.tasks.trigger_run(
    task="your-run-experiment-task",
    experiment_name="qa-run-2024-01-15",  # required: display name for the experiment
    max_examples=100,                     # optional cap
)
```

| Parameter             | Type             | Default  | Description                                                                                                    |
| --------------------- | ---------------- | -------- | -------------------------------------------------------------------------------------------------------------- |
| `task`                | `str`            | required | Task name or ID to trigger.                                                                                    |
| `space`               | `str`            | None     | Space name or ID used to disambiguate the task lookup.                                                         |
| `experiment_name`     | `str`            | required | Display name for the experiment to be created. Must be unique within the dataset.                              |
| `dataset_version_id`  | `str`            | latest   | Dataset version global ID. Defaults to the latest version.                                                     |
| `max_examples`        | `int`            | None     | Maximum number of examples to run. When omitted, all examples are used. Mutually exclusive with `example_ids`. |
| `example_ids`         | `list[str]`      | None     | Specific dataset example global IDs to run against. Mutually exclusive with `max_examples`.                    |
| `tracing_metadata`    | `dict[str, Any]` | None     | Arbitrary key-value metadata attached to the run's traces.                                                     |
| `evaluation_task_ids` | `list[str]`      | None     | Task global IDs of evaluation tasks to trigger after the experiment run completes.                             |

### List Runs

List runs for a task with optional status filtering.

```python theme={null}
resp = client.tasks.list_runs(
    task="your-task-name-or-id",
    limit=20,
)

for run in resp.task_runs:
    print(run.id, run.status)
```

Filter to only completed runs:

```python theme={null}
resp = client.tasks.list_runs(
    task="your-task-name-or-id",
    status="completed",
)
```

Valid `status` values: `"pending"`, `"running"`, `"completed"`, `"failed"`, `"cancelled"`.
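
To get a quick overview of a task's history, you can tally the returned runs by status. The sketch below relies only on each run exposing a `status` attribute, as in the examples above. The `SimpleNamespace` objects are stand-ins for illustration, and `count_runs_by_status` is a hypothetical helper.

```python
from collections import Counter
from types import SimpleNamespace


def count_runs_by_status(runs) -> Counter:
    """Tally runs by their `status` attribute."""
    return Counter(run.status for run in runs)


# Stand-in objects for illustration; in practice pass `resp.task_runs`.
sample_runs = [
    SimpleNamespace(id="run-1", status="completed"),
    SimpleNamespace(id="run-2", status="failed"),
    SimpleNamespace(id="run-3", status="completed"),
]
print(count_runs_by_status(sample_runs))
# Counter({'completed': 2, 'failed': 1})
```

Note that this counts only the runs in the response you pass in, so the totals are bounded by `limit`.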

### Get a Run

Retrieve a specific run by its ID.

```python theme={null}
run = client.tasks.get_run(run_id="your-run-id")

print(run.id, run.status)
```

### Cancel a Run

Cancel a run that is currently `"pending"` or `"running"`.

```python theme={null}
run = client.tasks.cancel_run(run_id="your-run-id")

print(run.status)  # "cancelled"
```

### Wait for a Run

Poll a run until it reaches a terminal state (`"completed"`, `"failed"`, or `"cancelled"`).

```python theme={null}
run = client.tasks.wait_for_run(
    run_id="your-run-id",
    poll_interval=5,   # Check every 5 seconds (default)
    timeout=600,       # Give up after 10 minutes (default)
)

print(run.status)  # "completed", "failed", or "cancelled"
```

**Raises `TimeoutError`** if the run does not complete within `timeout` seconds.
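
Conceptually, waiting for a run is a poll-until-terminal loop with a deadline. The sketch below illustrates that pattern with a generic `get_status` callable. It is not the SDK's implementation; `poll_until_terminal` and its `sleep` and `clock` parameters are hypothetical and exist only to keep the example testable.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_until_terminal(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() until it returns a terminal status or the deadline passes."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

To drive it from a real run, `get_status` could be `lambda: client.tasks.get_run(run_id="your-run-id").status`. In most cases `client.tasks.wait_for_run` is the simpler choice.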

### End-to-End: Trigger and Wait

```python theme={null}
# Trigger an on-demand run
run = client.tasks.trigger_run(task="your-task-name-or-id")

# Block until the run finishes
run = client.tasks.wait_for_run(run_id=run.id)

if run.status == "completed":
    print("Task run completed successfully")
elif run.status == "failed":
    print("Task run failed")
else:  # the only other terminal state is "cancelled"
    print("Task run was cancelled")
```
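
The flow above does not cover the case where `wait_for_run` times out. One option, sketched below, is to cancel the run when the wait expires. Whether a timed-out run should be cancelled is a design choice, not something the SDK prescribes. The sketch also assumes the raised exception is Python's built-in `TimeoutError`, and `run_and_wait` is a hypothetical wrapper.

```python
def run_and_wait(client, task: str, timeout: float = 600) -> str:
    """Trigger a run, wait for it, and cancel it if the wait times out.

    Returns the final status string of the run.
    """
    run = client.tasks.trigger_run(task=task)
    try:
        run = client.tasks.wait_for_run(run_id=run.id, timeout=timeout)
    except TimeoutError:
        # Assumption: a run that outlives the timeout is no longer wanted.
        run = client.tasks.cancel_run(run_id=run.id)
    return run.status
```

Calling `run_and_wait(client, "your-task-name-or-id", timeout=300)` returns `"completed"`, `"failed"`, or `"cancelled"`.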

**Learn more:** [Online Evaluations Documentation](https://arize.com/docs/ax/evaluate/online-evals)
