# Tasks

> Create and manage evaluation tasks with the AX CLI

<Note>
  The `ax tasks` commands are currently in **ALPHA**. The API may change without notice. A one-time warning is emitted on first use.
</Note>

The `ax tasks` commands let you create and manage evaluation tasks and their runs on the Arize platform. Tasks automatically score spans in a project or evaluate experiment results using your LLM-as-judge evaluators.

## `ax tasks list`

List tasks, optionally filtered by space, project, dataset, name, or type.

```bash
ax tasks list [--space <name-or-id>] [--project <name-or-id>] [--dataset <name-or-id>] [--name <filter>] [--task-type <type>] [--limit <n>] [--cursor <cursor>]
```

| Option        | Description                                                                   |
| ------------- | ----------------------------------------------------------------------------- |
| `--space`     | Filter tasks by space name or ID                                              |
| `--project`   | Filter tasks by project name or ID                                            |
| `--dataset`   | Filter tasks by dataset name or ID                                            |
| `--name`      | Case-insensitive substring filter on task name                                |
| `--task-type` | Filter by type: `template_evaluation`, `code_evaluation`, or `run_experiment` |
| `--limit`     | Maximum number of results to return (default: 15)                             |
| `--cursor`    | Pagination cursor for the next page                                           |

**Examples:**

```bash
ax tasks list --space sp_abc123
ax tasks list --space sp_abc123 --task-type template_evaluation
ax tasks list --project proj_abc123 --output tasks.json
```

## `ax tasks create`

Create a new task. The command dispatches internally based on `--task-type`.

For evaluation tasks (`template_evaluation` or `code_evaluation`), either `--project` or `--dataset` must be provided, but not both. Run-experiment tasks (`run_experiment`) require `--dataset` and `--run-configuration`.
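The target rules above can be checked in a script before the CLI is called. The following is a minimal sketch: `check_task_target` is a hypothetical helper, not part of the AX CLI, and it only mirrors the rules stated in this section.

```bash
# Hypothetical pre-flight check mirroring the target rules described above.
# Usage: check_task_target <task-type> <project> <dataset> <run-configuration>
# Pass an empty string for any value that is not set.
check_task_target() {
  local task_type="$1" project="$2" dataset="$3" run_config="$4"

  case "$task_type" in
    template_evaluation|code_evaluation)
      # Exactly one of project / dataset must be set.
      if [ -n "$project" ] && [ -n "$dataset" ]; then
        echo "error: --project and --dataset are mutually exclusive" >&2
        return 1
      fi
      if [ -z "$project" ] && [ -z "$dataset" ]; then
        echo "error: one of --project or --dataset is required" >&2
        return 1
      fi
      ;;
    run_experiment)
      # Both a dataset and a run configuration are required.
      if [ -z "$dataset" ] || [ -z "$run_config" ]; then
        echo "error: run_experiment requires --dataset and --run-configuration" >&2
        return 1
      fi
      ;;
    *)
      echo "error: unknown task type: $task_type" >&2
      return 1
      ;;
  esac
  return 0
}

# Example: a project-based evaluation task passes the check.
check_task_target template_evaluation proj_abc123 "" "" && echo "ok"
```

The CLI performs its own validation; a check like this only gives a script an earlier, local failure.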

```bash
ax tasks create \
  --name <name> \
  --task-type <type> \
  [--evaluators <json-array>] \
  [--run-configuration <json>] \
  (--project <name-or-id> | --dataset <name-or-id>)
```

| Option                              | Description                                                                                                              |
| ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `--name`                            | Task name (must be unique within the space)                                                                              |
| `--task-type`                       | `template_evaluation`, `code_evaluation`, or `run_experiment`                                                            |
| `--evaluators`                      | JSON array of evaluator objects (required for evaluation tasks; see format below)                                        |
| `--run-configuration`               | JSON object (or `@file.json`) specifying the run configuration (required for `run_experiment` tasks)                     |
| `--project`                         | Target project name or ID; mutually exclusive with `--dataset` (evaluation tasks only)                                   |
| `--space`                           | Space name or ID (required when resolving `--project` or `--dataset` by name)                                            |
| `--dataset`                         | Target dataset name or ID; mutually exclusive with `--project` for evaluation tasks; required for `run_experiment` tasks |
| `--experiment-ids`                  | Comma-separated experiment global IDs (evaluation tasks only)                                                            |
| `--sampling-rate`                   | Fraction of spans to evaluate, 0–1 (project evaluation tasks only)                                                       |
| `--is-continuous / --no-continuous` | Run task continuously on incoming data (evaluation tasks only)                                                           |
| `--query-filter`                    | Task-level SQL-style filter applied to all evaluators (evaluation tasks only)                                            |

**Evaluators JSON format:**

```json
[
  {
    "evaluator_id": "ev_abc123",
    "query_filter": null,
    "column_mappings": null
  }
]
```
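When several evaluators are attached, hand-quoting the array on the command line becomes error-prone. As a sketch, a small shell function can assemble it. `build_evaluators_json` is a hypothetical helper, and it emits only the `evaluator_id` field, as the examples on this page do. It assumes the IDs contain no characters that need JSON escaping, and `ev_def456` is a placeholder ID.

```bash
# Hypothetical helper: print a JSON array with one object per evaluator ID.
# Assumes IDs contain no quotes or backslashes (no JSON escaping is done).
build_evaluators_json() {
  local json="[" sep="" id
  for id in "$@"; do
    json="${json}${sep}{\"evaluator_id\": \"${id}\"}"
    sep=", "
  done
  printf '%s]\n' "$json"
}

build_evaluators_json ev_abc123 ev_def456
# prints: [{"evaluator_id": "ev_abc123"}, {"evaluator_id": "ev_def456"}]

# The result can then be passed as:
#   --evaluators "$(build_evaluators_json ev_abc123 ev_def456)"
```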

**Run configuration JSON format (`run_experiment` tasks):**

```json
{
  "experiment_type": "llm_generation",
  "ai_integration_id": "...",
  "model_name": "gpt-4o",
  "messages": [{"role": "user", "content": "{{input}}"}]
}
```
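Because `--run-configuration` also accepts `@file.json`, a longer configuration is often easier to keep in a file than to quote inline. The sketch below writes the example configuration to disk. `write_run_config` is a hypothetical helper, and `ai_abc` is the placeholder integration ID used in the examples on this page.

```bash
# Hypothetical helper: write the example run configuration to the given path.
# The quoted delimiter ('EOF') disables shell expansion inside the heredoc,
# so the JSON is written verbatim even if a prompt contains $ or backticks.
write_run_config() {
  cat > "$1" <<'EOF'
{
  "experiment_type": "llm_generation",
  "ai_integration_id": "ai_abc",
  "model_name": "gpt-4o",
  "messages": [{"role": "user", "content": "{{input}}"}]
}
EOF
}

write_run_config ./run_config.json

# The file can then be passed as:
#   --run-configuration @./run_config.json
```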

**Examples:**

Project-based evaluation task (continuous):

```bash
ax tasks create \
  --name "Relevance Monitor" \
  --task-type template_evaluation \
  --project proj_abc123 \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --is-continuous \
  --sampling-rate 0.1
```

Dataset-based evaluation task:

```bash
ax tasks create \
  --name "Experiment Evaluation" \
  --task-type template_evaluation \
  --dataset ds_xyz789 \
  --experiment-ids "exp_abc123,exp_def456" \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --no-continuous
```

Run-experiment task:

```bash
ax tasks create \
  --name "GPT-4o Summarization" \
  --task-type run_experiment \
  --dataset ds_xyz789 \
  --run-configuration '{"experiment_type": "llm_generation", "ai_integration_id": "ai_abc", "model_name": "gpt-4o", "messages": [{"role": "user", "content": "{{input}}"}]}'
```

## `ax tasks create-evaluation`

Create a new evaluation task (`template_evaluation` or `code_evaluation`). Requires `--name`, `--task-type`, `--evaluators`, and one of `--project` / `--dataset`.

```bash
ax tasks create-evaluation \
  --name <name> \
  --task-type <type> \
  --evaluators <json-array> \
  (--project <name-or-id> | --dataset <name-or-id>)
```

| Option                              | Description                                                              |
| ----------------------------------- | ------------------------------------------------------------------------ |
| `--name`                            | Task name (must be unique within the space)                              |
| `--task-type`                       | `template_evaluation` or `code_evaluation`                               |
| `--evaluators`                      | JSON array of evaluator objects (see format above)                       |
| `--project`                         | Target project name or ID; mutually exclusive with `--dataset`           |
| `--space`                           | Space name or ID (required when using a project name)                    |
| `--dataset`                         | Target dataset name or ID; mutually exclusive with `--project`           |
| `--experiment-ids`                  | Comma-separated experiment global IDs (required for dataset-based tasks) |
| `--sampling-rate`                   | Fraction of data to evaluate, 0–1 (project tasks only)                   |
| `--is-continuous / --no-continuous` | Run task continuously on incoming data                                   |
| `--query-filter`                    | Task-level query filter applied to all evaluators                        |

**Example:**

```bash
ax tasks create-evaluation \
  --name "Relevance Monitor" \
  --task-type template_evaluation \
  --project proj_abc123 \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --sampling-rate 0.1 \
  --is-continuous
```

## `ax tasks create-run-experiment`

Create a new `run_experiment` task. Requires `--name`, `--dataset`, and `--run-configuration`.

```bash
ax tasks create-run-experiment \
  --name <name> \
  --dataset <name-or-id> \
  --run-configuration <json>
```

| Option                | Description                                                    |
| --------------------- | -------------------------------------------------------------- |
| `--name`              | Task name (must be unique within the space)                    |
| `--dataset`           | Dataset name or ID to run experiments against                  |
| `--run-configuration` | JSON object (or `@file.json`) specifying the run configuration |
| `--space`             | Space name or ID                                               |

**Example:**

```bash
ax tasks create-run-experiment \
  --name "GPT-4o Summarization" \
  --dataset ds_xyz789 \
  --run-configuration @./run_config.json
```

## `ax tasks get`

Get a task by name or ID.

```bash
ax tasks get <name-or-id>
```

**Example:**

```bash
ax tasks get task_abc123
```

## `ax tasks update`

Update mutable fields on an existing task. The SDK auto-dispatches based on the task's type; providing a field invalid for the resolved task type raises an error. At least one field must be provided.

```bash
ax tasks update <name-or-id> [--space <name-or-id>] [--name <name>] [--sampling-rate <n>] [--is-continuous|--no-continuous] [--query-filter <expr>] [--evaluators <json>] [--run-configuration <json>]
```

| Option                                | Description                                                                                                                                   |
| ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| `--space`, `-s`                       | Space name or ID (required when resolving task by name)                                                                                       |
| `--name`, `-n`                        | New task display name                                                                                                                         |
| `--sampling-rate`                     | Sampling rate between 0 and 1 (evaluation tasks only)                                                                                         |
| `--is-continuous` / `--no-continuous` | Whether the task runs continuously (evaluation tasks only)                                                                                    |
| `--query-filter`                      | Task-level query filter (evaluation tasks only). Pass `--query-filter ""` to clear the existing filter.                                       |
| `--evaluators`                        | JSON array replacing the full evaluator list (evaluation tasks only; same shape as `ax tasks create --evaluators`)                            |
| `--run-configuration`                 | JSON object (or `@file.json`) replacing the run configuration (`run_experiment` tasks only). The entire stored config is atomically replaced. |

**Example:**

```bash
ax tasks update task_abc123 --name "Relevance Monitor v2" --sampling-rate 0.25
```

## `ax tasks delete`

Delete a task and its associated configuration. This operation is irreversible.

```bash
ax tasks delete <name-or-id> [--space <name-or-id>] [--force]
```

| Option          | Description                                             |
| --------------- | ------------------------------------------------------- |
| `--space`, `-s` | Space name or ID (required when resolving task by name) |
| `--force`, `-f` | Skip the confirmation prompt                            |

**Example:**

```bash
ax tasks delete task_abc123 --force
```

## `ax tasks trigger-run`

Trigger an on-demand run for a task. The run starts in `pending` status. The SDK auto-dispatches based on the task's type; providing a flag invalid for the resolved task type raises an error. Pass `--wait` to block until the run reaches a terminal state.

```bash
ax tasks trigger-run <task-id> [--data-start-time <time>] [--data-end-time <time>] [--max-spans <n>] [--override-evaluations] [--experiment-ids <ids>] [--example-ids <ids>] [--evaluation-task-ids <ids>] [--experiment-name <name>] [--dataset-version-id <id>] [--max-examples <n>] [--tracing-metadata <json>] [--wait] [--poll-interval <s>] [--timeout <s>]
```

| Option                                               | Description                                                                                                                        |
| ---------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| `--data-start-time`                                  | ISO 8601 start of the data window to evaluate (evaluation tasks only)                                                              |
| `--data-end-time`                                    | ISO 8601 end of the data window (evaluation tasks only, defaults to now)                                                           |
| `--max-spans`                                        | Maximum number of spans to process (evaluation tasks only, default: 10,000)                                                        |
| `--override-evaluations / --no-override-evaluations` | Re-evaluate data that already has labels (evaluation tasks only)                                                                   |
| `--experiment-ids`                                   | Comma-separated experiment global IDs (dataset-based evaluation tasks only)                                                        |
| `--example-ids`                                      | Comma-separated dataset example global IDs to run against (`run_experiment` tasks only). Mutually exclusive with `--max-examples`. |
| `--evaluation-task-ids`                              | Comma-separated task global IDs of evaluation tasks to trigger after the experiment run completes (`run_experiment` tasks only)    |
| `--experiment-name`                                  | Display name for the experiment to be created (required for `run_experiment` tasks)                                                |
| `--dataset-version-id`                               | Dataset version global ID (base64); defaults to the latest version (`run_experiment` tasks only)                                   |
| `--max-examples`                                     | Maximum number of examples to run (`run_experiment` tasks only)                                                                    |
| `--tracing-metadata`                                 | JSON object (or `@file.json`) of key/value pairs attached to experiment traces (`run_experiment` tasks only)                       |
| `--wait`, `-w`                                       | Block until the run reaches a terminal state                                                                                       |
| `--poll-interval`                                    | Seconds between polling attempts when using `--wait` (default: 5)                                                                  |
| `--timeout`                                          | Maximum seconds to wait when using `--wait` (default: 600)                                                                         |

**Examples:**

```bash
# Trigger a run and return immediately
ax tasks trigger-run task_abc123

# Trigger a run over a specific time window
ax tasks trigger-run task_abc123 \
  --data-start-time 2024-01-01T00:00:00Z \
  --data-end-time 2024-02-01T00:00:00Z

# Trigger a run and wait for it to finish
ax tasks trigger-run task_abc123 --wait

# Trigger and wait with a custom timeout
ax tasks trigger-run task_abc123 --wait --timeout 300 --poll-interval 10
```
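For scheduled jobs, the data window is usually relative (for example, the last 24 hours) rather than fixed. The sketch below computes such a window in the timestamp form used in the examples above. `iso_hours_ago` is a hypothetical helper. The first `date` form is assumed to be available on GNU systems and the fallback on BSD/macOS.

```bash
# Hypothetical helper: print the UTC time N hours ago as YYYY-MM-DDTHH:MM:SSZ.
iso_hours_ago() {
  local hours="$1" ts
  # Work in epoch seconds so the arithmetic is independent of date syntax.
  ts=$(( $(date -u +%s) - hours * 3600 ))
  # GNU date reads an epoch via -d @N; BSD date (macOS) uses -r N instead.
  date -u -d "@${ts}" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -r "${ts}" +%Y-%m-%dT%H:%M:%SZ
}

start="$(iso_hours_ago 24)"
end="$(iso_hours_ago 0)"
echo "window: ${start} .. ${end}"

# The values can then be passed as:
#   ax tasks trigger-run task_abc123 \
#     --data-start-time "$start" \
#     --data-end-time "$end"
```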

## `ax tasks list-runs`

List runs for a task, with optional status filtering.

```bash
ax tasks list-runs <task-id> [--status <status>] [--limit <n>] [--cursor <cursor>]
```

| Option     | Description                                                                    |
| ---------- | ------------------------------------------------------------------------------ |
| `--status` | Filter by run status: `pending`, `running`, `completed`, `failed`, `cancelled` |
| `--limit`  | Maximum number of results to return (default: 15)                              |
| `--cursor` | Pagination cursor for the next page                                            |

**Examples:**

```bash
ax tasks list-runs task_abc123
ax tasks list-runs task_abc123 --status completed
ax tasks list-runs task_abc123 --status failed --output runs.json
```

## `ax tasks get-run`

Get a task run by its global ID.

```bash
ax tasks get-run <run-id>
```

**Example:**

```bash
ax tasks get-run run_abc123
```

## `ax tasks cancel-run`

Cancel a task run. Only valid when the run is `pending` or `running`.

```bash
ax tasks cancel-run <run-id> [--force]
```

| Option    | Description                  |
| --------- | ---------------------------- |
| `--force` | Skip the confirmation prompt |

**Examples:**

```bash
ax tasks cancel-run run_abc123
ax tasks cancel-run run_abc123 --force
```

## `ax tasks wait-for-run`

Poll a task run until it reaches a terminal state (`completed`, `failed`, or `cancelled`). Exits with an error if the run does not reach a terminal state within the timeout.

```bash
ax tasks wait-for-run <run-id> [--poll-interval <s>] [--timeout <s>]
```

| Option            | Description                                           |
| ----------------- | ----------------------------------------------------- |
| `--poll-interval` | Seconds between polling attempts (default: 5)         |
| `--timeout`       | Maximum seconds to wait before failing (default: 600) |

**Examples:**

```bash
ax tasks wait-for-run run_abc123
ax tasks wait-for-run run_abc123 --timeout 300 --poll-interval 10
```
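The polling behaviour described in this section can be pictured as a loop with a deadline. The sketch below illustrates those semantics and is not the CLI's implementation. `is_terminal_status`, `poll_until_terminal`, and the status command passed in are hypothetical; only the status names and the interval and timeout concepts come from this page.

```bash
# Return success when the given status is one of the terminal states.
is_terminal_status() {
  case "$1" in
    completed|failed|cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical polling loop.
# Usage: poll_until_terminal <poll-interval-s> <timeout-s> <status-command...>
# The status command must print the run's current status on stdout.
poll_until_terminal() {
  local interval="$1" timeout="$2" waited=0 run_status
  shift 2
  while :; do
    run_status="$("$@")"
    if is_terminal_status "$run_status"; then
      # Terminal state reached: report it and stop polling.
      echo "$run_status"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "error: timed out after ${timeout}s (last status: ${run_status})" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

With the defaults listed in the table, a call would look like `poll_until_terminal 5 600 my_status_command`, where `my_status_command` stands for whatever prints the run status in your environment.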
