Documentation Index
Fetch the complete documentation index at: https://arize-ax.mintlify.dev/docs/llms.txt
Use this file to discover all available pages before exploring further.
The ax tasks commands are currently in ALPHA. The API may change without notice. A one-time warning is emitted on first use.
The ax tasks commands let you create and manage evaluation tasks and their runs on the Arize platform. Tasks automatically score spans in a project or evaluate experiment results using your LLM-as-judge evaluators.
## ax tasks list

List evaluation tasks, optionally filtered by space, project, dataset, or type.

```bash
ax tasks list [--space <id>] [--project <id>] [--dataset <id>] [--name <filter>] [--task-type <type>] [--limit <n>] [--cursor <cursor>]
```

| Option | Description |
| --- | --- |
| --space | Filter tasks by space name or ID |
| --project | Filter tasks by project name or ID |
| --dataset | Filter tasks by dataset name or ID |
| --name | Case-insensitive substring filter on task name |
| --task-type | Filter by type: template_evaluation, code_evaluation, or run_experiment |
| --limit | Maximum number of results to return (default: 15) |
| --cursor | Pagination cursor for the next page |

Examples:

```bash
ax tasks list --space sp_abc123
ax tasks list --space sp_abc123 --task-type template_evaluation
ax tasks list --project proj_abc123 --output tasks.json
```
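When a space holds more tasks than --limit, each response carries a cursor for the next page. The sketch below shows the general cursor-draining pattern; `fetch_page` is a hypothetical stand-in for one `ax tasks list --cursor <cursor>` invocation, not part of the CLI.

```python
def list_all(fetch_page):
    """Drain a cursor-paginated listing.

    fetch_page(cursor) stands in for a single `ax tasks list` call and
    must return (items, next_cursor), where next_cursor is None on the
    last page. It is a hypothetical helper, not part of the CLI.
    """
    cursor = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break

# Fake pages standing in for CLI output, purely for illustration.
pages = {None: (["task_a", "task_b"], "c1"), "c1": (["task_c"], None)}
tasks = list(list_all(lambda c: pages[c]))
```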
## ax tasks create

Create a new task. Dispatches internally based on --task-type.

For evaluation tasks (template_evaluation or code_evaluation), either --project or --dataset must be provided, but not both. Run-experiment tasks (run_experiment) require --dataset and --run-configuration.

```bash
ax tasks create \
  --name <name> \
  --task-type <type> \
  [--evaluators <json-array>] \
  [--run-configuration <json>] \
  (--project <name-or-id> | --dataset <name-or-id>)
```
| Option | Description |
| --- | --- |
| --name | Task name (must be unique within the space) |
| --task-type | template_evaluation, code_evaluation, or run_experiment |
| --evaluators | JSON array of evaluator objects (required for evaluation tasks; see format below) |
| --run-configuration | JSON object (or @file.json) specifying the run configuration (required for run_experiment tasks) |
| --project | Target project name or ID; mutually exclusive with --dataset (evaluation tasks only) |
| --space | Space name or ID (required when resolving --project or --dataset by name) |
| --dataset | Target dataset name or ID; mutually exclusive with --project for evaluation tasks; required for run_experiment tasks |
| --experiment-ids | Comma-separated experiment global IDs (evaluation tasks only) |
| --sampling-rate | Fraction of spans to evaluate, 0–1 (project evaluation tasks only) |
| --is-continuous / --no-continuous | Run task continuously on incoming data (evaluation tasks only) |
| --query-filter | Task-level SQL-style filter applied to all evaluators (evaluation tasks only) |
Evaluators JSON format:

```json
[
  {
    "evaluator_id": "ev_abc123",
    "query_filter": null,
    "column_mappings": null
  }
]
```
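Building the --evaluators payload programmatically avoids shell-quoting mistakes with inline JSON. A minimal sketch (the evaluator ID is a placeholder):

```python
import json

# Only evaluator_id is required; query_filter and column_mappings
# may be null or omitted entirely.
evaluators = [
    {"evaluator_id": "ev_abc123", "query_filter": None, "column_mappings": None}
]
payload = json.dumps(evaluators)
# Pass as: ax tasks create ... --evaluators "$payload"
```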
Run configuration JSON format (run_experiment tasks):

```json
{
  "experiment_type": "llm_generation",
  "ai_integration_id": "...",
  "model_name": "gpt-4o",
  "messages": [{"role": "user", "content": "{{input}}"}]
}
```
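For anything beyond a trivial configuration, writing the JSON to a file and passing it as --run-configuration @run_config.json is less error-prone than inlining it. A sketch (the integration ID is a placeholder):

```python
import json

# Write the run configuration to disk so it can be passed as
# --run-configuration @run_config.json instead of an inline string.
run_config = {
    "experiment_type": "llm_generation",
    "ai_integration_id": "ai_abc",  # placeholder integration ID
    "model_name": "gpt-4o",
    "messages": [{"role": "user", "content": "{{input}}"}],
}
with open("run_config.json", "w") as f:
    json.dump(run_config, f, indent=2)
```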
Examples:

Project-based evaluation task (continuous):

```bash
ax tasks create \
  --name "Relevance Monitor" \
  --task-type template_evaluation \
  --project proj_abc123 \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --is-continuous \
  --sampling-rate 0.1
```

Dataset-based evaluation task:

```bash
ax tasks create \
  --name "Experiment Evaluation" \
  --task-type template_evaluation \
  --dataset ds_xyz789 \
  --experiment-ids "exp_abc123,exp_def456" \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --no-continuous
```

Run-experiment task:

```bash
ax tasks create \
  --name "GPT-4o Summarization" \
  --task-type run_experiment \
  --dataset ds_xyz789 \
  --run-configuration '{"experiment_type": "llm_generation", "ai_integration_id": "ai_abc", "model_name": "gpt-4o", "messages": [{"role": "user", "content": "{{input}}"}]}'
```
## ax tasks create-evaluation

Create a new evaluation task (template_evaluation or code_evaluation). Requires --name, --task-type, --evaluators, and one of --project / --dataset.

```bash
ax tasks create-evaluation \
  --name <name> \
  --task-type <type> \
  --evaluators <json-array> \
  (--project <name-or-id> | --dataset <name-or-id>)
```
| Option | Description |
| --- | --- |
| --name | Task name (must be unique within the space) |
| --task-type | template_evaluation or code_evaluation |
| --evaluators | JSON array of evaluator objects (see format above) |
| --project | Target project name or ID; mutually exclusive with --dataset |
| --space | Space name or ID (required when using a project name) |
| --dataset | Target dataset name or ID; mutually exclusive with --project |
| --experiment-ids | Comma-separated experiment global IDs (required for dataset-based tasks) |
| --sampling-rate | Fraction of data to evaluate, 0–1 (project tasks only) |
| --is-continuous / --no-continuous | Run task continuously on incoming data |
| --query-filter | Task-level query filter applied to all evaluators |
Example:

```bash
ax tasks create-evaluation \
  --name "Relevance Monitor" \
  --task-type template_evaluation \
  --project proj_abc123 \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --sampling-rate 0.1 \
  --is-continuous
```
## ax tasks create-run-experiment

Create a new run_experiment task. Requires --name, --dataset, and --run-configuration.

```bash
ax tasks create-run-experiment \
  --name <name> \
  --dataset <name-or-id> \
  --run-configuration <json>
```
| Option | Description |
| --- | --- |
| --name | Task name (must be unique within the space) |
| --dataset | Dataset name or ID to run experiments against |
| --run-configuration | JSON object (or @file.json) specifying the run configuration |
| --space | Space name or ID |
Example:

```bash
ax tasks create-run-experiment \
  --name "GPT-4o Summarization" \
  --dataset ds_xyz789 \
  --run-configuration @./run_config.json
```
## ax tasks get

Get a task by name or ID.

```bash
ax tasks get <name-or-id>
```

Example:

```bash
ax tasks get task_abc123
```
## ax tasks update

Update mutable fields on an existing task. The SDK auto-dispatches based on the task's type; providing a field that is invalid for the resolved task type raises an error. At least one field must be provided.

```bash
ax tasks update <name-or-id> [--space <id>] [--name <name>] [--sampling-rate <n>] [--is-continuous|--no-continuous] [--query-filter <expr>] [--evaluators <json>] [--run-configuration <json>]
```
| Option | Description |
| --- | --- |
| --space, -s | Space name or ID (required when resolving the task by name) |
| --name, -n | New task display name |
| --sampling-rate | Sampling rate between 0 and 1 (evaluation tasks only) |
| --is-continuous / --no-continuous | Whether the task runs continuously (evaluation tasks only) |
| --query-filter | Task-level query filter (evaluation tasks only). Pass --query-filter "" to clear the existing filter. |
| --evaluators | JSON array replacing the full evaluator list (evaluation tasks only; same shape as ax tasks create --evaluators) |
| --run-configuration | JSON object (or @file.json) replacing the run configuration (run_experiment tasks only). The entire stored config is atomically replaced. |
Example:

```bash
ax tasks update task_abc123 --name "Relevance Monitor v2" --sampling-rate 0.25
```
## ax tasks delete

Delete a task and its associated configuration. This operation is irreversible.

```bash
ax tasks delete <name-or-id> [--space <id>] [--force]
```
| Option | Description |
| --- | --- |
| --space, -s | Space name or ID (required when resolving the task by name) |
| --force, -f | Skip the confirmation prompt |
Example:

```bash
ax tasks delete task_abc123 --force
```
## ax tasks trigger-run

Trigger an on-demand run for a task. The run starts in pending status. The SDK auto-dispatches based on the task's type; providing a flag that is invalid for the resolved task type raises an error. Pass --wait to block until the run reaches a terminal state.

```bash
ax tasks trigger-run <task-id> [--data-start-time <time>] [--data-end-time <time>] [--max-spans <n>] [--override-evaluations] [--experiment-ids <ids>] [--experiment-name <name>] [--dataset-version-id <id>] [--max-examples <n>] [--tracing-metadata <json>] [--wait] [--poll-interval <s>] [--timeout <s>]
```
| Option | Description |
| --- | --- |
| --data-start-time | ISO 8601 start of the data window to evaluate (evaluation tasks only) |
| --data-end-time | ISO 8601 end of the data window (evaluation tasks only; defaults to now) |
| --max-spans | Maximum number of spans to process (evaluation tasks only; default: 10,000) |
| --override-evaluations / --no-override-evaluations | Re-evaluate data that already has labels (evaluation tasks only) |
| --experiment-ids | Comma-separated experiment global IDs (dataset-based evaluation tasks only) |
| --experiment-name | Display name for the experiment to be created (required for run_experiment tasks) |
| --dataset-version-id | Dataset version global ID (base64); defaults to the latest version (run_experiment tasks only) |
| --max-examples | Maximum number of examples to run (run_experiment tasks only) |
| --tracing-metadata | JSON object (or @file.json) of key/value pairs attached to experiment traces (run_experiment tasks only) |
| --wait, -w | Block until the run reaches a terminal state |
| --poll-interval | Seconds between polling attempts when using --wait (default: 5) |
| --timeout | Maximum seconds to wait when using --wait (default: 600) |
Examples:

```bash
# Trigger a run and return immediately
ax tasks trigger-run task_abc123

# Trigger a run over a specific time window
ax tasks trigger-run task_abc123 \
  --data-start-time 2024-01-01T00:00:00Z \
  --data-end-time 2024-02-01T00:00:00Z

# Trigger a run and wait for it to finish
ax tasks trigger-run task_abc123 --wait

# Trigger and wait with a custom timeout
ax tasks trigger-run task_abc123 --wait --timeout 300 --poll-interval 10
```
## ax tasks list-runs

List runs for a task, with optional status filtering.

```bash
ax tasks list-runs <task-id> [--status <status>] [--limit <n>] [--cursor <cursor>]
```
| Option | Description |
| --- | --- |
| --status | Filter by run status: pending, running, completed, failed, cancelled |
| --limit | Maximum number of results to return (default: 15) |
| --cursor | Pagination cursor for the next page |
Examples:

```bash
ax tasks list-runs task_abc123
ax tasks list-runs task_abc123 --status completed
ax tasks list-runs task_abc123 --status failed --output runs.json
```
## ax tasks get-run

Get a task run by its global ID.

```bash
ax tasks get-run <run-id>
```

Example:

```bash
ax tasks get-run run_abc123
```
## ax tasks cancel-run

Cancel a task run. Only valid while the run is pending or running.

```bash
ax tasks cancel-run <run-id> [--force]
```
| Option | Description |
| --- | --- |
| --force | Skip the confirmation prompt |
Examples:

```bash
ax tasks cancel-run run_abc123
ax tasks cancel-run run_abc123 --force
```
## ax tasks wait-for-run

Poll a task run until it reaches a terminal state (completed, failed, or cancelled). Exits with an error if the run does not complete within the timeout.

```bash
ax tasks wait-for-run <run-id> [--poll-interval <s>] [--timeout <s>]
```
| Option | Description |
| --- | --- |
| --poll-interval | Seconds between polling attempts (default: 5) |
| --timeout | Maximum seconds to wait before failing (default: 600) |
Examples:

```bash
ax tasks wait-for-run run_abc123
ax tasks wait-for-run run_abc123 --timeout 300 --poll-interval 10
```
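The polling behavior behind wait-for-run (and trigger-run --wait) can be sketched as a simple loop over run status. `get_status` below is a hypothetical stand-in for one `ax tasks get-run <run-id>` call, not part of the CLI:

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}

def wait_for_run(get_status, poll_interval=5, timeout=600):
    """Minimal sketch of the --wait polling loop.

    get_status() stands in for a single `ax tasks get-run` call and
    returns the run's current status string; it is hypothetical.
    Raises TimeoutError if no terminal state is reached in time.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("run did not reach a terminal state in time")

# Fake status sequence standing in for successive CLI responses.
statuses = iter(["pending", "running", "completed"])
result = wait_for_run(lambda: next(statuses), poll_interval=0, timeout=5)
```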