The ax tasks commands let you create and manage evaluation tasks and their runs on the Arize platform. Tasks automatically score spans in a project or evaluate experiment results using your LLM-as-judge evaluators.

ax tasks list

List evaluation tasks, optionally filtered by space, project, dataset, or type.
ax tasks list [--space-id <id>] [--project-id <id>] [--dataset-id <id>] [--task-type <type>] [--limit <n>] [--cursor <cursor>]
| Option | Description |
|---|---|
| --space-id | Filter tasks by space ID |
| --project-id | Filter tasks by project global ID (base64) |
| --dataset-id | Filter tasks by dataset global ID (base64) |
| --task-type | Filter by type: template_evaluation or code_evaluation |
| --limit | Maximum number of results to return (default: 15) |
| --cursor | Pagination cursor for the next page |
Examples:
ax tasks list --space-id sp_abc123
ax tasks list --space-id sp_abc123 --task-type template_evaluation
ax tasks list --project-id proj_abc123 --output tasks.json

ax tasks get

Get a task by its global ID.
ax tasks get <task-id>
Example:
ax tasks get task_abc123

ax tasks create

Create a new evaluation task. Exactly one of --project-id or --dataset-id must be provided. Any required option that is not passed as a flag is prompted for interactively.
ax tasks create \
  --name <name> \
  --task-type <type> \
  --evaluators <json-array> \
  (--project-id <id> | --dataset-id <id>)
| Option | Description |
|---|---|
| --name | Task name (must be unique within the space) |
| --task-type | template_evaluation or code_evaluation |
| --evaluators | JSON array of evaluator objects (see format below) |
| --project-id | Target project global ID; mutually exclusive with --dataset-id |
| --dataset-id | Target dataset global ID; mutually exclusive with --project-id |
| --experiment-ids | Comma-separated experiment global IDs (required for dataset-based tasks) |
| --sampling-rate | Fraction of spans to evaluate, from 0 to 1 (project-based tasks only) |
| --is-continuous / --no-continuous | Run the task continuously on incoming data |
| --query-filter | Task-level SQL-style filter applied to all evaluators |
Evaluators JSON format:
[
  {
    "evaluator_id": "ev_abc123",
    "query_filter": null,
    "column_mappings": null
  }
]
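Writing the JSON array by hand gets error-prone once a task uses several evaluators. Below is a minimal POSIX sh sketch that builds the array from a list of evaluator IDs. The function name build_evaluators_json is hypothetical and is not part of the ax CLI. The function sets only evaluator_id and omits query_filter and column_mappings, which is what the examples below do. It also assumes the IDs contain no double quotes or backslashes, because it performs no JSON escaping.

```shell
# Hypothetical helper (not part of the ax CLI): print a JSON array of
# evaluator objects suitable for the --evaluators flag.
# Assumes IDs contain no double quotes or backslashes; no JSON escaping is done.
build_evaluators_json() {
  _json="["
  _sep=""
  for _id in "$@"; do
    # Append one {"evaluator_id": "..."} object, comma-separated after the first.
    _json="${_json}${_sep}{\"evaluator_id\": \"${_id}\"}"
    _sep=", "
  done
  printf '%s]\n' "$_json"
}
```

You could then pass the result with --evaluators "$(build_evaluators_json ev_abc123 ev_def456)" on an ax tasks create command line.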
Examples:
Project-based task (continuous):
ax tasks create \
  --name "Relevance Monitor" \
  --task-type template_evaluation \
  --project-id proj_abc123 \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --is-continuous \
  --sampling-rate 0.1
Dataset-based task:
ax tasks create \
  --name "Experiment Evaluation" \
  --task-type template_evaluation \
  --dataset-id ds_xyz789 \
  --experiment-ids "exp_abc123,exp_def456" \
  --evaluators '[{"evaluator_id": "ev_abc123"}]' \
  --no-continuous
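In a non-interactive script such as a CI job, a missing or conflicting target option may lead to an error or an interactive prompt. One option is to check the documented target rules before calling ax tasks create. The sketch below does that. The function check_task_target is hypothetical, and it covers only three documented rules: project and dataset are mutually exclusive, one of them is required, and dataset-based tasks need experiment IDs.

```shell
# Hypothetical pre-flight check (not part of the ax CLI) mirroring the
# target rules documented for ax tasks create.
# Usage: check_task_target <project-id> <dataset-id> <experiment-ids>
# Pass "" for any value you are not setting. Prints an error to stderr and
# returns 1 if the combination breaks one of the documented rules.
check_task_target() {
  _project_id=$1 _dataset_id=$2 _experiment_ids=$3
  if [ -n "$_project_id" ] && [ -n "$_dataset_id" ]; then
    echo "error: --project-id and --dataset-id are mutually exclusive" >&2
    return 1
  fi
  if [ -z "$_project_id" ] && [ -z "$_dataset_id" ]; then
    echo "error: pass exactly one of --project-id or --dataset-id" >&2
    return 1
  fi
  if [ -n "$_dataset_id" ] && [ -z "$_experiment_ids" ]; then
    echo "error: dataset-based tasks require --experiment-ids" >&2
    return 1
  fi
  return 0
}
```

A script could run check_task_target "$PROJECT_ID" "$DATASET_ID" "$EXPERIMENT_IDS" and call ax tasks create only if it succeeds. The three variable names are placeholders.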

ax tasks trigger-run

Trigger an on-demand run for a task. The run starts in pending status. Pass --wait to block until the run reaches a terminal state (completed, failed, or cancelled).
ax tasks trigger-run <task-id> [--data-start-time <time>] [--data-end-time <time>] [--max-spans <n>] [--override-evaluations] [--experiment-ids <ids>] [--wait] [--poll-interval <s>] [--timeout <s>]
| Option | Description |
|---|---|
| --data-start-time | ISO 8601 start of the data window to evaluate |
| --data-end-time | ISO 8601 end of the data window (defaults to now) |
| --max-spans | Maximum number of spans to process (default: 10,000) |
| --override-evaluations / --no-override-evaluations | Re-evaluate data that already has labels |
| --experiment-ids | Comma-separated experiment global IDs (dataset-based tasks only) |
| --wait / -w | Block until the run reaches a terminal state |
| --poll-interval | Seconds between polling attempts when using --wait (default: 5) |
| --timeout | Maximum seconds to wait when using --wait (default: 600) |
Examples:
# Trigger a run and return immediately
ax tasks trigger-run task_abc123

# Trigger a run over a specific time window
ax tasks trigger-run task_abc123 \
  --data-start-time 2024-01-01T00:00:00Z \
  --data-end-time 2024-02-01T00:00:00Z

# Trigger a run and wait for it to finish
ax tasks trigger-run task_abc123 --wait

# Trigger and wait with a custom timeout
ax tasks trigger-run task_abc123 --wait --timeout 300 --poll-interval 10
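Scheduled jobs usually want a rolling window, such as the last 7 days, rather than fixed dates. The sketch below computes ISO 8601 UTC timestamps for --data-start-time and --data-end-time. It assumes GNU coreutils date, which provides the -d option. On macOS or BSD, date -u -v-7d is the rough equivalent. The function names window_start and window_end are hypothetical.

```shell
# Sketch: ISO 8601 (UTC) bounds for a rolling data window.
# Assumes GNU coreutils `date`; BSD/macOS `date` uses -v instead of -d.
window_start() {
  # $1 = number of days back from now
  date -u -d "$1 days ago" +%Y-%m-%dT%H:%M:%SZ
}
window_end() {
  # Current time; optional, since --data-end-time defaults to now.
  date -u +%Y-%m-%dT%H:%M:%SZ
}
```

For example, ax tasks trigger-run task_abc123 --data-start-time "$(window_start 7)" --data-end-time "$(window_end)" evaluates the last week of data, up to the --max-spans limit.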

ax tasks list-runs

List runs for a task, with optional status filtering.
ax tasks list-runs <task-id> [--status <status>] [--limit <n>] [--cursor <cursor>]
| Option | Description |
|---|---|
| --status | Filter by run status: pending, running, completed, failed, cancelled |
| --limit | Maximum number of results to return (default: 15) |
| --cursor | Pagination cursor for the next page |
Examples:
ax tasks list-runs task_abc123
ax tasks list-runs task_abc123 --status completed
ax tasks list-runs task_abc123 --status failed --output runs.json

ax tasks get-run

Get a task run by its global ID.
ax tasks get-run <run-id>
Example:
ax tasks get-run run_abc123

ax tasks cancel-run

Cancel a task run. This is valid only while the run is pending or running. You are asked to confirm unless you pass --force.
ax tasks cancel-run <run-id> [--force]
| Option | Description |
|---|---|
| --force | Skip the confirmation prompt |
Examples:
ax tasks cancel-run run_abc123
ax tasks cancel-run run_abc123 --force
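Scripts that cancel runs with --force can skip runs that are already finished by checking the status first. The sketch below encodes the documented rule that cancellation is valid only for pending or running runs. The function is_cancellable is hypothetical. This section does not specify the output format of ax tasks get-run, so extracting the status from it is left to you.

```shell
# Hypothetical helper (not part of the ax CLI): succeed only for the run
# statuses in which ax tasks cancel-run is documented to be valid.
is_cancellable() {
  case "$1" in
    pending|running) return 0 ;;
    *) return 1 ;;  # completed, failed, cancelled (terminal) or unknown
  esac
}
```

A script might then run if is_cancellable "$status"; then ax tasks cancel-run "$run_id" --force; fi, with $status and $run_id supplied by the caller.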

ax tasks wait-for-run

Poll a task run until it reaches a terminal state (completed, failed, or cancelled). Exits with an error if the run does not reach a terminal state within the timeout.
ax tasks wait-for-run <run-id> [--poll-interval <s>] [--timeout <s>]
| Option | Description |
|---|---|
| --poll-interval | Seconds between polling attempts (default: 5) |
| --timeout | Maximum seconds to wait before failing (default: 600) |
Examples:
ax tasks wait-for-run run_abc123
ax tasks wait-for-run run_abc123 --timeout 300 --poll-interval 10
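You might need the same wait semantics inside your own script, for example to log progress between polls. The sketch below approximates the documented behavior, but it is not the CLI's implementation. It polls at a fixed interval until a terminal status appears or a wall-clock deadline passes. The function poll_until_terminal is hypothetical, and so is the status command you pass to it. That command must print a single status word.

```shell
# Sketch (not the ax CLI's implementation) of the documented wait behavior:
# run a status command every <interval> seconds until it prints a terminal
# status (completed, failed, cancelled) or <timeout> seconds have passed.
# Usage: poll_until_terminal <status-command> [interval=5] [timeout=600]
poll_until_terminal() {
  _cmd=$1 _interval=${2:-5} _timeout=${3:-600}
  _deadline=$(( $(date +%s) + _timeout ))
  while :; do
    _status=$("$_cmd")
    case "$_status" in
      completed|failed|cancelled)
        echo "$_status"   # terminal state reached
        return 0
        ;;
    esac
    if [ "$(date +%s)" -ge "$_deadline" ]; then
      echo "timed out after ${_timeout}s (last status: ${_status})" >&2
      return 1
    fi
    sleep "$_interval"
  done
}
```

The sketch returns success for any terminal state, including failed and cancelled, so the caller should inspect the printed status. Using a wall-clock deadline instead of counting sleeps keeps the timeout accurate even when each status check itself takes time.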