The ax tasks commands let you create and manage evaluation tasks and their runs on the Arize platform. A task automatically scores spans in a project, or evaluates experiment results, using your LLM-as-judge evaluators.
ax tasks list
List evaluation tasks, optionally filtered by space, project, dataset, or type.
| Option | Description |
|---|---|
| --space-id | Filter tasks by space ID |
| --project-id | Filter tasks by project global ID (base64) |
| --dataset-id | Filter tasks by dataset global ID (base64) |
| --task-type | Filter by type: template_evaluation or code_evaluation |
| --limit | Maximum number of results to return (default: 15) |
| --cursor | Pagination cursor for the next page |
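
The filters above can be combined in a single call. The following is a minimal sketch, assuming the `ax` CLI is installed and authenticated. The function name `list_project_tasks` and the limit of 25 are illustrative choices, not part of the CLI. How the next-page cursor appears in the command output is not covered here, so the cursor is simply accepted as an optional argument.

```shell
# Hypothetical helper: list template-evaluation tasks for one project.
list_project_tasks() {
  project_id="$1"   # project global ID (base64)
  cursor="${2:-}"   # optional cursor taken from a previous page

  # Start with the filters that always apply.
  set -- tasks list --project-id "$project_id" \
    --task-type template_evaluation --limit 25

  # Append --cursor only when a cursor was supplied.
  if [ -n "$cursor" ]; then
    set -- "$@" --cursor "$cursor"
  fi

  ax "$@"
}
```

Call it as `list_project_tasks "$PROJECT_ID"` for the first page, then pass the cursor as the second argument to fetch the next one.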
ax tasks get
Get a task by its global ID.
ax tasks create
Create a new evaluation task. Provide exactly one of --project-id or --dataset-id; the two are mutually exclusive. Any required option that is not passed as a flag is prompted for interactively.
| Option | Description |
|---|---|
| --name | Task name (must be unique within the space) |
| --task-type | template_evaluation or code_evaluation |
| --evaluators | JSON array of evaluator objects (see format below) |
| --project-id | Target project global ID; mutually exclusive with --dataset-id |
| --dataset-id | Target dataset global ID; mutually exclusive with --project-id |
| --experiment-ids | Comma-separated experiment global IDs (required for dataset-based tasks) |
| --sampling-rate | Fraction of spans to evaluate, 0–1 (project-based tasks only) |
| --is-continuous / --no-continuous | Run task continuously on incoming data |
| --query-filter | Task-level SQL-style filter applied to all evaluators |
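
To illustrate the project/dataset split, here is a rough sketch of both variants, assuming the `ax` CLI is installed and authenticated. The function names and the 0.1 sampling rate are illustrative. The evaluators JSON is passed through unchanged as an argument, so this sketch makes no assumption about its internal format.

```shell
# Hypothetical helper: project-based task that continuously scores
# a 10% sample of incoming spans.
create_project_task() {
  name="$1"             # must be unique within the space
  project_id="$2"       # project global ID
  evaluators_json="$3"  # JSON array of evaluator objects, passed as-is

  ax tasks create \
    --name "$name" \
    --task-type template_evaluation \
    --project-id "$project_id" \
    --evaluators "$evaluators_json" \
    --sampling-rate 0.1 \
    --is-continuous
}

# Hypothetical helper: dataset-based task over specific experiments.
# --sampling-rate is omitted because it applies to project-based tasks only.
create_dataset_task() {
  name="$1"
  dataset_id="$2"       # dataset global ID
  experiment_ids="$3"   # comma-separated experiment global IDs
  evaluators_json="$4"

  ax tasks create \
    --name "$name" \
    --task-type template_evaluation \
    --dataset-id "$dataset_id" \
    --experiment-ids "$experiment_ids" \
    --evaluators "$evaluators_json"
}
```

For example, `create_project_task "nightly-eval" "$PROJECT_ID" "$(cat evaluators.json)"`, where `evaluators.json` is a file you maintain yourself.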
ax tasks trigger-run
Trigger an on-demand run for a task. The run starts in pending status. Pass --wait to block until the run reaches a terminal state.
| Option | Description |
|---|---|
| --data-start-time | ISO 8601 start of the data window to evaluate |
| --data-end-time | ISO 8601 end of the data window (defaults to now) |
| --max-spans | Maximum number of spans to process (default: 10,000) |
| --override-evaluations / --no-override-evaluations | Re-evaluate data that already has labels |
| --experiment-ids | Comma-separated experiment global IDs (dataset-based tasks only) |
| --wait / -w | Block until the run reaches a terminal state |
| --poll-interval | Seconds between polling attempts when using --wait (default: 5) |
| --timeout | Maximum seconds to wait when using --wait (default: 600) |
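
As an illustration, a backfill over a fixed time window that blocks until the run finishes might look like the sketch below. It assumes the `ax` CLI is installed and authenticated, and it assumes the task ID is passed as a positional argument, which is not confirmed here. The function name and the specific limits are illustrative.

```shell
# Hypothetical helper: evaluate a fixed data window and wait for the result.
trigger_backfill() {
  task_id="$1"     # task global ID (positional placement is an assumption)
  start_time="$2"  # ISO 8601, e.g. 2024-01-01T00:00:00Z
  end_time="$3"    # ISO 8601

  ax tasks trigger-run "$task_id" \
    --data-start-time "$start_time" \
    --data-end-time "$end_time" \
    --max-spans 5000 \
    --no-override-evaluations \
    --wait --poll-interval 10 --timeout 1800
}
```

Passing --no-override-evaluations leaves data that already has labels untouched, which keeps a repeated backfill from redoing earlier work.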
ax tasks list-runs
List runs for a task, with optional status filtering.
| Option | Description |
|---|---|
| --status | Filter by run status: pending, running, completed, failed, cancelled |
| --limit | Maximum number of results to return (default: 15) |
| --cursor | Pagination cursor for the next page |
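
A small wrapper can reject a mistyped status before calling the CLI. This is a sketch only, assuming the `ax` CLI is installed and authenticated and that the task ID is given as a positional argument (an assumption). The function name `list_runs_by_status` is hypothetical.

```shell
# Hypothetical helper: list a task's runs in one status, validating the
# status against the values documented for --status.
list_runs_by_status() {
  task_id="$1"     # task global ID (positional placement is an assumption)
  run_status="$2"

  case "$run_status" in
    pending|running|completed|failed|cancelled) ;;
    *)
      echo "unknown run status: $run_status" >&2
      return 2
      ;;
  esac

  ax tasks list-runs "$task_id" --status "$run_status" --limit 15
}
```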
ax tasks get-run
Get a task run by its global ID.
ax tasks cancel-run
Cancel a task run. Only valid when the run is pending or running.
| Option | Description |
|---|---|
| --force | Skip the confirmation prompt |
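
In scripts and CI jobs there is nobody to answer the confirmation prompt, so --force is the relevant flag. A minimal sketch follows, assuming the `ax` CLI is installed and authenticated and that the run ID is passed positionally (an assumption). The function name is hypothetical.

```shell
# Hypothetical helper: cancel a run without prompting.
cancel_run_noninteractive() {
  run_id="${1:-}"  # run global ID (positional placement is an assumption)

  # Refuse to call the CLI with an empty ID.
  if [ -z "$run_id" ]; then
    echo "usage: cancel_run_noninteractive RUN_ID" >&2
    return 2
  fi

  ax tasks cancel-run "$run_id" --force
}
```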
ax tasks wait-for-run
Poll a task run until it reaches a terminal state (completed, failed, or cancelled). Exits with an error if the run does not complete within the timeout.
| Option | Description |
|---|---|
| --poll-interval | Seconds between polling attempts (default: 5) |
| --timeout | Maximum seconds to wait before failing (default: 600) |
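
Because the command exits with an error on timeout, a script can branch on its exit status. The sketch below assumes the `ax` CLI is installed and authenticated, that the run ID is passed positionally, and that "exits with an error" means a non-zero exit status. The function name and the chosen interval and timeout are illustrative.

```shell
# Hypothetical helper: wait for a run and propagate any failure to the caller.
wait_for_run() {
  run_id="$1"  # run global ID (positional placement is an assumption)

  if ax tasks wait-for-run "$run_id" --poll-interval 10 --timeout 1200; then
    echo "run $run_id reached a terminal state"
  else
    rc=$?  # exit status of the ax command above
    echo "waiting on run $run_id failed (exit $rc)" >&2
    return "$rc"
  fi
}
```

A terminal state can still be failed or cancelled, so follow up with `ax tasks get-run` if the script needs the final status.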