# Tasks

> Create and manage evaluation tasks that automatically score your LLM application on a schedule or on demand.

The `tasks` client methods are currently in **ALPHA**. The API may change without notice. A one-time warning is emitted on first use.

Create evaluation tasks that score spans in a project, either continuously or on demand, or that evaluate examples in a dataset using your LLM-as-judge evaluators.

## Key Capabilities

* Create project-based tasks that run continuously against live spans
* Create dataset-based tasks that evaluate experiment results
* Create `run_experiment` tasks that drive LLM calls on the server
* Trigger on-demand task runs with custom data windows
* Poll task runs until completion with configurable timeout
* Cancel in-progress runs
* List and filter task runs by status

## List Tasks

List tasks you have access to, with optional filtering by space, project, dataset, or type.

```python
resp = client.tasks.list(
    space="your-space-name-or-id",  # optional
    limit=50,
)

for task in resp.tasks:
    print(task.id, task.name)
```

Filter by task type:

```python
resp = client.tasks.list(
    space="your-space-name-or-id",
    task_type="template_evaluation",
)
```

Valid values for `task_type` are `"template_evaluation"`, `"code_evaluation"`, and `"run_experiment"`.

For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see [Response Objects](/api-clients/python/version-8/overview#response-objects).

## Create an Evaluation Task

Create a new evaluation task. Evaluation tasks can target either a **project** (live spans) or a **dataset** (experiment results).

### Project-Based Task

A project-based task continuously evaluates incoming spans.
Set `is_continuous=True` to run the task on every new span, or `False` to run it only on demand.

```python
from arize.tasks.types import BaseEvaluationTaskRequestEvaluatorsInner

task = client.tasks.create_evaluation_task(
    name="Relevance Monitor",
    task_type="template_evaluation",
    project="your-project-name-or-id",
    evaluators=[
        BaseEvaluationTaskRequestEvaluatorsInner(
            evaluator_id="your-evaluator-id",
        ),
    ],
    is_continuous=True,
    sampling_rate=0.1,  # Evaluate 10% of spans
)

print(task.id)
```

### Dataset-Based Task

A dataset-based task evaluates examples from one or more experiments. At least one `experiment_ids` entry is required.

```python
from arize.tasks.types import BaseEvaluationTaskRequestEvaluatorsInner

task = client.tasks.create_evaluation_task(
    name="Experiment Evaluation",
    task_type="template_evaluation",
    dataset="your-dataset-name-or-id",
    experiment_ids=["experiment-id-1", "experiment-id-2"],
    evaluators=[
        BaseEvaluationTaskRequestEvaluatorsInner(
            evaluator_id="your-evaluator-id",
        ),
    ],
    is_continuous=False,
)

print(task.id)
```

### Column Mappings and Filters

Each evaluator in the task can have its own column mappings (which map template variables to span attribute names) and its own per-evaluator query filter.

```python
from arize.tasks.types import BaseEvaluationTaskRequestEvaluatorsInner

task = client.tasks.create_evaluation_task(
    name="Custom Relevance",
    task_type="template_evaluation",
    project="your-project-name-or-id",
    evaluators=[
        BaseEvaluationTaskRequestEvaluatorsInner(
            evaluator_id="your-evaluator-id",
            column_mappings={"user_query": "input.value"},
            query_filter="status_code = 'OK'",
        ),
    ],
    query_filter="latency_ms < 5000",  # Task-level filter (AND-ed with evaluator filter)
    is_continuous=True,
)
```

**Parameter reference:**

| Parameter        | Type        | Description                                                                              |
| ---------------- | ----------- | ---------------------------------------------------------------------------------------- |
| `name`           | `str`       | Task name. Must be unique within the space.                                              |
| `task_type`      | `str`       | `"template_evaluation"` or `"code_evaluation"`.                                          |
| `evaluators`     | `list`      | List of evaluators to attach. At least one required.                                     |
| `project`        | `str`       | Target project name or ID. Required when `dataset` is not provided.                      |
| `dataset`        | `str`       | Target dataset name or ID. Required when `project` is not provided.                      |
| `space`          | `str`       | Space name or ID used to disambiguate name-based resolution for `project` and `dataset`. |
| `experiment_ids` | `list[str]` | Required (at least one) when `dataset` is provided.                                      |
| `sampling_rate`  | `float`     | Fraction of spans to evaluate (0–1). Project-based tasks only.                           |
| `is_continuous`  | `bool`      | `True` to run on every new span; `False` for on-demand only.                             |
| `query_filter`   | `str`       | Task-level SQL-style filter applied to all evaluators.                                   |

## Create a Run-Experiment Task

A `run_experiment` task drives all LLM calls on the server using the AI integration specified in `run_configuration`, so no local callable is required.

```python
from arize.tasks.types import LlmGenerationRunConfig

task = client.tasks.create_run_experiment_task(
    name="Nightly QA Run",
    dataset="your-dataset-name-or-id",
    space="your-space-name-or-id",  # required when dataset is a name
    run_configuration=LlmGenerationRunConfig(
        # provider/model/prompt configuration for the server-driven run
        # ...
    ),
)

print(task.id)
```

The method also accepts a `TemplateEvaluationRunConfig` instance or a plain `dict` matching one of those schemas; the SDK wraps it for you.

## Get a Task

Retrieve a task by name or ID. When using a name, provide `space` to disambiguate.

```python
task = client.tasks.get(
    task="your-task-name-or-id",
    space="your-space-name-or-id",  # required when using a name
)

print(task.id, task.name)
```

## Update a Task

Update mutable fields on an existing task. At least one update field must be provided. Pass `query_filter=None` to clear the existing filter; omit any other argument to leave it unchanged.
```python
task = client.tasks.update(
    task="your-task-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    name="Relevance Monitor v2",
    sampling_rate=0.25,  # project-based tasks only
)

print(task.id, task.name)
```

## Delete a Task

Delete a task and its associated configuration. This operation is irreversible.

```python
client.tasks.delete(
    task="your-task-name-or-id",
    space="your-space-name-or-id",  # required when using a name
)

print("Task deleted successfully")
```

## Task Runs

### Trigger a Run

Trigger an on-demand run for a task. The run starts in `"pending"` status. The accepted parameters depend on the task's type.

**Evaluation tasks** (`template_evaluation` / `code_evaluation`):

```python
from datetime import datetime

run = client.tasks.trigger_run(
    task="your-task-name-or-id",
    data_start_time=datetime(2024, 1, 1),
    data_end_time=datetime(2024, 2, 1),
)

print(run.id, run.status)  # e.g. "run-abc123", "pending"
```

| Parameter              | Type        | Default  | Description                                                                                |
| ---------------------- | ----------- | -------- | ------------------------------------------------------------------------------------------ |
| `task`                 | `str`       | required | Task name or ID to trigger.                                                                |
| `space`                | `str`       | None     | Space name or ID used to disambiguate the task lookup. Recommended when resolving by name. |
| `data_start_time`      | `datetime`  | None     | Start of data window to evaluate.                                                          |
| `data_end_time`        | `datetime`  | now      | End of data window. Defaults to the current time.                                          |
| `max_spans`            | `int`       | 10,000   | Maximum number of spans to process.                                                        |
| `override_evaluations` | `bool`      | `False`  | Re-evaluate data that already has labels.                                                  |
| `experiment_ids`       | `list[str]` | None     | Experiment IDs to run against (dataset-based tasks only).                                  |

**`run_experiment` tasks**:

```python
run = client.tasks.trigger_run(
    task="your-run-experiment-task",
    experiment_name="qa-run-2024-01-15",  # required: display name for the experiment
    max_examples=100,  # optional cap
)
```

| Parameter             | Type             | Default  | Description                                                                                                    |
| --------------------- | ---------------- | -------- | -------------------------------------------------------------------------------------------------------------- |
| `task`                | `str`            | required | Task name or ID to trigger.                                                                                    |
| `space`               | `str`            | None     | Space name or ID used to disambiguate the task lookup.                                                         |
| `experiment_name`     | `str`            | required | Display name for the experiment to be created. Must be unique within the dataset.                              |
| `dataset_version_id`  | `str`            | latest   | Dataset version global ID. Defaults to the latest version.                                                     |
| `max_examples`        | `int`            | None     | Maximum number of examples to run. When omitted, all examples are used. Mutually exclusive with `example_ids`. |
| `example_ids`         | `list[str]`      | None     | Specific dataset example global IDs to run against. Mutually exclusive with `max_examples`.                    |
| `tracing_metadata`    | `dict[str, Any]` | None     | Arbitrary key-value metadata attached to the run's traces.                                                     |
| `evaluation_task_ids` | `list[str]`      | None     | Task global IDs of evaluation tasks to trigger after the experiment run completes.                             |

### List Runs

List runs for a task with optional status filtering.

```python
resp = client.tasks.list_runs(
    task="your-task-name-or-id",
    limit=20,
)

for run in resp.task_runs:
    print(run.id, run.status)
```

Filter to only completed runs:

```python
resp = client.tasks.list_runs(
    task="your-task-name-or-id",
    status="completed",
)
```

Valid `status` values: `"pending"`, `"running"`, `"completed"`, `"failed"`, `"cancelled"`.

### Get a Run

Retrieve a specific run by its ID.
```python
run = client.tasks.get_run(run_id="your-run-id")

print(run.id, run.status)
```

### Cancel a Run

Cancel a run that is currently `"pending"` or `"running"`.

```python
run = client.tasks.cancel_run(run_id="your-run-id")

print(run.status)  # "cancelled"
```

### Wait for a Run

Poll a run until it reaches a terminal state (`"completed"`, `"failed"`, or `"cancelled"`).

```python
run = client.tasks.wait_for_run(
    run_id="your-run-id",
    poll_interval=5,  # Check every 5 seconds (default)
    timeout=600,  # Give up after 10 minutes (default)
)

print(run.status)  # "completed", "failed", or "cancelled"
```

**Raises `TimeoutError`** if the run does not complete within `timeout` seconds.

### End-to-End: Trigger and Wait

```python
# Trigger an on-demand run
run = client.tasks.trigger_run(task="your-task-name-or-id")

# Block until the run finishes
run = client.tasks.wait_for_run(run_id=run.id)

if run.status == "completed":
    print("Task run completed successfully")
elif run.status == "failed":
    print("Task run failed")
```

**Learn more:** [Online Evaluations Documentation](https://arize.com/docs/ax/evaluate/online-evals)
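
To make the polling contract described under "Wait for a Run" concrete, the following is a minimal sketch of a poll-until-terminal loop. It is not the SDK's implementation: `poll_until_terminal` and `fake_statuses` are hypothetical names introduced here for illustration, and the status callback stands in for something like reading `client.tasks.get_run(run_id=...).status`. Only the terminal statuses, the 5-second and 600-second defaults, and the `TimeoutError` behavior are taken from the text above.

```python
import time
from typing import Callable

# Terminal statuses as listed in this page.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_until_terminal(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Hypothetical helper: call get_status() until it returns a terminal
    status, raising TimeoutError once `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for repeated status lookups on a real run (assumption for the demo).
fake_statuses = iter(["pending", "running", "completed"])
final_status = poll_until_terminal(
    lambda: next(fake_statuses),
    poll_interval=0.01,
    timeout=1.0,
)
print(final_status)
```

Using a monotonic clock for the deadline keeps the timeout unaffected by system clock adjustments. In real code, prefer `client.tasks.wait_for_run`, which already provides this behavior.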