# Datasets

> Create versioned datasets for experimentation, evaluation, and fine-tuning. Supports Python dicts and pandas DataFrames.

<Note>
  The `datasets` client methods are currently in **BETA**. The API may change without notice. A one-time warning is emitted on first use.
</Note>

Datasets are version-controlled collections of examples. Updates, such as appending examples, modify the current version in place instead of creating a new version.

## Key Capabilities

* Create datasets from Python dicts or pandas DataFrames
* Append examples in place to an existing dataset version
* Efficient bulk operations via Arrow Flight for large datasets
* Cache datasets locally for faster experiment iteration

## List Datasets

List datasets, optionally filtered by space or by a substring of the dataset name.

```python theme={null}
resp = client.datasets.list(
    space="your-space-name-or-id",  # optional
    name="my-dataset",              # optional substring filter
    limit=50,
)

for dataset in resp.datasets:
    print(dataset.id, dataset.name)
```
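Because `name` is a substring filter, a search for `my-dataset` can also return `my-dataset-v2`. A minimal sketch of an exact-name lookup filters the returned page client-side. `find_dataset_exact` is a hypothetical helper, not part of the SDK, and it assumes the match appears within the first `limit` results; pagination is not handled.

```python
from typing import Any, Optional


def find_dataset_exact(
    client: Any,
    name: str,
    space: Optional[str] = None,
    limit: int = 50,
) -> Optional[Any]:
    """Return the first listed dataset whose name equals `name`, or None.

    `client.datasets.list(name=...)` matches substrings, so results are
    filtered here for an exact match. Only one page (up to `limit` results)
    is inspected; pagination is not handled.
    """
    kwargs = {"name": name, "limit": limit}
    if space is not None:  # `space` is optional for `list`
        kwargs["space"] = space
    resp = client.datasets.list(**kwargs)
    for dataset in resp.datasets:
        if dataset.name == name:
            return dataset
    return None


# Hypothetical usage with a configured client:
# dataset = find_dataset_exact(client, "my-dataset", space="your-space-name-or-id")
```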

For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see [Response Objects](/api-clients/python/version-8/overview#response-objects).

## Create a Dataset

Create a new dataset with examples for evaluation or experimentation.

<CodeGroup>
  ```python From Dictionaries theme={null}
  examples = [
      {
          "query": "What is the capital of France?",
          "expected_output": "Paris",
          "eval.Correctness.label": "correct",
      },
      {
          "query": "Who wrote Romeo and Juliet?",
          "expected_output": "William Shakespeare",
          "eval.Correctness.label": "correct",
      },
  ]

  dataset = client.datasets.create(
      space="your-space-name-or-id",
      name="my-test-dataset",
      examples=examples,
  )
  ```

  ```python From DataFrame theme={null}
  import pandas as pd

  examples = pd.DataFrame({
      "query": ["What is AI?", "Explain ML"],
      "expected_output": ["Artificial Intelligence...", "Machine Learning..."],
  })

  dataset = client.datasets.create(
      space="your-space-name-or-id",
      name="my-dataset-from-df",
      examples=examples,
  )
  ```
</CodeGroup>
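Both input forms describe the same data: each dict is one example (a row) and each key is a column. If you assemble examples as dicts but want to inspect or clean them as a table first, pandas converts between the two shapes. This sketch uses only pandas and makes no SDK calls.

```python
import pandas as pd

# The same two examples as a list of dicts...
examples = [
    {"query": "What is AI?", "expected_output": "Artificial Intelligence..."},
    {"query": "Explain ML", "expected_output": "Machine Learning..."},
]

# ...and as a DataFrame: each dict becomes a row, each key a column.
examples_df = pd.DataFrame(examples)

# Converting back yields one dict per row, keyed by column name.
round_trip = examples_df.to_dict(orient="records")
```

Keys missing from some dicts become `NaN` in the DataFrame, so check how such gaps should be represented before passing the result to `client.datasets.create`.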

## Get a Dataset

Retrieve a specific dataset by name or ID. When using a name, provide `space` to disambiguate.

```python theme={null}
dataset = client.datasets.get(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
)

print(dataset)
```

## Delete a Dataset

Delete a dataset by name or ID. This operation is irreversible, and the call returns no value.

```python theme={null}
client.datasets.delete(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
)

print("Dataset deleted successfully")
```

## Rename a Dataset

<Note>
  The `update` method is currently in **ALPHA**. The API may change without notice.
</Note>

Rename a dataset. The new name must be unique within the space.

```python theme={null}
dataset = client.datasets.update(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    name="renamed-dataset",
)

print(dataset.name)
```

## List Dataset Examples

Retrieve examples from a dataset with pagination support. Pass `all=True` to fetch every example via Arrow Flight; `limit` is ignored in that case.

```python theme={null}
resp = client.datasets.list_examples(
    dataset="dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    limit=100,
)

for example in resp.examples:
    print(example)
```

For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see [Response Objects](/api-clients/python/version-8/overview#response-objects).

## Append Dataset Examples

Add new examples to an existing dataset. By default, examples are appended in place to the latest dataset version; no new version is created. To target a specific version, pass `dataset_version_id`.

<CodeGroup>
  ```python From Dictionaries theme={null}
  new_examples = [
      {
          "query": "What is machine learning?",
          "expected_output": "A subset of AI focused on learning from data",
          "eval.Correctness.label": "correct",
      },
      {
          "query": "Who invented Python?",
          "expected_output": "Guido van Rossum",
          "eval.Correctness.label": "correct",
      },
  ]

  updated_dataset = client.datasets.append_examples(
      dataset="dataset-name-or-id",
      space="your-space-name-or-id",  # required when using a name
      examples=new_examples,
  )
  ```

  ```python From DataFrame theme={null}
  import pandas as pd

  new_examples_df = pd.DataFrame({
      "query": ["Explain neural networks", "What is NLP?"],
      "expected_output": ["Networks inspired by biological neurons...", "Natural Language Processing..."],
  })

  updated_dataset = client.datasets.append_examples(
      dataset="dataset-name-or-id",
      space="your-space-name-or-id",  # required when using a name
      examples=new_examples_df,
  )
  ```
</CodeGroup>

**Note:** Do not include system-managed fields (`id`, `created_at`, `updated_at`) in your examples. These are automatically generated by the server.
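When the rows you want to append come from a source that already carries these fields (for example, examples copied out of another dataset), you can drop them first. `strip_system_fields` is a hypothetical helper, not an SDK function; it assumes the field names are exactly `id`, `created_at`, and `updated_at`, as listed in the note.

```python
from typing import Any, Dict, List, Union

import pandas as pd

# System-managed fields that the server generates (per the note above).
SYSTEM_FIELDS = ("id", "created_at", "updated_at")


def strip_system_fields(
    examples: Union[List[Dict[str, Any]], pd.DataFrame],
) -> Union[List[Dict[str, Any]], pd.DataFrame]:
    """Return a copy of `examples` without system-managed fields.

    Accepts the two shapes that `append_examples` takes: a list of dicts
    or a pandas DataFrame. The input itself is not modified.
    """
    if isinstance(examples, pd.DataFrame):
        # errors="ignore" skips any of these columns that are absent.
        return examples.drop(columns=list(SYSTEM_FIELDS), errors="ignore")
    return [
        {key: value for key, value in example.items() if key not in SYSTEM_FIELDS}
        for example in examples
    ]
```

Pass the result as `examples` to `client.datasets.append_examples`.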

## Annotate Dataset Examples

<Note>
  The `annotate_examples` method is currently in **ALPHA**. The API may change without notice.
</Note>

Write human annotations to a batch of examples in a dataset. Annotations are upserted by annotation config name for each example; submitting the same name for the same example overwrites the previous value. Up to 1000 examples may be annotated per request.

```python theme={null}
from arize.datasets.types import AnnotateRecordInput, AnnotationInput

result = client.datasets.annotate_examples(
    dataset="your-dataset-name-or-id",
    space="your-space-name-or-id",  # required when using a name
    annotations=[
        AnnotateRecordInput(
            record_id="your-example-id",
            values=[
                AnnotationInput(name="quality", score=0.9),
                AnnotationInput(name="topic", label="science"),
            ],
        ),
    ],
)

print(result)
```
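To annotate more than 1000 examples, one approach is to split the annotations into batches and call `annotate_examples` once per batch. `batched` and `annotate_in_batches` are hypothetical helpers, not SDK functions; the sketch assumes one `AnnotateRecordInput` per example, so that a batch of 1000 inputs stays within the per-request limit.

```python
from typing import Any, Iterator, List, Optional, Sequence

# Per-request limit stated for `annotate_examples`.
MAX_EXAMPLES_PER_REQUEST = 1000


def batched(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def annotate_in_batches(
    client: Any,
    dataset: str,
    annotations: Sequence[Any],
    space: Optional[str] = None,
    batch_size: int = MAX_EXAMPLES_PER_REQUEST,
) -> List[Any]:
    """Call `annotate_examples` once per batch and collect the results.

    Batches are sent sequentially; if one call raises, the earlier
    batches have already been written.
    """
    kwargs = {"dataset": dataset}
    if space is not None:  # required only when `dataset` is a name
        kwargs["space"] = space
    results = []
    for batch in batched(annotations, batch_size):
        results.append(
            client.datasets.annotate_examples(annotations=list(batch), **kwargs)
        )
    return results
```

Because annotations are upserted by annotation config name, re-sending a batch after a failure overwrites the earlier values for those examples rather than duplicating them.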

**Learn more:** [Datasets Documentation](https://arize.com/docs/ax/develop/datasets)
