# Build a dataset

> Turn traces, reviewer feedback, or your own examples into a dataset you can reuse to compare prompt, model, and pipeline changes

## What is a dataset

In Arize, a dataset is the fixed set of examples you rerun in experiments to compare changes to your app over time. It gives you a stable benchmark, so you can tell whether a prompt, model, or pipeline update actually improved results or introduced regressions.

<Frame caption="Datasets page in Arize AX">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/dataset_hero.png" alt="Datasets page in Arize AX showing a list of datasets with names, row counts, created by, and timestamps" />
</Frame>

### What to include

A useful dataset blends **typical examples** that represent everyday traffic, **edge cases** the app has struggled with (ambiguous inputs, long contexts, unusual formats), and **known failures** pulled from traces, evaluator results, or reviewer feedback. Without typical examples, you optimize for edge cases and regress on the common path; without failures, you can't prove a fix actually holds.
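
You can check that blend before saving by tagging each row with where it came from and counting the tags. This is a minimal sketch in plain Python. The `source` field and its three values are hypothetical labels you would add yourself. Arize does not require these columns.

```python
from collections import Counter

# Hypothetical tags: you would add a "source" field to each row when collecting it.
REQUIRED_SOURCES = ("typical", "edge_case", "failure")


def missing_sources(rows: list[dict]) -> list[str]:
    """Return the required source tags that have no rows in the dataset."""
    counts = Counter(row.get("source") for row in rows)
    return [tag for tag in REQUIRED_SOURCES if counts[tag] == 0]


rows = [
    {"input": "How do I cancel?", "source": "typical"},
    {"input": "cancel " * 500, "source": "edge_case"},  # unusually long input
]
print(missing_sources(rows))  # ['failure']
```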

### Dataset types

You'll also see datasets described by their source or purpose. These labels overlap and shift as a dataset matures:

* **Regression:** Examples where the app has already failed. Use these to verify a fix holds and doesn't quietly reintroduce the bug.
* **Golden:** Inputs with hand-labeled expected outputs — a stable benchmark for comparing prompt and model changes.
* **Synthetic:** Generated examples that mimic real inputs. Useful when production data is thin, sensitive, or missing the edge cases you want to stress-test.

<Tip>
  A regression set becomes part of a golden dataset once you label the expected output for each row. Collect failures first, label them as you go, and fold in typical traffic so the benchmark isn't just past bugs.
</Tip>
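
The promotion step in the tip comes down to splitting regression rows by whether they have an expected output yet. The following is a minimal sketch. It assumes a hypothetical `expected_output` column that stays empty until a reviewer fills it in.

```python
def split_by_label(rows: list[dict], label_key: str = "expected_output"):
    """Split rows into (labeled, unlabeled) based on a non-empty expected output.

    `label_key` is a hypothetical column name; use whatever your task function reads.
    """
    labeled, unlabeled = [], []
    for row in rows:
        value = row.get(label_key)
        # Treat None and whitespace-only strings as "not labeled yet".
        if value is not None and str(value).strip():
            labeled.append(row)
        else:
            unlabeled.append(row)
    return labeled, unlabeled


regression_rows = [
    {"input": "I was charged twice.", "expected_output": "billing"},
    {"input": "App crashes on login", "expected_output": ""},
]
golden_ready, needs_review = split_by_label(regression_rows)
print(len(golden_ready), len(needs_review))  # 1 1
```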

### Dataset row schema

Each row can include input messages, expected outputs, metadata, or any other columns your task function needs. Trace-sourced rows follow the [OpenInference](https://github.com/Arize-ai/openinference/blob/main/spec/semantic_conventions.md) convention (e.g., `attributes.input.value`). Rows from CSVs and inline examples use whatever column names you choose. If a dataset combines sources, keep the names consistent so each field lands in a single column.
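
One way to keep names consistent is to rename trace-sourced columns to match the names your CSV rows already use before you combine them. The following is a minimal sketch. The target names `input` and `output` are assumptions, and it maps only the two OpenInference keys discussed on this page.

```python
# Map OpenInference span attributes to the column names your other sources use.
# The right-hand names are hypothetical; match them to your CSV headers.
COLUMN_MAP = {
    "attributes.input.value": "input",
    "attributes.output.value": "output",
}


def normalize_trace_row(span_row: dict) -> dict:
    """Rename known OpenInference keys; leave any other columns unchanged."""
    return {COLUMN_MAP.get(key, key): value for key, value in span_row.items()}


span_row = {
    "attributes.input.value": "How do I cancel?",
    "attributes.output.value": "Go to Settings > Billing.",
    "context.span_id": "abc123",  # any other exported column passes through as-is
}
print(normalize_trace_row(span_row))
# {'input': 'How do I cancel?', 'output': 'Go to Settings > Billing.', 'context.span_id': 'abc123'}
```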

### Common dataset row shapes

The labels above describe why a row belongs in the dataset. The row itself should match what your task function reads. Common patterns include:

**Key-value rows.** Use this when the task needs multiple fields such as an input, retrieved context, and an expected output.

| Input                            | Context                                                                         | Output                                                 |
| -------------------------------- | ------------------------------------------------------------------------------- | ------------------------------------------------------ |
| `What is Paul Graham known for?` | `Paul Graham is an investor, entrepreneur, and computer scientist known for...` | `Paul Graham is known for co-founding Y Combinator...` |

**Prompt-completion pairs.** Use this for the simplest single-turn completion or classification cases.

| Input                                                 | Output   |
| ----------------------------------------------------- | -------- |
| `"do you have to have two license plates in ontario"` | `"True"` |

**Messages or chat rows.** Use this when your task expects multi-message inputs or outputs.

```json theme={null}
{
  "input": {
    "messages": [{"role": "system", "content": "You are an expert SQL assistant"}]
  },
  "output": {
    "messages": [{"role": "assistant", "content": "SELECT * FROM users;"}]
  }
}
```
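
If a dataset mixes simple pairs with chat rows, you can convert the pairs into the messages shape so that every row in the version matches. The following is a minimal sketch. It assumes your task expects the pair's input as a single `user` message, and `pair_to_messages` is a hypothetical helper.

```python
from typing import Optional


def pair_to_messages(row: dict, system_prompt: Optional[str] = None) -> dict:
    """Convert an {"input": str, "output": str} pair into the messages row shape."""
    input_messages = []
    if system_prompt:
        # Optional system message, mirroring the JSON example above.
        input_messages.append({"role": "system", "content": system_prompt})
    # Assumption: the pair's input becomes a single user message.
    input_messages.append({"role": "user", "content": row["input"]})
    return {
        "input": {"messages": input_messages},
        "output": {"messages": [{"role": "assistant", "content": row["output"]}]},
    }


row = {"input": "do you have to have two license plates in ontario", "output": "True"}
converted = pair_to_messages(row)
print(converted["output"]["messages"])  # [{'role': 'assistant', 'content': 'True'}]
```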

Choose the shape that matches your task function and keep it consistent within a dataset version.
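
As a quick pre-upload check for that consistency, you can compare each row's top-level keys with those of the first row. The following is a minimal sketch in plain Python. `inconsistent_rows` is a hypothetical helper.

```python
def inconsistent_rows(rows: list[dict]) -> list[int]:
    """Return indexes of rows whose top-level keys differ from the first row's."""
    if not rows:
        return []
    expected = set(rows[0])
    return [i for i, row in enumerate(rows) if set(row) != expected]


rows = [
    {"input": "What is 2+2?", "output": "4"},
    {"input": "What is 3+3?", "output": "6"},
    {"question": "What is 4+4?", "answer": "8"},  # different shape
]
print(inconsistent_rows(rows))  # [2]
```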

## Creating a dataset

Pick the tool you work in. Each tab covers the trace-based, file-upload, and synthetic paths where they apply.

<Tabs>
  <Tab title="By Arize Skills">
    The [Arize skills plugin](/ax/set-up-with-ai-assistants) wires dataset and trace workflows into your coding agent through the `ax` CLI.

    **From traces.** Combine [`arize-trace`](https://github.com/Arize-ai/arize-skills/tree/main/skills/arize-trace) with [`arize-dataset`](https://github.com/Arize-ai/arize-skills/tree/main/skills/arize-dataset). Try:

    * "Export error spans from the last 7 days in my `production-chatbot` project and create a dataset called `error-regression-v1`."
    * "Find spans where `annotation.hallucination.label = 'yes'` over the past 14 days and save them as `hallucination-examples`."

    **From a local file.** Point the `arize-dataset` skill at a CSV, JSON, JSONL, or Parquet file you already have. Try:

    * "Create a dataset called `billing-qa-v1` from `./data/billing_qa.csv` in my `support` space."
    * "Append the rows in `new_edge_cases.jsonl` to my existing `edge-cases` dataset."

    **Generate synthetic rows.** Have the agent draft examples for you. Try:

    * "Generate 50 synthetic billing support tickets with `query` and `expected_category` fields, then save as `support-synthetic-v1`."
    * "Draft 20 adversarial inputs targeting prompt injection for my chat agent and save as `adversarial-v1`."

    <Frame caption="Running Arize skills from your coding agent.">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/find_load_trace_skill.png" alt="Coding agent running Arize skills via the ax CLI to create datasets from traces and generated examples" />
    </Frame>
  </Tab>

  <Tab title="By Alyx">
    [Alyx](/ax/alyx) builds datasets directly from the app. It's available on the pages where you're already looking at traces, datasets, and prompts (see [Alyx meets you where you are](/ax/alyx#alyx-meets-you-where-you-are) for the full list of surfaces). Ask Alyx to turn specific spans into dataset rows, or have it draft synthetic examples when you don't have production data to pull from.

    **From traces.** Try:

    * *"Add this span and every similar error to a new regression dataset."*
    * *"Show me the most common failure patterns in the last 24 hours and add one example of each to a new dataset."*

    **Generate synthetic rows.** Try:

    * *"Create a synthetic dataset of 100 examples covering billing, technical, and general support categories."*
    * *"Generate 30 edge cases for my router prompt and save them as a new dataset."*

    <Frame caption="Ask Alyx to create a new dataset from the pages where you already work.">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/alyx_create_dataset.png" alt="Alyx sidebar in Arize AX responding to a request to create a new dataset" />
    </Frame>
  </Tab>

  <Tab title="By UI">
    **From the Traces table.** Filter by status, eval score, latency, or annotations, or describe what you want in plain language with AI Search. For example, use `status_code = 'ERROR'` to find exceptions, `eval.groundedness.score < 0.5` to find low-scoring spans, or ask AI Search for *"traces with hallucinations from yesterday"*.

    <Frame caption="Filter the Traces table to find the spans you want.">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/trace_filter.png" alt="Filter bar in the Arize AX Traces table with multiple span-query conditions applied" />
    </Frame>

    Select the spans you want and click **Add to Dataset** to create a new dataset or append to an existing one. At minimum, map the span's input and output (stored under `attributes.input.value` and `attributes.output.value`). For classification tasks, also include a column with the expected label.

    <Frame caption="Select spans in the Traces table and add them to a dataset.">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/ui_new_dataset_traces.png" alt="Selecting spans in the Arize AX Traces table and adding them to a new dataset with column mapping" />
    </Frame>

    **Upload a file.** Go to **Datasets & Experiments**, click **+ New Dataset**, and upload a CSV you've generated elsewhere.

    <Frame caption="Upload a CSV in the New Dataset dialog.">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/ui_new_dataset_csv.png" alt="New Dataset dialog in Arize AX with a CSV upload drop zone" />
    </Frame>
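
If you generate the CSV yourself, writing it with a fixed header keeps every row's columns aligned with what your task function reads. The following is a minimal sketch that uses Python's standard `csv` module. The file name and column names are illustrative and are not required by Arize.

```python
import csv

rows = [
    {"input": "How do I cancel?", "expected_category": "account"},
    {"input": "I was charged twice.", "expected_category": "billing"},
]

with open("support_qa.csv", "w", newline="", encoding="utf-8") as f:
    # extrasaction defaults to "raise", so a row with a key outside
    # fieldnames raises ValueError instead of silently adding a column.
    writer = csv.DictWriter(f, fieldnames=["input", "expected_category"])
    writer.writeheader()
    writer.writerows(rows)
```
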
  </Tab>

  <Tab title="By Code">
    Use the Arize SDK to create datasets programmatically. For the Python examples, install `arize>=8.0.0` and set `ARIZE_API_KEY` and `ARIZE_SPACE_ID` in your environment.

    <CodeGroup>
      ```python Python theme={null}
      import os
      from datetime import datetime, timedelta
      from arize import ArizeClient

      client = ArizeClient(api_key=os.environ["ARIZE_API_KEY"])
      space = os.environ["ARIZE_SPACE_ID"]

      # From inline examples
      client.datasets.create(
          name="support-qa-v1",
          space=space,
          examples=[
              {"input": "How do I cancel?", "expected_category": "account"},
              {"input": "I was charged twice.", "expected_category": "billing"},
          ],
      )

      # From traces (columns are OpenInference span attributes).
      # Compute "now" once so the export window is exactly 30 days.
      end_time = datetime.now()
      spans_df = client.spans.export_to_df(
          space_id=space,
          project_name="my-llm-app",
          start_time=end_time - timedelta(days=30),
          end_time=end_time,
      )
      client.datasets.create(name="traces-v1", space=space, examples=spans_df)
      ```

      ```typescript TS/JS theme={null}
      import { createDataset } from "@arizeai/ax-client";

      const dataset = await createDataset({
        space: "my-space",  // space name or ID
        name: "support-qa-v1",
        examples: [{ question: "What is 2+2?", answer: "4", topic: "math" }],
      });
      ```
    </CodeGroup>

    <Accordion title="Optional: include prompt template metadata in each row">
      Continuing from the Python example above, if each row needs to carry the prompt template and its filled variables, store them on the OpenInference prompt-template columns so Playground and code can map them consistently:

      ```python Python SDK v8 theme={null}
      import json
      import pandas as pd

      PROMPT_TEMPLATE = """
      You are an expert in the history of technological inventions.
      Identify the individual or organization that created the following invention.

      Invention: {invention}
      """

      prompt_rows = pd.DataFrame(
          [
              {
                  "attributes.llm.prompt_template.template": PROMPT_TEMPLATE,
                  "attributes.llm.prompt_template.variables": json.dumps(
                      {"invention": "Telephone"}
                  ),
                  "attributes.output.value": "Alexander Graham Bell",
              }
          ]
      )

      client.datasets.create(
          name="prompt-invention-dataset",
          space=space,
          examples=prompt_rows,
      )
      ```
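
Before uploading, you can confirm that each row's stored variables actually fill its template. The following is a minimal local sketch. It assumes the template uses Python `str.format`-style placeholders such as `{invention}`. `render_prompt` is a hypothetical helper and does not call Arize.

```python
import json

PROMPT_TEMPLATE = """
You are an expert in the history of technological inventions.
Identify the individual or organization that created the following invention.

Invention: {invention}
"""

row = {
    "attributes.llm.prompt_template.template": PROMPT_TEMPLATE,
    "attributes.llm.prompt_template.variables": json.dumps({"invention": "Telephone"}),
}


def render_prompt(row: dict) -> str:
    """Fill the stored template with the row's JSON-encoded variables.

    Raises KeyError if the template references a variable the row doesn't provide.
    """
    template = row["attributes.llm.prompt_template.template"]
    variables = json.loads(row["attributes.llm.prompt_template.variables"])
    return template.format(**variables)


print(render_prompt(row))
```

`str.format` treats braces as placeholders. If a template contains literal braces, such as a JSON snippet, double them (`{{` and `}}`) for this check to work.
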
    </Accordion>

    If you're migrating from Python SDK v7 dataset APIs, see the [datasets client migration guide](/api-clients/python/version-8/migration/datasets-client) for `create_dataset()` and `update_dataset()` replacements.
  </Tab>
</Tabs>

## Managing your dataset

Add, edit, export, or delete rows as the app evolves. Datasets are versioned. Appended rows are added to the latest version in place rather than creating a new version.

<Tabs>
  <Tab title="By Arize Skills">
    Use the [`arize-dataset`](https://github.com/Arize-ai/arize-skills/tree/main/skills/arize-dataset) skill to append, export, or inspect datasets without leaving your editor. Try asking your agent:

    * "Append the rows in `new_examples.csv` to my `support-regression` dataset."
    * "Export the latest version of my `support-tickets` dataset so I can review it offline."
    * "Show me the schema and the first five rows of my `support-qa-v1` dataset."

    <Frame caption="Append new examples to an existing dataset from your coding agent with the arize-dataset skill">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/dataset_update_skill.png" alt="Coding agent running the arize-dataset skill via the ax CLI to append new examples to an existing dataset without leaving the editor" />
    </Frame>
  </Tab>

  <Tab title="By Alyx">
    On a dataset page, Alyx's Dataset Page Agent can append spans, annotate rows, or summarize the dataset. Try:

    * *"Add the last 20 error spans to my regression dataset."*
    * *"Label the rows in this dataset with their expected category."*
    * *"Summarize how my regression dataset rows break down by failure type."*

    <Frame caption="Dataset Page Agent appending spans, annotating rows, or summarizing a dataset in Alyx.">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/alyx_append_dataset.png" alt="Alyx Dataset Page Agent appending spans to an existing dataset in Arize AX" />
    </Frame>
  </Tab>

  <Tab title="By UI">
    Everything below happens on the **dataset detail view**, where you are working with a single dataset version at a time.

    To add a row, click **+ Example** and fill in the fields inline.

    <Frame caption="Add a row inline with + Example on the dataset detail view.">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/ui_add_example.png" alt="Dataset detail view in Arize AX with the + Example button for adding a new row inline" />
    </Frame>

    To edit, open any row in the table and change values in place. That is the natural place to add or correct expected outputs so regression rows become golden rows. To remove rows from the latest version, select them and click **Delete**.

    When you want a file for offline review or for sharing outside AX, click **Download as CSV** on the dataset page.

    <Frame caption="Download a dataset as CSV from the dataset page">
      <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/alyx_export_csv_dataset.png" alt="Download CSV from the dataset page in Arize AX" />
    </Frame>
  </Tab>

  <Tab title="By Code">
    Use the Arize SDK to append or export dataset rows.

    <CodeGroup>
      ```python Python theme={null}
      import os
      from arize import ArizeClient

      client = ArizeClient(api_key=os.environ["ARIZE_API_KEY"])

      # Append rows — lands in the latest version in place.
      # Pass `dataset_version_id` to target a specific version.
      client.datasets.append_examples(
          dataset="YOUR_DATASET_ID",
          examples=[{"input": "...", "expected_category": "billing"}],
      )

      # Export examples for offline analysis. `all=True` fetches every row.
      examples_df = client.datasets.list_examples(
          dataset="YOUR_DATASET_ID",
          all=True,
      ).to_df()
      ```

      ```typescript TS/JS theme={null}
      import { appendExamples, listDatasetExamples } from "@arizeai/ax-client";

      // Append new examples to an existing dataset
      await appendExamples({
        dataset: "your_dataset_id",
        examples: [{ question: "What is 3+3?", answer: "6", topic: "math" }],
      });

      // List examples for offline analysis
      const examples = await listDatasetExamples({
        dataset: "your_dataset_id",
      });
      ```
    </CodeGroup>

    For the full reference — including `get`, `delete`, DataFrame input, and pagination — see the [Python datasets client](/api-clients/python/version-8/client-resources/datasets) and [TypeScript datasets client](/api-clients/typescript/version-1/client-resources/datasets).
  </Tab>
</Tabs>

## Auto add to dataset

Once the dataset exists, set up rules that automatically add spans when they match your criteria. Auto-add rules keep the dataset current with what's actually happening in production, without manual curation.

### From evaluator labels

After you've set up an evaluator on a project, add a post-processing step that routes spans to a dataset based on the evaluator's result. See [Create evaluators](/ax/evaluate/create-evaluators) for evaluator setup, then edit the evaluator configuration for your task.

<Frame caption="Select the evaluator from the task configuration">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/auto_add_one.png" alt="Task configuration page in Arize AX showing the evaluator selection dropdown" />
</Frame>

Select **Auto Add Spans to Dataset**, then specify which eval labels should trigger the addition. For example, you can add all spans where *Correctness* is *Incorrect*, or every span whose eval label is not null.

<Frame caption="Configure auto-add rules from evaluator results">
  <img src="https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/improve/auto_add_two.png" alt="Evaluator configuration panel in Arize AX with the 'Auto Add Spans to Dataset' option selected and filter criteria entered" />
</Frame>
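
Conceptually, each rule tests a span's eval label. The sketch below restates the two example rules in plain Python so their behavior is concrete. It does not show how Arize evaluates rules internally, and `matches_label_rule` and the span dicts are hypothetical.

```python
from typing import Optional


def matches_label_rule(label: Optional[str], trigger_labels: Optional[set] = None) -> bool:
    """Return True if a span with this eval label should be auto-added.

    With trigger_labels set, match only those labels (e.g. {"Incorrect"}).
    With trigger_labels=None, match any span that has a label (not null).
    """
    if label is None:
        return False
    if trigger_labels is None:
        return True
    return label in trigger_labels


spans = [
    {"span_id": "a", "label": "Incorrect"},
    {"span_id": "b", "label": "Correct"},
    {"span_id": "c", "label": None},
]
to_add = [s["span_id"] for s in spans if matches_label_rule(s["label"], {"Incorrect"})]
print(to_add)  # ['a']
any_label = [s["span_id"] for s in spans if matches_label_rule(s["label"])]
print(any_label)  # ['a', 'b']
```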

### From filter criteria

You can also auto-add spans that match basic filter criteria without an evaluator, such as high token counts, latency above a threshold, or a specific tool call. Use this when the signal is structural rather than labeled.
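
Structural rules come down to threshold checks on span attributes. The following is a minimal sketch in plain Python. The `SpanSummary` fields (`total_tokens`, `latency_ms`, `tool_name`), the thresholds, and the tool name `search_orders` are all hypothetical. The sketch also assumes the conditions are combined with OR, purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanSummary:
    """Hypothetical, simplified view of a span's structural attributes."""
    total_tokens: int
    latency_ms: float
    tool_name: Optional[str] = None


def matches_structural_rule(
    span: SpanSummary,
    max_tokens: int = 4000,
    max_latency_ms: float = 10_000,
    watched_tool: Optional[str] = "search_orders",
) -> bool:
    """Return True if the span trips any threshold.

    Assumption: conditions are OR'd; any single one is enough to add the span.
    """
    return (
        span.total_tokens > max_tokens
        or span.latency_ms > max_latency_ms
        or (watched_tool is not None and span.tool_name == watched_tool)
    )


spans = [
    SpanSummary(total_tokens=5200, latency_ms=900),
    SpanSummary(total_tokens=800, latency_ms=12_500),
    SpanSummary(total_tokens=600, latency_ms=400, tool_name="search_orders"),
    SpanSummary(total_tokens=600, latency_ms=400),
]
print([matches_structural_rule(s) for s in spans])  # [True, True, True, False]
```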

## Next step

Your dataset is in place. Now measure whether prompt, model, or pipeline changes actually improve your AI.

<Card title="Set up an experiment" icon="arrow-right" href="/ax/improve/set-up-an-experiment">
  Define your baseline, decide what to change, and choose Playground or code.
</Card>

## Further reading

* [View and manage traces](/ax/observe/tracing/view-and-manage-traces): find spans worth turning into regression cases.
* [Human review](/ax/evaluate/human-review): turn reviewer feedback into dataset rows.
* [Labeling queues](/ax/evaluate/labeling-queues): collect labels at scale before you build or update a golden dataset.
