# Build a dataset

> Turn traces, reviewer feedback, or your own examples into a dataset you can reuse to compare prompt, model, and pipeline changes.

## What is a dataset

In Arize, a dataset is the fixed set of examples you rerun in experiments to compare changes to your app over time. It gives you a stable benchmark, so you can tell whether a prompt, model, or pipeline update actually improved results or introduced regressions.

**Let Alyx build the dataset for you.** Press **Cmd+L** (macOS) or **Ctrl+L** (Windows/Linux) to open [Alyx](/docs/ax/alyx) and try: *"Create a dataset from the spans with errors"* or *"Create a synthetic dataset with 100 examples for regression testing"*.

Datasets page in Arize AX showing a list of datasets with names, row counts, created by, and timestamps

### What to include

A useful dataset blends **typical examples** that represent everyday traffic, **edge cases** the app has struggled with (ambiguous inputs, long contexts, unusual formats), and **known failures** pulled from traces, evaluator results, or reviewer feedback. Without typical examples, you optimize for edge cases and regress on the common path. Without failures, you can't prove a fix actually holds.

### Dataset types

You'll also see datasets described by their source or purpose. These labels overlap and shift as a dataset matures:

* **Regression:** Examples where the app has already failed. Use these to verify that a fix holds and doesn't quietly reintroduce the bug.
* **Golden:** Inputs with hand-labeled expected outputs, forming a stable benchmark for comparing prompt and model changes.
* **Synthetic:** Generated examples that mimic real inputs. Useful when production data is thin, sensitive, or missing the edge cases you want to stress-test.

A regression set becomes part of a golden dataset once you label the expected output for each row. Collect failures first, label them as you go, and fold in typical traffic so the benchmark isn't just past bugs.

### Dataset row schema

Each row can include input messages, expected outputs, metadata, or any other columns your task function needs. Trace-sourced rows follow the [OpenInference](https://github.com/Arize-ai/openinference/blob/main/spec/semantic_conventions.md) semantic conventions (e.g., `attributes.input.value`). CSVs and inline examples use your own column names, so keep those names consistent across sources.

### Common dataset row shapes

The labels above describe why a row belongs in the dataset. The row itself should match what your task function reads. Common patterns include:

**Key-value rows.** Use this shape when the task needs multiple fields, such as an input, retrieved context, and an expected output.

| Input                            | Context                                                                         | Output                                                 |
| -------------------------------- | ------------------------------------------------------------------------------- | ------------------------------------------------------ |
| `What is Paul Graham known for?` | `Paul Graham is an investor, entrepreneur, and computer scientist known for...` | `Paul Graham is known for co-founding Y Combinator...` |

**Prompt-completion pairs.** Use this shape for the simplest single-turn completion or classification cases.

| Input                                                 | Output   |
| ----------------------------------------------------- | -------- |
| `"do you have to have two license plates in ontario"` | `"True"` |

**Messages or chat rows.** Use this shape when your task expects multi-message inputs or outputs.

```json theme={null}
{
  "input": {
    "messages": [{"role": "system", "content": "You are an expert SQL assistant"}]
  },
  "output": {
    "messages": [{"role": "assistant", "content": "SELECT * FROM users;"}]
  }
}
```

Choose the shape that matches your task function and keep it consistent within a dataset version.

## Creating a dataset

Pick the tool you work in. The options below (the Arize skills plugin, Alyx, the AX UI, and the SDK) each cover the trace-based, file-upload, and synthetic paths where they apply.

The [Arize skills plugin](/docs/ax/set-up-with-ai-assistants) wires dataset and trace workflows into your coding agent through the `ax` CLI.

**From traces.** Combine [`arize-trace`](https://github.com/Arize-ai/arize-skills/tree/main/skills/arize-trace) with [`arize-dataset`](https://github.com/Arize-ai/arize-skills/tree/main/skills/arize-dataset). Try:

* "Export error spans from the last 7 days in my `production-chatbot` project and create a dataset called `error-regression-v1`."
* "Find spans where `annotation.hallucination.label = 'yes'` over the past 14 days and save them as `hallucination-examples`."

**From a local file.** Point the `arize-dataset` skill at a CSV, JSON, JSONL, or Parquet file you already have. Try:

* "Create a dataset called `billing-qa-v1` from `./data/billing_qa.csv` in my `support` space."
* "Append the rows in `new_edge_cases.jsonl` to my existing `edge-cases` dataset."

**Generate synthetic rows.** Have the agent draft examples for you. Try:

* "Generate 50 synthetic billing support tickets with `query` and `expected_category` fields, then save as `support-synthetic-v1`."
* "Draft 20 adversarial inputs targeting prompt injection for my chat agent and save as `adversarial-v1`."

Coding agent running Arize skills via the ax CLI to create datasets from traces and generated examples
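If you'd rather script a small synthetic set yourself than ask an agent, one simple approach is to fill hand-written templates with slot values. The sketch below produces rows with `query` and `expected_category` fields, matching the prompt above. The templates, slot values, and the `make_synthetic_tickets` helper are hypothetical placeholders, not part of the Arize skills, and an LLM-based generator would usually give more varied phrasing.

```python
import random

# Hypothetical templates, keyed by the category each generated ticket is labeled with.
TEMPLATES = {
    "billing": [
        "I was charged {amount} twice this month.",
        "Why is there a {amount} fee on my invoice?",
    ],
    "account": [
        "How do I change the email on my {plan} account?",
        "Please cancel my {plan} subscription.",
    ],
    "technical": [
        "The {feature} page keeps timing out.",
        "I get an error when I open {feature}.",
    ],
}

# Hypothetical slot values used to fill the templates.
SLOTS = {
    "amount": ["$9.99", "$49", "$120"],
    "plan": ["free", "pro", "enterprise"],
    "feature": ["dashboard", "export", "settings"],
}


def make_synthetic_tickets(n: int, seed: int = 0) -> list[dict]:
    """Return n rows shaped like {"query": ..., "expected_category": ...}."""
    rng = random.Random(seed)  # fixed seed so the generated set is reproducible
    rows = []
    for _ in range(n):
        category = rng.choice(sorted(TEMPLATES))
        template = rng.choice(TEMPLATES[category])
        # str.format ignores unused keyword arguments, so every slot can be passed.
        values = {name: rng.choice(options) for name, options in SLOTS.items()}
        rows.append({"query": template.format(**values), "expected_category": category})
    return rows


tickets = make_synthetic_tickets(50)
print(len(tickets), tickets[0])
```

Rows in this shape could be written to a JSONL file for the `arize-dataset` skill, or passed as the `examples` list in the SDK's `client.datasets.create` call shown later on this page. Template text is repetitive, so treat it as a starting point rather than a substitute for real traffic.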

[Alyx](/docs/ax/alyx) builds datasets directly from the app. It's available on the pages where you're already looking at traces, datasets, and prompts (see [Alyx meets you where you are](/docs/ax/alyx#alyx-meets-you-where-you-are) for the full list of surfaces). Ask Alyx to turn specific spans into dataset rows, or have it draft synthetic examples when you don't have production data to pull from.

**From traces.** Try:

* *"Add this span and every similar error to a new regression dataset."*
* *"Show me the most common failure patterns in the last 24 hours and add one example of each to a new dataset."*

**Generate synthetic rows.** Try:

* *"Create a synthetic dataset of 100 examples covering billing, technical, and general support categories."*
* *"Generate 30 edge cases for my router prompt and save them as a new dataset."*

Alyx sidebar in Arize AX responding to a request to create a new dataset

**From the Traces table.** Filter by status, eval score, latency, annotations, or a natural-language query via AI Search. For example, use `status_code = 'ERROR'` for exceptions, `eval.groundedness.score < 0.5` for low-scoring spans, or *"traces with hallucinations from yesterday"* in AI Search.

Filter bar in the Arize AX Traces table with multiple span-query conditions applied

Select the spans you want and click **Add to Dataset** to create a new dataset or append to an existing one. At minimum, map the span's input and output (stored under `attributes.input.value` and `attributes.output.value`). For classification tasks, also include a column with the expected label.

Selecting spans in the Arize AX Traces table and adding them to a new dataset with column mapping
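If you've exported spans to a DataFrame with the SDK instead (see `client.spans.export_to_df` in the SDK section), you can apply a similar filter and column mapping in pandas before creating the dataset. This is a sketch: the sample DataFrame stands in for a real export, and the `status_code` and `eval.correctness.label` column names are assumptions about the export format, so check your own export's columns first.

```python
import pandas as pd

# Stand-in for a span export; real exports carry many more columns.
# "status_code" and "eval.correctness.label" are assumed column names.
spans_df = pd.DataFrame(
    {
        "attributes.input.value": ["How do I cancel?", "I was charged twice.", "Hi"],
        "attributes.output.value": ["Go to Settings.", "Refund issued.", "Hello!"],
        "status_code": ["OK", "ERROR", "OK"],
        "eval.correctness.label": ["incorrect", "correct", "correct"],
    }
)

# Keep spans that errored or that the evaluator marked incorrect.
failed = spans_df[
    (spans_df["status_code"] == "ERROR")
    | (spans_df["eval.correctness.label"] == "incorrect")
]

# Map the OpenInference columns onto the names your task function reads,
# and add an empty expected-label column to fill in during review.
rows = (
    failed.rename(
        columns={
            "attributes.input.value": "input",
            "attributes.output.value": "output",
        }
    )[["input", "output"]]
    .assign(expected_label=None)
)

print(rows)
```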

**Upload a file.** Go to **Datasets & Experiments**, click **+ New Dataset**, and upload a CSV you've generated elsewhere.

New Dataset dialog in Arize AX with a CSV upload drop zone
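If you're assembling the CSV yourself, a short pandas sketch like the one below keeps column names consistent with your other sources and catches blank required cells before upload. The column names and the `support_qa_v1.csv` path are illustrative; use whatever columns your task function reads.

```python
import pandas as pd

# Illustrative rows; reuse the same column names as your other sources.
rows = [
    {"input": "How do I cancel?", "expected_category": "account"},
    {"input": "I was charged twice.", "expected_category": "billing"},
]
df = pd.DataFrame(rows)

# Fail early if any required cell is missing or empty, rather than after upload.
required = ["input", "expected_category"]
missing = df[required].isna() | (df[required] == "")
if missing.any().any():
    raise ValueError(f"Rows with empty required fields:\n{df[missing.any(axis=1)]}")

# index=False keeps the pandas row index out of the uploaded columns.
df.to_csv("support_qa_v1.csv", index=False)
```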

Use the Arize SDK to create datasets programmatically. For the Python examples, install `arize>=8.0.0` and set `ARIZE_API_KEY` and `ARIZE_SPACE_ID` in your environment.

```python Python theme={null}
import os
from datetime import datetime, timedelta

from arize import ArizeClient

client = ArizeClient(api_key=os.environ["ARIZE_API_KEY"])
space = os.environ["ARIZE_SPACE_ID"]

# From inline examples
client.datasets.create(
    name="support-qa-v1",
    space=space,
    examples=[
        {"input": "How do I cancel?", "expected_category": "account"},
        {"input": "I was charged twice.", "expected_category": "billing"},
    ],
)

# From traces (columns are OpenInference span attributes)
spans_df = client.spans.export_to_df(
    space_id=space,
    project_name="my-llm-app",
    start_time=datetime.now() - timedelta(days=30),
    end_time=datetime.now(),
)
client.datasets.create(name="traces-v1", space=space, examples=spans_df)
```

```typescript TS/JS theme={null}
import { createDataset } from "@arizeai/ax-client";

const dataset = await createDataset({
  space: "my-space", // space name or ID
  name: "support-qa-v1",
  examples: [{ question: "What is 2+2?", answer: "4", topic: "math" }],
});
```

Continuing from the Python example above, if each row needs to carry the prompt template and its filled variables, store them in the OpenInference prompt-template columns so Playground and code can map them consistently:

```python Python SDK v8 theme={null}
import json

import pandas as pd

PROMPT_TEMPLATE = """
You are an expert in the history of technological inventions. Identify the individual or organization that created the following invention.
Invention: {invention}
"""

prompt_rows = pd.DataFrame(
    [
        {
            "attributes.llm.prompt_template.template": PROMPT_TEMPLATE,
            "attributes.llm.prompt_template.variables": json.dumps(
                {"invention": "Telephone"}
            ),
            "attributes.output.value": "Alexander Graham Bell",
        }
    ]
)

client.datasets.create(
    name="prompt-invention-dataset",
    space=space,
    examples=prompt_rows,
)
```

If you're migrating from Python SDK v7 dataset APIs, see the [datasets client migration guide](/docs/api-clients/python/version-8/migration/datasets-client) for `create_dataset()` and `update_dataset()` replacements.

## Managing your dataset

Add, edit, export, or delete rows as the app evolves. Datasets are versioned, and appends land in the latest version in place.

Use the [`arize-dataset`](https://github.com/Arize-ai/arize-skills/tree/main/skills/arize-dataset) skill to append, export, or inspect datasets without leaving your editor. Try asking your agent:

* "Append the rows in `new_examples.csv` to my `support-regression` dataset."
* "Export the latest version of my `support-tickets` dataset so I can review it offline."
* "Show me the schema and the first five rows of my `support-qa-v1` dataset."

Coding agent running the arize-dataset skill via the ax CLI to append new examples to an existing dataset without leaving the editor

The Dataset Page Agent can append, annotate, or summarize datasets. Try:

* *"Add the last 20 error spans to my regression dataset."*
* *"Label the rows in this dataset with their expected category."*
* *"Summarize how my regression dataset rows break down by failure type."*

Alyx Dataset Page Agent appending spans to an existing dataset in Arize AX

Everything below happens on the **dataset detail view**, where you work with a single dataset version at a time.

To add a row, click **+ Example** and fill in the fields inline.

Dataset detail view in Arize AX with the + Example button for adding a new row inline

To edit, open any row in the table and change values in place. This is where you add or correct expected outputs, turning regression rows into golden rows. To remove rows from the latest version, select them and click **Delete**. When you want a file for offline review or for sharing outside AX, click **Download as CSV** on the dataset page.

Download CSV from the dataset page in Arize AX
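Once you have the CSV, a quick pass in pandas can show which rows still lack an expected output, that is, which regression rows haven't become golden rows yet. This is a sketch: the inline sample stands in for the downloaded file, and the `expected_output` and `failure_type` column names are assumptions, so substitute the columns in your own dataset.

```python
import io

import pandas as pd

# Stand-in for the downloaded file; in practice, read it with pd.read_csv("dataset.csv").
csv_text = """input,expected_output,failure_type
How do I cancel?,Go to Settings > Billing.,wrong_answer
I was charged twice.,,hallucination
Reset my password,,wrong_answer
"""
df = pd.read_csv(io.StringIO(csv_text))

# Empty CSV cells are read as NaN, so isna() finds rows that still need a label.
unlabeled = df[df["expected_output"].isna()]
print(f"{len(unlabeled)} of {len(df)} rows still need an expected output")

# Break the unlabeled rows down by failure type to decide what to review first.
print(unlabeled["failure_type"].value_counts())
```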

Use the Arize SDK to append or export dataset rows.

```python Python theme={null}
import os

from arize import ArizeClient

client = ArizeClient(api_key=os.environ["ARIZE_API_KEY"])

# Append rows: they land in the latest version in place.
# Pass `dataset_version_id` to target a specific version.
client.datasets.append_examples(
    dataset="YOUR_DATASET_ID",
    examples=[{"input": "...", "expected_category": "billing"}],
)

# Export examples for offline analysis. `all=True` fetches every row.
examples_df = client.datasets.list_examples(
    dataset="YOUR_DATASET_ID",
    all=True,
).to_df()
```

```typescript TS/JS theme={null}
import { appendExamples, listDatasetExamples } from "@arizeai/ax-client";

// Append new examples to an existing dataset
await appendExamples({
  dataset: "your_dataset_id",
  examples: [{ question: "What is 3+3?", answer: "6", topic: "math" }],
});

// List examples for offline analysis
const examples = await listDatasetExamples({
  dataset: "your_dataset_id",
});
```

For the full reference, including `get`, `delete`, DataFrame input, and pagination, see the [Python datasets client](/docs/api-clients/python/version-8/client-resources/datasets) and [TypeScript datasets client](/docs/api-clients/typescript/version-1/client-resources/datasets).

## Auto add to dataset

Once the dataset exists, set up rules that automatically add spans when they match your criteria. Auto-add rules keep the dataset current with what's actually happening in production, without manual curation.

### From evaluator labels

After you've set up an evaluator on a project, add a post-processing step that routes spans to a dataset based on the evaluator's result. See [Create evaluators](/docs/ax/evaluate/create-evaluators) for evaluator setup, then edit the evaluator configuration for your task.

Task configuration page in Arize AX showing the evaluator selection dropdown

Select **Auto Add Spans to Dataset**, then specify which eval labels should trigger the addition: for example, all spans where *Correctness* is *Incorrect*, or any span where the eval label is not null.

Evaluator configuration panel in Arize AX with the 'Auto Add Spans to Dataset' option selected and filter criteria entered
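Conceptually, each rule is a predicate over a span's eval label, and spans that satisfy it are appended to the dataset. The sketch below only illustrates that matching logic on plain Python data, so you can reason about which spans the two example rules above would catch. It is not how Arize evaluates rules, and the span dictionaries and `matches_rule` helper are hypothetical.

```python
from typing import Optional

# Hypothetical spans with the label an evaluator assigned (None = not evaluated).
spans = [
    {"span_id": "a1", "correctness": "Incorrect"},
    {"span_id": "b2", "correctness": "Correct"},
    {"span_id": "c3", "correctness": None},
]


def matches_rule(label: Optional[str], trigger_labels: Optional[set]) -> bool:
    """Return True if a span with this label should be auto-added.

    trigger_labels=None models the "label is not null" rule; otherwise the
    label must be one of the listed trigger labels.
    """
    if trigger_labels is None:
        return label is not None
    return label in trigger_labels


# Rule 1: Correctness is Incorrect.
incorrect = [s["span_id"] for s in spans if matches_rule(s["correctness"], {"Incorrect"})]
# Rule 2: any span where the eval label is not null.
labeled = [s["span_id"] for s in spans if matches_rule(s["correctness"], None)]

print(incorrect)  # ['a1']
print(labeled)    # ['a1', 'b2']
```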

### From filter criteria

You can also auto-add spans that match basic filter criteria without an evaluator, such as high token counts, latency above a threshold, or a specific tool call. Use this when the signal is structural rather than labeled.

## Next step

Your dataset is in place. Now measure whether prompt, model, or pipeline changes actually improve your AI. Define your baseline, decide what to change, and choose Playground or code.

## Further reading

* [View and manage traces](/docs/ax/observe/tracing/view-and-manage-traces): find spans worth turning into regression cases.
* [Human review](/docs/ax/evaluate/human-review): turn reviewer feedback into dataset rows.
* [Labeling queues](/docs/ax/evaluate/labeling-queues): collect labels at scale before you build or update a golden dataset.