Create versioned datasets for experimentation, evaluation, and fine-tuning. Datasets are immutable and version-controlled automatically.

Key Capabilities

  • Create datasets from Python dicts or pandas DataFrames
  • Automatic versioning on updates
  • Efficient bulk operations via Arrow Flight for large datasets
  • Cache datasets locally for faster experiment iteration
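To make "immutable and version-controlled" concrete, the toy model below shows the idea in plain Python. It is only a conceptual sketch, not how the platform stores datasets. Each update produces a new snapshot that contains the previous examples plus the new ones, and earlier snapshots are never modified. The class names `DatasetVersion` and `VersionedDataset` are hypothetical.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable snapshot of a dataset's examples (hypothetical model)."""
    version: int
    examples: Tuple  # tuple of read-only mappings


@dataclass
class VersionedDataset:
    """Toy model: every update creates a new version; old versions never change."""
    name: str
    versions: List[DatasetVersion] = field(default_factory=list)

    def latest(self) -> DatasetVersion:
        # Raises IndexError if nothing has been added yet.
        return self.versions[-1]

    def append(self, new_examples: List[dict]) -> DatasetVersion:
        # Start from the previous snapshot (empty for the first version).
        previous = self.versions[-1].examples if self.versions else ()
        # Wrap each example in a read-only view so snapshots cannot be edited.
        frozen = tuple(MappingProxyType(dict(ex)) for ex in new_examples)
        snapshot = DatasetVersion(version=len(self.versions) + 1, examples=previous + frozen)
        self.versions.append(snapshot)
        return snapshot
```

After `v1 = ds.append([...])` followed by `v2 = ds.append([...])`, `v1` still holds only its original examples, while `v2` holds both batches.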

List Datasets

List all datasets in a space with optional pagination.
resp = client.datasets.list(
    space_id="your-space-id",
    limit=50,
)

for dataset in resp.datasets:
    print(dataset.id, dataset.name)
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.
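The call above returns a single page of up to `limit` datasets. To walk every page, a generic cursor loop like the one below can be used. This is a sketch: how the SDK exposes the next-page cursor is described in Response Objects, so `fetch_page` here is a hypothetical adapter you would write around `client.datasets.list`, not an SDK function.

```python
from typing import Any, Callable, Iterator, Optional, Sequence, Tuple

# Hypothetical adapter: takes a cursor (None for the first page) and
# returns (items_on_this_page, next_cursor_or_None).
FetchPage = Callable[[Optional[str]], Tuple[Sequence[Any], Optional[str]]]


def iterate_all(fetch_page: FetchPage) -> Iterator[Any]:
    """Yield every item across pages until no next cursor is returned."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break
```

Keeping the page fetching behind a callable means the same loop works for both listing datasets and listing dataset examples.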

Create a Dataset

Create a new dataset with examples for evaluation or experimentation.
examples = [
    {
        "query": "What is the capital of France?",
        "expected_output": "Paris",
        "eval.Correctness.label": "correct",
    },
    {
        "query": "Who wrote Romeo and Juliet?",
        "expected_output": "William Shakespeare",
        "eval.Correctness.label": "correct",
    },
]

dataset = client.datasets.create(
    space_id="your-space-id",
    name="my-test-dataset",
    examples=examples,
)
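The examples above store evaluation results under dotted keys such as `eval.Correctness.label`. If your evaluation results live in nested dicts, a small helper like the hypothetical `flatten_keys` below can produce keys in the same dotted style before you call `client.datasets.create`. It only reproduces the naming pattern visible in the example above; check the Datasets Documentation for which column names the platform actually recognizes.

```python
from typing import Any, Dict


def flatten_keys(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys (hypothetical helper).

    Empty nested dicts produce no keys.
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse, extending the dotted prefix.
            flat.update(flatten_keys(value, name))
        else:
            flat[name] = value
    return flat
```

For example, `flatten_keys({"query": "What is the capital of France?", "eval": {"Correctness": {"label": "correct"}}})` returns `{"query": "What is the capital of France?", "eval.Correctness.label": "correct"}`.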

Get a Dataset

Retrieve a specific dataset by its ID.
dataset = client.datasets.get(dataset_id="dataset-id")

print(dataset)

Delete a Dataset

Delete a dataset by ID. This operation is irreversible, and the call returns no response.
client.datasets.delete(dataset_id="dataset-id")

print("Dataset deleted successfully")

List Dataset Examples

Retrieve examples from a dataset with pagination support.
resp = client.datasets.list_examples(
    dataset_id="dataset-id",
    limit=100,
)

for example in resp.examples:
    print(example)
For details on pagination, field introspection, and data conversion (to dict/JSON/DataFrame), see Response Objects.
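Once examples have been converted to plain dicts (see Response Objects), quick summaries need only the standard library. The sketch below counts the values of one label column, using `eval.Correctness.label` from the earlier examples as the default. The helper name is hypothetical, and it assumes each example is a dict keyed by column name.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def label_counts(examples: Iterable[Dict[str, Any]], column: str = "eval.Correctness.label") -> Counter:
    """Count values in one column, skipping examples that lack it (hypothetical helper)."""
    return Counter(ex[column] for ex in examples if column in ex)
```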

Append Dataset Examples

Add new examples to an existing dataset. Examples are appended to the latest dataset version by default.
new_examples = [
    {
        "query": "What is machine learning?",
        "expected_output": "A subset of AI focused on learning from data",
        "eval.Correctness.label": "correct",
    },
    {
        "query": "Who invented Python?",
        "expected_output": "Guido van Rossum",
        "eval.Correctness.label": "correct",
    },
]

updated_dataset = client.datasets.append_examples(
    dataset_id="dataset-id",
    examples=new_examples,
)
Note: Do not include system-managed fields (id, created_at, updated_at) in your examples. These are automatically generated by the server. Learn more: Datasets Documentation
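One case where this note matters is re-appending examples that were read back from a dataset, since those records may carry server-generated fields. The minimal sketch below removes them before the examples are passed to `append_examples`. It assumes the examples are plain dicts. The helper name is hypothetical, and the field set comes from the note above.

```python
from typing import Any, Dict, Iterable, List

# System-managed fields listed in the note above; the server generates these.
SYSTEM_FIELDS = frozenset({"id", "created_at", "updated_at"})


def strip_system_fields(examples: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return new dicts without system-managed fields; inputs are not modified."""
    return [{k: v for k, v in ex.items() if k not in SYSTEM_FIELDS} for ex in examples]
```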