This section covers migrating dataset management methods from v7’s ArizeDatasetsClient to v8’s ArizeClient.datasets.
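In v7, dataset methods were called on an ArizeDatasetsClient initialized with a developer key:

```python
from arize.experimental.datasets import ArizeDatasetsClient

# In v7, the api_key parameter took a developer key
client = ArizeDatasetsClient(
    api_key="your-developer-key"  # Developer key (deprecated)
)
```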

list_datasets()

The list_datasets() method migrates from client.list_datasets() to client.datasets.list().

Parameter Reference

| Parameter | v7 | v8 | Changes |
|---|---|---|---|
| space_id | Required | Optional | Now optional; if not provided, lists datasets across all spaces |
| limit | N/A | ✅ Optional | Maximum number of datasets to return (default 100) |
| cursor | N/A | ✅ Optional | Pagination cursor for retrieving the next page |

Side-by-Side Comparison

```python
from arize.experimental.datasets import ArizeDatasetsClient

# Client initialization
client = ArizeDatasetsClient(api_key="your-developer-key")

# List datasets
datasets_df = client.list_datasets(space_id="your-space-id")
```

create_dataset()

The create_dataset() method migrates from client.create_dataset() to client.datasets.create().

Parameter Reference

| Parameter | v7 | v8 | Changes |
|---|---|---|---|
| space_id | Required | Required | |
| dataset_name | Required | Required | Renamed to name |
| name | N/A | ✅ Required | Renamed from dataset_name |
| dataset_type | Required | ❌ Removed | No longer required |
| data | Required | Required | Renamed to examples |
| examples | N/A | ✅ Required | Renamed from data; accepts a DataFrame or a list of dicts |
| convert_dict_to_json | Optional | ❌ Removed | Automatic conversion in v8 |
| max_chunk_size | Optional | ❌ Removed | Now configured at the client level |
| force_http | N/A | ✅ Optional | Force HTTP upload instead of gRPC (default False) |

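The renames and removals in this table are mechanical, so a small helper can translate an existing v7 create_dataset() keyword dictionary into the arguments the table lists for v8 datasets.create(). This is a sketch: the helper name to_v8_create_kwargs is hypothetical, and it only covers the parameters documented above.

```python
# Parameters dropped in v8, per the table above.
_REMOVED = {"dataset_type", "convert_dict_to_json", "max_chunk_size"}
# v7 parameter name -> v8 parameter name.
_RENAMED = {"dataset_name": "name", "data": "examples"}


def to_v8_create_kwargs(v7_kwargs: dict) -> dict:
    """Map v7 create_dataset() kwargs to v8 datasets.create() kwargs (hypothetical helper)."""
    v8_kwargs = {}
    for key, value in v7_kwargs.items():
        if key in _REMOVED:
            continue  # no per-call equivalent in v8 (e.g. chunk size moves to client config)
        v8_kwargs[_RENAMED.get(key, key)] = value
    return v8_kwargs


v7_call = {
    "space_id": "your-space-id",
    "dataset_name": "my-dataset",
    "dataset_type": "GENERATIVE",  # placeholder for the v7 dataset type enum
    "data": [{"input": "example input", "output": "example output"}],
    "convert_dict_to_json": True,
    "max_chunk_size": 1000,
}
v8_call = to_v8_create_kwargs(v7_call)
# v8_call could then be passed as client.datasets.create(**v8_call)
```

The list-of-dicts data above works because the table states that examples accepts either a DataFrame or a list of dicts.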
Side-by-Side Comparison

```python
from arize.experimental.datasets import ArizeDatasetsClient
from arize.pandas.proto import flight_pb2
import pandas as pd

# Client initialization
client = ArizeDatasetsClient(api_key="your-developer-key")

# Example data (column names are illustrative)
dataset_df = pd.DataFrame({
    "input": ["example input"],
    "output": ["example output"],
})

# Create dataset
dataset_id = client.create_dataset(
    space_id="your-space-id",
    dataset_name="my-dataset",
    dataset_type=flight_pb2.DatasetType.GENERATIVE,
    data=dataset_df,
    convert_dict_to_json=True,
    max_chunk_size=1000
)
```

get_dataset()

The get_dataset() method behaves differently in v8. In v7, client.get_dataset() returned the dataset examples (the underlying data). In v8, client.datasets.get() returns only the dataset metadata and its versions; to retrieve the examples themselves, call client.datasets.list_examples().

Parameter Reference

For dataset metadata (v8’s datasets.get()):
| Parameter | v7 | v8 | Changes |
|---|---|---|---|
| space_id | Required | ❌ Removed | Not needed in v8 |
| dataset_id | Optional | Required | Now required; no longer accepts dataset_name |
| dataset_name | Optional | ❌ Removed | Use dataset_id instead |
| dataset_version | Optional | ❌ Removed | All versions are returned in the metadata |
| convert_json_str_to_dict | Optional | N/A | Only applies to examples, not metadata |

For dataset examples (v8’s datasets.list_examples()):

| Parameter | v7 | v8 | Changes |
|---|---|---|---|
| dataset_id | Optional | Required | |
| dataset_version | Optional | Optional | Renamed to dataset_version_id |
| dataset_version_id | N/A | ✅ Optional | If empty, returns the latest version |
| limit | N/A | ✅ Optional | Maximum number of examples per page (default 100); ignored if all=True |
| all | N/A | ✅ Optional | When True, retrieves all examples via Flight (bypasses pagination). When False (default), uses REST with pagination |
| convert_json_str_to_dict | Optional | ❌ Removed | Automatic conversion in v8 |
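Since one v7 get_dataset() call now corresponds to two v8 calls, it can help to split the old keyword arguments according to the two tables above. The sketch below does that; split_v7_get_kwargs is a hypothetical helper, and it assumes the value you passed as dataset_version in v7 is usable as a v8 dataset_version_id, which this guide does not confirm.

```python
from typing import Tuple


def split_v7_get_kwargs(v7_kwargs: dict) -> Tuple[dict, dict]:
    """Split v7 get_dataset() kwargs into kwargs for v8 datasets.get()
    and datasets.list_examples(), following the tables above (hypothetical helper)."""
    if "dataset_id" not in v7_kwargs:
        # v8 no longer accepts dataset_name; resolve the name to an ID first.
        raise ValueError("v8 requires dataset_id; look up the ID for the dataset name first")
    dataset_id = v7_kwargs["dataset_id"]
    get_kwargs = {"dataset_id": dataset_id}
    examples_kwargs = {"dataset_id": dataset_id}
    if v7_kwargs.get("dataset_version"):
        # Assumption: the v7 version identifier maps directly to dataset_version_id.
        examples_kwargs["dataset_version_id"] = v7_kwargs["dataset_version"]
    # space_id and convert_json_str_to_dict are dropped: not needed / automatic in v8.
    return get_kwargs, examples_kwargs


get_kwargs, examples_kwargs = split_v7_get_kwargs({
    "space_id": "your-space-id",
    "dataset_id": "dataset-123",
    "dataset_version": "v1",
    "convert_json_str_to_dict": True,
})
# get_kwargs -> client.datasets.get(**get_kwargs)
# examples_kwargs -> client.datasets.list_examples(**examples_kwargs)
```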

Side-by-Side Comparison

```python
from arize.experimental.datasets import ArizeDatasetsClient

# Client initialization
client = ArizeDatasetsClient(api_key="your-developer-key")

# Get dataset examples (underlying data) by ID
dataset_df = client.get_dataset(
    space_id="your-space-id",
    dataset_id="dataset-123",
    dataset_version="v1",
    convert_json_str_to_dict=True
)
# Returns: pandas DataFrame with the dataset examples

# Or get by name
dataset_df = client.get_dataset(
    space_id="your-space-id",
    dataset_name="my-dataset"
)
```
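Because v8 drops dataset_name from both datasets.get() and datasets.delete(), code that looked datasets up by name in v7 needs a name-to-ID step first, for example over the results of client.datasets.list(). The sketch below assumes each listed record is a dict with "id" and "name" keys; that record shape and the helper name resolve_dataset_id are hypothetical, so check the fields the SDK actually returns.

```python
from typing import Iterable


def resolve_dataset_id(datasets: Iterable[dict], name: str) -> str:
    """Return the ID of the single dataset whose name matches `name`.

    Assumes records shaped like {"id": ..., "name": ...} (hypothetical).
    """
    matches = [d["id"] for d in datasets if d.get("name") == name]
    if not matches:
        raise LookupError(f"no dataset named {name!r}")
    if len(matches) > 1:
        # Refuse to guess if more than one dataset shares the name.
        raise LookupError(f"multiple datasets named {name!r}: {matches}")
    return matches[0]


listed = [
    {"id": "dataset-123", "name": "my-dataset"},
    {"id": "dataset-456", "name": "other-dataset"},
]
dataset_id = resolve_dataset_id(listed, "my-dataset")
```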

delete_dataset()

The delete_dataset() method migrates from client.delete_dataset() to client.datasets.delete().

Parameter Reference

| Parameter | v7 | v8 | Changes |
|---|---|---|---|
| space_id | Required | ❌ Removed | Not needed in v8 |
| dataset_id | Optional | Required | Now required; no longer accepts dataset_name |
| dataset_name | Optional | ❌ Removed | Use dataset_id instead |
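The delete changes reduce to keeping only dataset_id. A minimal sketch of that translation follows; to_v8_delete_kwargs is a hypothetical helper that rejects name-based calls, since the table says v8 no longer accepts dataset_name.

```python
def to_v8_delete_kwargs(v7_kwargs: dict) -> dict:
    """Map v7 delete_dataset() kwargs to v8 datasets.delete() kwargs (hypothetical helper)."""
    if "dataset_id" not in v7_kwargs:
        # dataset_name was removed in v8; look up the dataset's ID first.
        raise ValueError("v8 datasets.delete() requires dataset_id")
    # space_id is no longer needed, so only the ID is carried over.
    return {"dataset_id": v7_kwargs["dataset_id"]}


v8_delete = to_v8_delete_kwargs({"space_id": "your-space-id", "dataset_id": "dataset-123"})
# v8_delete could then be passed as client.datasets.delete(**v8_delete)
```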

Side-by-Side Comparison

```python
from arize.experimental.datasets import ArizeDatasetsClient

# Client initialization
client = ArizeDatasetsClient(api_key="your-developer-key")

# Delete dataset by ID
success = client.delete_dataset(
    space_id="your-space-id",
    dataset_id="dataset-123"
)

# Or delete by name
success = client.delete_dataset(
    space_id="your-space-id",
    dataset_name="my-dataset"
)
```