
# Migrate Datasets Client

> Migrate dataset management from ArizeDatasetsClient to ArizeClient.datasets in the v8 SDK.

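This page covers migrating dataset management methods from v7's `ArizeDatasetsClient` to v8's `ArizeClient.datasets`. Client initialization changes as well: v7's `api_key` parameter took a developer key (now deprecated), while v8's takes a standard API key.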

<CodeGroup>
  ```python Version 7 theme={null}
  from arize.experimental.datasets import ArizeDatasetsClient

  # v7 api_key parameter took developer key values
  client = ArizeDatasetsClient(
      api_key="your-developer-key"  # Developer key (deprecated)
  )
  ```

  ```python Version 8 theme={null}
  from arize import ArizeClient

  # v8 api_key parameter takes standard API keys
  client = ArizeClient(
      api_key="your-api-key"  # Standard API key
  )
  ```
</CodeGroup>

## list\_datasets()

The `list_datasets()` method migrates from `client.list_datasets()` to `client.datasets.list()`.

### Parameter Reference

| Parameter  | v7       | v8         | Changes                                                         |
| ---------- | -------- | ---------- | --------------------------------------------------------------- |
| `space_id` | Required | Optional   | Now optional; if not provided, lists datasets across all spaces |
| `limit`    | N/A      | ✅ Optional | Maximum number of datasets to return (default 100)              |
| `cursor`   | N/A      | ✅ Optional | Pagination cursor for retrieving next page                      |

### Side-by-Side Comparison

<CodeGroup>
  ```python Version 7 theme={null}
  from arize.experimental.datasets import ArizeDatasetsClient

  # Client initialization
  client = ArizeDatasetsClient(api_key="your-developer-key")

  # List datasets
  datasets_df = client.list_datasets(space_id="your-space-id")
  ```

  ```python Version 8 theme={null}
  from arize import ArizeClient

  # Client initialization
  client = ArizeClient(api_key="your-api-key")

  # List datasets
  response = client.datasets.list(
      space_id="your-space-id",  # Optional in v8
      limit=100,
      cursor=None
  )
  datasets = response.to_df()
  ```
</CodeGroup>
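
The `limit` and `cursor` parameters imply a loop that keeps requesting pages until no cursor comes back. The sketch below shows one way to write that loop. `iter_pages` is a hypothetical helper, not part of the SDK, and reading the next cursor from `response.pagination.cursor` (shown in the usage comment) is an assumption to verify against the v8 reference.

```python
from typing import Any, Callable, Iterator, Optional


def iter_pages(
    fetch_page: Callable[[Optional[str]], Any],
    get_next_cursor: Callable[[Any], Optional[str]],
) -> Iterator[Any]:
    """Yield responses page by page until no next cursor is returned.

    Hypothetical helper, not part of the Arize SDK.
    """
    cursor: Optional[str] = None
    while True:
        response = fetch_page(cursor)
        yield response
        cursor = get_next_cursor(response)
        if not cursor:
            break


# Usage with a v8 client (assumes the response exposes the next cursor
# as `response.pagination.cursor`):
#
# pages = iter_pages(
#     lambda cursor: client.datasets.list(
#         space_id="your-space-id", limit=100, cursor=cursor
#     ),
#     lambda response: response.pagination.cursor,
# )
# frames = [page.to_df() for page in pages]
```

Passing the fetch and cursor-extraction steps as callables keeps the loop independent of the exact response shape, so only the two lambdas need adjusting if the attribute names differ.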

## create\_dataset()

The `create_dataset()` method migrates from `client.create_dataset()` to `client.datasets.create()`.

### Parameter Reference

| Parameter              | v7       | v8         | Changes                                                                                 |
| ---------------------- | -------- | ---------- | --------------------------------------------------------------------------------------- |
| `space_id`             | Required | Required   | --                                                                                      |
| `dataset_name`         | Required | Required   | Renamed to `name`                                                                       |
| `name`                 | N/A      | ✅ Required | Renamed from `dataset_name`                                                             |
| `dataset_type`         | Required | ❌ Removed  | No longer required                                                                      |
| `data`                 | Required | Required   | Renamed to `examples`                                                                   |
| `examples`             | N/A      | ✅ Required | Renamed from `data`; accepts DataFrame or list of dicts                                 |
| `convert_dict_to_json` | Optional | ❌ Removed  | Automatic conversion in v8                                                              |
| `max_chunk_size`       | Optional | ❌ Removed  | Now configured at [client level](/api-clients/python/version-8/overview#payload-limits) |
| `force_http`           | N/A      | ✅ Optional | Force HTTP upload instead of gRPC (default False)                                       |

### Side-by-Side Comparison

<CodeGroup>
  ```python Version 7 theme={null}
  from arize.experimental.datasets import ArizeDatasetsClient
  from arize.pandas.proto import flight_pb2
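  import pandas as pd

  # Illustrative examples; the column names are placeholders
  dataset_df = pd.DataFrame(
      {"question": ["What is 2 + 2?"], "answer": ["4"]}
  )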

  # Client initialization
  client = ArizeDatasetsClient(api_key="your-developer-key")

  # Create dataset
  dataset_id = client.create_dataset(
      space_id="your-space-id",
      dataset_name="my-dataset",
      dataset_type=flight_pb2.DatasetType.GENERATIVE,
      data=dataset_df,
      convert_dict_to_json=True,
      max_chunk_size=1000
  )
  ```

  ```python Version 8 theme={null}
  from arize import ArizeClient
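  import pandas as pd

  # Illustrative examples; the column names are placeholders
  dataset_df = pd.DataFrame(
      {"question": ["What is 2 + 2?"], "answer": ["4"]}
  )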

  # Client initialization; chunk size is now set on the client
  client = ArizeClient(
      api_key="your-api-key",
      pyarrow_max_chunksize=1000  # Replaces v7's per-call max_chunk_size
  )

  # Create dataset
  dataset = client.datasets.create(
      name="my-dataset",  # Renamed from dataset_name
      space_id="your-space-id",
      examples=dataset_df,  # Renamed from data
      force_http=False
      # dataset_type removed
      # convert_dict_to_json removed (automatic)
      # max_chunk_size now configured at client level
  )
  dataset_id = dataset.id
  ```
</CodeGroup>
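
When many call sites need updating, the renames and removals in the parameter table can be applied mechanically. The sketch below is one way to do that. `to_v8_create_kwargs` is a hypothetical helper, not part of the SDK, and the sample rows are placeholders.

```python
from typing import Any, Dict

# v7 keyword -> v8 keyword, per the parameter table above
RENAMED = {"dataset_name": "name", "data": "examples"}
# v7 keywords with no equivalent on v8's datasets.create()
REMOVED = {"dataset_type", "convert_dict_to_json", "max_chunk_size"}


def to_v8_create_kwargs(v7_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate v7 create_dataset() kwargs to v8 datasets.create() kwargs.

    Hypothetical helper, not part of the Arize SDK.
    """
    v8_kwargs: Dict[str, Any] = {}
    for key, value in v7_kwargs.items():
        if key in REMOVED:
            continue  # dropped in v8
        v8_kwargs[RENAMED.get(key, key)] = value
    return v8_kwargs


v7_call = {
    "space_id": "your-space-id",
    "dataset_name": "my-dataset",
    "dataset_type": "GENERATIVE",  # stand-in for flight_pb2.DatasetType.GENERATIVE
    "data": [{"question": "What is 2 + 2?", "answer": "4"}],  # placeholder rows
    "max_chunk_size": 1000,
}
v8_call = to_v8_create_kwargs(v7_call)
# dataset = client.datasets.create(**v8_call)
```

Note that a `max_chunk_size` value dropped here is not carried over automatically; it still has to be set on the client, as in the Version 8 example.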

## get\_dataset()

The `get_dataset()` method behaves differently in v8. In v7, `client.get_dataset()` returned the dataset examples (the underlying data). In v8, `client.datasets.get()` returns only the dataset metadata and versions; use `client.datasets.list_examples()` to retrieve the examples themselves.

### Parameter Reference

**For dataset metadata (v8's `datasets.get()`):**

| Parameter                  | v7       | v8        | Changes                                        |
| -------------------------- | -------- | --------- | ---------------------------------------------- |
| `space_id`                 | Required | ❌ Removed | Not needed in v8                               |
| `dataset_id`               | Optional | Required  | Now required; no longer accepts `dataset_name` |
| `dataset_name`             | Optional | ❌ Removed | Use `dataset_id` instead                       |
| `dataset_version`          | Optional | ❌ Removed | All versions are returned in metadata          |
| `convert_json_str_to_dict` | Optional | N/A       | Only applies to examples, not metadata         |

**For dataset examples (v8's `datasets.list_examples()`):**

| Parameter                  | v7       | v8         | Changes                                                                                                                 |
| -------------------------- | -------- | ---------- | ----------------------------------------------------------------------------------------------------------------------- |
| `dataset_id`               | Optional | Required   | Now required                                                                                                            |
| `dataset_version`          | Optional | Optional   | Renamed to `dataset_version_id`                                                                                         |
| `dataset_version_id`       | N/A      | ✅ Optional | If empty, returns latest version                                                                                        |
| `limit`                    | N/A      | ✅ Optional | Maximum number of examples per page (default 100); ignored if `all=True`                                                |
| `all`                      | N/A      | ✅ Optional | When `True`, retrieves all examples via Flight (bypasses pagination). When `False` (default), uses REST with pagination |
| `convert_json_str_to_dict` | Optional | ❌ Removed  | Automatic conversion in v8                                                                                              |

### Side-by-Side Comparison

<CodeGroup>
  ```python Version 7 theme={null}
  from arize.experimental.datasets import ArizeDatasetsClient

  # Client initialization
  client = ArizeDatasetsClient(api_key="your-developer-key")

  # Get dataset examples (underlying data) by ID
  dataset_df = client.get_dataset(
      space_id="your-space-id",
      dataset_id="dataset-123",
      dataset_version="v1",
      convert_json_str_to_dict=True
  )
  # Returns: pandas DataFrame with the dataset examples

  # Or get by name
  dataset_df = client.get_dataset(
      space_id="your-space-id",
      dataset_name="my-dataset"
  )
  ```

  ```python Version 8 theme={null}
  from arize import ArizeClient

  # Client initialization
  client = ArizeClient(api_key="your-api-key")

  # Step 1: Get dataset metadata (includes versions)
  dataset = client.datasets.get(dataset_id="dataset-123")
  # Returns: Dataset object with metadata and versions
  # Does NOT include examples (underlying data)

  # Step 2: Get dataset examples (underlying data)
  # Option A: Get all examples at once (uses Flight, more efficient for large datasets)
  examples_response = client.datasets.list_examples(
      dataset_id="dataset-123",
      dataset_version_id="version-456",  # Optional, defaults to latest
      all=True  # Retrieves all examples via Flight (bypasses pagination)
  )
  examples = examples_response.to_df()

  # Option B: Use pagination (uses REST, better for browsing/previewing)
  examples_response = client.datasets.list_examples(
      dataset_id="dataset-123",
      limit=100,  # Get 100 examples per page
      all=False  # Use REST with pagination (default)
  )
  examples = examples_response.to_df()
  # Use examples_response.pagination.cursor for next page if needed
  ```
</CodeGroup>
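
Code that relied on v7's `get_dataset()` returning a DataFrame can wrap the v8 call in a small function to keep existing call sites short. The sketch below uses only the `list_examples()` parameters shown above. `get_dataset_examples` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, Optional


def get_dataset_examples(
    client: Any,
    dataset_id: str,
    dataset_version_id: Optional[str] = None,
) -> Any:
    """Return all examples as a DataFrame, similar to v7's get_dataset().

    Hypothetical helper, not part of the Arize SDK.
    """
    kwargs: Dict[str, Any] = {"dataset_id": dataset_id, "all": True}
    if dataset_version_id is not None:
        # Leaving the argument out falls back to the latest version
        kwargs["dataset_version_id"] = dataset_version_id
    return client.datasets.list_examples(**kwargs).to_df()


# dataset_df = get_dataset_examples(client, "dataset-123")
```

The helper always passes `all=True`, so it suits full exports; for previews, the paginated form in Option B is the better fit.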

## delete\_dataset()

The `delete_dataset()` method migrates from `client.delete_dataset()` to `client.datasets.delete()`.

### Parameter Reference

| Parameter      | v7       | v8        | Changes                                        |
| -------------- | -------- | --------- | ---------------------------------------------- |
| `space_id`     | Required | ❌ Removed | Not needed in v8                               |
| `dataset_id`   | Optional | Required  | Now required; no longer accepts `dataset_name` |
| `dataset_name` | Optional | ❌ Removed | Use `dataset_id` instead                       |

### Side-by-Side Comparison

<CodeGroup>
  ```python Version 7 theme={null}
  from arize.experimental.datasets import ArizeDatasetsClient

  # Client initialization
  client = ArizeDatasetsClient(api_key="your-developer-key")

  # Delete dataset by ID
  success = client.delete_dataset(
      space_id="your-space-id",
      dataset_id="dataset-123"
  )

  # Or delete by name
  success = client.delete_dataset(
      space_id="your-space-id",
      dataset_name="my-dataset"
  )
  ```

  ```python Version 8 theme={null}
  from arize import ArizeClient

  # Client initialization
  client = ArizeClient(api_key="your-api-key")

  # Delete dataset by ID
  client.datasets.delete(dataset_id="dataset-123")
  # Returns None on success
  ```
</CodeGroup>
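
Because v8 no longer accepts `dataset_name`, code that deleted by name has to look up the ID first. The sketch below shows one possible lookup over listed datasets. `find_dataset_id` is a hypothetical helper, not part of the SDK, and the `id` and `name` keys are assumptions about the columns that `datasets.list(...).to_df()` returns.

```python
from typing import Any, Dict, Iterable


def find_dataset_id(records: Iterable[Dict[str, Any]], name: str) -> str:
    """Return the ID of the single dataset called `name`.

    Hypothetical helper; assumes each record has "id" and "name" keys.
    """
    matches = [record["id"] for record in records if record.get("name") == name]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one dataset named {name!r}, found {len(matches)}"
        )
    return matches[0]


# Usage with a v8 client (only covers the first page of results):
#
# response = client.datasets.list(space_id="your-space-id")
# records = response.to_df().to_dict("records")
# client.datasets.delete(dataset_id=find_dataset_id(records, "my-dataset"))
```

Raising on zero or multiple matches is deliberate: a delete should not proceed when the name does not identify exactly one dataset.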
