# Python SDK v8

> Comprehensive Python client for building, evaluating, and monitoring AI applications. Trace LLM apps, run experiments, create datasets, and monitor ML models. This SDK is currently in beta.

## Introduction

The Arize Python SDK v8 is a comprehensive client library for building, evaluating, and monitoring AI applications. Whether you're developing LLM-powered applications or traditional ML models, this SDK provides the tools you need for complete observability and continuous improvement.

**Arize Platform:**

* **[Arize AX](https://arize.com/)** — Enterprise AI engineering platform with embedded AI Copilot
* **[Phoenix](https://github.com/Arize-ai/phoenix)** — Open-source tracing and evaluation framework
* **[OpenInference](https://github.com/Arize-ai/openinference)** — Instrumentation for LLM applications

With over 1 trillion inferences and spans, 10 million evaluation runs, and 2 million OSS downloads monthly, Arize powers AI observability at scale.

## Key Features

* **[Tracing](https://arize.com/docs/ax/observe/tracing)** - Trace your LLM application's runtime using OpenTelemetry-based instrumentation
* **[Evaluation](https://arize.com/docs/ax/evaluate/online-evals)** - Leverage LLMs to benchmark your application's performance
* **[Datasets](https://arize.com/docs/ax/develop/datasets)** - Create versioned datasets for experimentation, evaluation, and fine-tuning
* **[Experiments](https://arize.com/docs/ax/develop/datasets-and-experiments)** - Track and evaluate changes to prompts, models, and retrieval
* **[Prompt Management](/api-clients/python/version-8/client-resources/prompts)** - Manage prompt changes with version control and experimentation
* **[Evaluators](/api-clients/python/version-8/client-resources/evaluators)** - Create and manage LLM-as-judge evaluators with versioned template configurations
* **Playground (Coming Soon)** - Optimize prompts, compare models, and replay traced LLM calls

## Installation

Install the base package:

```bash theme={null}
pip install arize
```

### Optional Dependencies

The following optional extras provide specialized functionality:

<Note>
  **Note:** The `otel` extra installs the `arize-otel` package, which is also available as a standalone package. If you only need auto-instrumentation without the full SDK, install `arize-otel` directly.
</Note>

| Extra          | Install Command                 | What It Provides                                                                                                    |
| -------------- | ------------------------------- | ------------------------------------------------------------------------------------------------------------------- |
| **otel**       | `pip install arize[otel]`       | OpenTelemetry auto-instrumentation package (arize-otel) for automatic tracing                                       |
| **embeddings** | `pip install arize[embeddings]` | Automatic embedding generation for NLP, CV, and structured data (Pillow, datasets, tokenizers, torch, transformers) |
| **mimic**      | `pip install arize[mimic]`      | MIMIC explainer for model interpretability                                                                          |

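Install multiple extras in one command. Quote the argument in shells such as zsh, where square brackets are treated as glob characters:

```bash theme={null}
pip install "arize[otel,embeddings,mimic]"
```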

## Getting Started

The `ArizeClient` is the **recommended entry point** for all SDK operations.

```python theme={null}
from arize import ArizeClient

client = ArizeClient(api_key="your-api-key")
# Or use environment variable: ARIZE_API_KEY
# More configuration options explained below
```

The SDK uses a **lazy-loading sub-client architecture** to minimize package size and startup time.
Each sub-client is loaded the first time it is accessed and covers one area of the platform:

* `client.ai_integrations` - LLM provider integration management
* `client.annotation_configs` - Annotation config management
* `client.annotation_queues` - Annotation queue and record management
* `client.api_keys` - API key management
* `client.datasets` - Dataset management
* `client.evaluators` - LLM-as-judge evaluator management
* `client.experiments` - Experiment tracking and evaluation
* `client.ml` - Traditional ML model logging
* `client.organizations` - Organization management
* `client.projects` - Project management
* `client.prompts` - Prompt template management and versioning
* `client.resource_restrictions` - Resource restriction (RBAC) management
* `client.role_bindings` - Role binding (RBAC) management
* `client.roles` - Custom RBAC role management
* `client.spaces` - Space management
* `client.spans` - LLM tracing and span operations
* `client.tasks` - Evaluation task and task run management
* `client.users` - User management
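To make "loaded on first access" concrete, here is a minimal sketch of the general lazy-loading pattern using `functools.cached_property`. `LazyClient` and `DatasetsClient` are hypothetical stand-ins, not the SDK's classes, and the SDK's actual mechanism may differ.

```python
from functools import cached_property


class DatasetsClient:
    """Hypothetical stand-in for a sub-client; not the SDK's class."""

    def __init__(self) -> None:
        print("DatasetsClient constructed")


class LazyClient:
    """Hypothetical parent client that builds sub-clients on first access."""

    @cached_property
    def datasets(self) -> DatasetsClient:
        # Runs only on the first attribute access; the result is then
        # stored on the instance and reused. A real implementation would
        # typically also defer the module import to this point.
        return DatasetsClient()


client = LazyClient()
first = client.datasets   # constructs the sub-client
second = client.datasets  # reuses the cached instance
```

With this pattern, creating the parent client stays cheap because nothing is built until a resource is actually used.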

The client provides a unified, discoverable interface following the pattern:

```
client.<resource>.<action>()
```

This structured approach makes it easy to explore available operations through IDE
autocomplete and discover everything the SDK can do.

```python theme={null}
from arize import ArizeClient

client = ArizeClient(api_key="your-api-key")

# Example: client.<resource>.<action>()
client.datasets.list(space="your-space-name-or-id")
client.experiments.run(name="...", dataset="...", task=my_task)
client.spans.log(space_id="...", project_name="...", dataframe=spans_df)
```

## Configuration Options

Configure the client with constructor parameters or environment variables. Each
configuration parameter follows this resolution order:

1. **Constructor parameter** (highest priority)
2. **Environment variable**
3. **Built-in default** (lowest priority)
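The resolution order above can be sketched as a small helper. This is an illustration of the precedence rule only; `resolve_setting` is a hypothetical name and not part of the SDK.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    default: str,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit value, else the environment variable, else the default."""
    if explicit is not None:
        return explicit          # 1. constructor parameter
    if env_var in environ:
        return environ[env_var]  # 2. environment variable
    return default               # 3. built-in default


# A plain dict stands in for the process environment in this demonstration.
fake_env = {"ARIZE_API_HOST": "env-api.example.com"}
print(resolve_setting("ctor-api.example.com", "ARIZE_API_HOST", "api.arize.com", fake_env))
print(resolve_setting(None, "ARIZE_API_HOST", "api.arize.com", fake_env))
print(resolve_setting(None, "ARIZE_API_HOST", "api.arize.com", {}))
```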

### Basic Configuration

#### Authentication

Authenticate using API keys obtained from the Arize Platform. The API key is required for all SDK operations and can be provided via constructor parameter or environment variable. If not provided, the SDK will raise a `MissingAPIKeyError`.

Defaults:

* `api_key` - **required** (no default)

<CodeGroup>
  ```python In Code theme={null}
  from arize import ArizeClient

  client = ArizeClient(api_key="your-api-key")
  ```

  ```bash Environment Variables theme={null}
  export ARIZE_API_KEY=your-api-key
  ```
</CodeGroup>

#### Region

Specify the Arize region you are trying to interact with (e.g., `US_CENTRAL_1A`, `EU_WEST_1A`).
When a region is specified, it overrides individual host settings for all endpoints
(API, OTLP, and Flight). This provides a convenient way to configure all endpoints
at once for a specific region.

This option is mutually exclusive with `single_host`/`single_port` and `base_domain`.

Defaults:

* `region` - `Region.UNSET` (no region-based override)

<CodeGroup>
  ```python In Code theme={null}
  from arize import ArizeClient
  from arize.regions import Region

  client = ArizeClient(
      region=Region.US_CENTRAL_1A,  # Overrides all host/port settings
  )
  ```

  ```bash Environment Variables theme={null}
  export ARIZE_REGION=US_CENTRAL_1A
  ```
</CodeGroup>

**Available Regions:**

| Region Identifier | Cloud Provider | API Host                      | OTLP Host                      | Flight Host                      |
| ----------------- | -------------- | ----------------------------- | ------------------------------ | -------------------------------- |
| `CA_CENTRAL_1A`   | GCP            | `api.ca-central-1a.arize.com` | `otlp.ca-central-1a.arize.com` | `flight.ca-central-1a.arize.com` |
| `EU_WEST_1A`      | GCP            | `api.eu-west-1a.arize.com`    | `otlp.eu-west-1a.arize.com`    | `flight.eu-west-1a.arize.com`    |
| `US_CENTRAL_1A`   | GCP            | `api.us-central-1a.arize.com` | `otlp.us-central-1a.arize.com` | `flight.us-central-1a.arize.com` |
| `US_EAST_1B`      | AWS            | `api.us-east-1b.arize.com`    | `otlp.us-east-1b.arize.com`    | `flight.us-east-1b.arize.com`    |
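The hosts in the table follow a regular naming pattern. The sketch below derives them from a region identifier purely as an illustration of that pattern; `hosts_for_region` is a hypothetical helper, and the SDK may store these values differently.

```python
from typing import Dict


def hosts_for_region(region: str) -> Dict[str, str]:
    """Derive API, OTLP, and Flight hosts from a region identifier.

    Mirrors the pattern in the table above: the identifier is lowercased
    and underscores become hyphens.
    """
    slug = region.lower().replace("_", "-")
    return {name: f"{name}.{slug}.arize.com" for name in ("api", "otlp", "flight")}


print(hosts_for_region("US_CENTRAL_1A"))
```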

#### Logging

Control the SDK's internal logging behavior. Configure the logging level to adjust verbosity, enable structured JSON logs for machine parsing, or disable logging entirely. SDK logs provide visibility into operations like API calls, caching, and error conditions.

Defaults:

* `ARIZE_LOG_ENABLE` - `true`
* `ARIZE_LOG_LEVEL` - `INFO`
* `ARIZE_LOG_STRUCTURED` - `false`

<CodeGroup>
  ```python In Code theme={null}
  from arize.logging import configure_logging
  import logging

  configure_logging(
      level=logging.DEBUG,
      structured=True,  # Emit JSON logs
  )
  ```

  ```bash Environment Variables theme={null}
  export ARIZE_LOG_ENABLE=true
  export ARIZE_LOG_LEVEL=debug
  export ARIZE_LOG_STRUCTURED=true
  ```
</CodeGroup>
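As an illustration of how the three logging environment variables might be interpreted, the sketch below parses them with the standard library. It assumes boolean values such as `true` and level names such as `debug` are matched case-insensitively, which the text above does not state; `parse_bool` and `parse_level` are hypothetical helpers.

```python
import logging

_TRUE_VALUES = {"1", "true", "yes", "on"}
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings (assumed, not confirmed by the SDK)."""
    return value.strip().lower() in _TRUE_VALUES


def parse_level(value: str) -> int:
    """Map a level name such as 'debug' or 'INFO' to a logging constant."""
    name = value.strip().upper()
    if name not in _LEVELS:
        raise ValueError(f"Unknown log level: {value!r}")
    return _LEVELS[name]


print(parse_bool("true"), parse_level("debug"))
```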

#### Caching

The SDK caches large datasets locally to speed up experiment iteration. When enabled, datasets are stored in Parquet format in the cache directory, reducing download time for repeated access. The `arize_directory` parameter specifies where the SDK stores cache files, logs, and other persistent data. Cache files are stored in `{arize_directory}/cache/`.

Defaults:

* `enable_caching` - `True`
* `arize_directory` - `~/.arize`

<CodeGroup>
  ```python In Code theme={null}
  from arize import ArizeClient

  client = ArizeClient(
      enable_caching=True,
      arize_directory="~/.arize",
  )

  # Clear cache manually if needed
  client.clear_cache()
  ```

  ```bash Environment Variables theme={null}
  export ARIZE_ENABLE_CACHING=true
  export ARIZE_DIRECTORY=~/.arize
  ```
</CodeGroup>
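To show where cache files end up for a given `arize_directory`, here is a minimal sketch of the `{arize_directory}/cache/` rule. `cache_dir` is a hypothetical helper, and expanding `~` to the home directory is an assumption about how the path is resolved.

```python
from pathlib import Path


def cache_dir(arize_directory: str = "~/.arize") -> Path:
    """Return the cache directory under the given SDK directory."""
    # expanduser() resolves a leading "~" to the current user's home directory.
    return Path(arize_directory).expanduser() / "cache"


print(cache_dir("/tmp/arize"))
```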

### Advanced Configuration

Configure advanced SDK settings for custom deployments, performance tuning, and specific networking requirements.

<Note>
  **Endpoint Override Mutual Exclusivity:** The SDK provides three mutually exclusive ways to override endpoint locations: `region`, `single_host`/`single_port`, and `base_domain`. Specifying more than one will raise a `MultipleEndpointOverridesError`. If none are specified, individual per-endpoint host/port settings are used.
</Note>
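The mutual-exclusivity rule in the note can be sketched as a validation step. This is not the SDK's code: the exception class below is a local stand-in that borrows the name from the note, and `check_endpoint_overrides` is a hypothetical helper.

```python
from typing import Optional


class MultipleEndpointOverridesError(ValueError):
    """Local stand-in for the SDK exception of the same name."""


def check_endpoint_overrides(
    region: Optional[str] = None,
    single_host: Optional[str] = None,
    single_port: Optional[int] = None,
    base_domain: Optional[str] = None,
) -> Optional[str]:
    """Return the name of the active override, or None if none is set."""
    active = []
    if region:
        active.append("region")
    if single_host or single_port:
        active.append("single_host/single_port")
    if base_domain:
        active.append("base_domain")
    if len(active) > 1:
        raise MultipleEndpointOverridesError(
            "Specify only one endpoint override, got: " + ", ".join(active)
        )
    return active[0] if active else None


print(check_endpoint_overrides(region="US_CENTRAL_1A"))
```

When the helper returns `None`, no override is active and the individual per-endpoint host and port settings apply.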

#### Custom Endpoints

Override default endpoint locations for custom deployments, on-premise installations, or non-standard environments. The SDK uses three types of endpoints: API (REST operations), OTLP (OpenTelemetry tracing), and Flight (bulk data transfers via gRPC).

Defaults:

* `api_host` - `api.arize.com`
* `api_scheme` - `https`
* `otlp_host` - `otlp.arize.com`
* `otlp_scheme` - `https`
* `flight_host` - `flight.arize.com`
* `flight_port` - `443`
* `flight_scheme` - `grpc+tls`

<CodeGroup>
  ```python In Code theme={null}
  from arize import ArizeClient

  client = ArizeClient(
      api_host="custom-api.example.com",
      api_scheme="https",
      otlp_host="custom-otlp.example.com",
      otlp_scheme="https",
      flight_host="custom-flight.example.com",
      flight_port=8815,
      flight_scheme="grpc+tls",
  )
  ```

  ```bash Environment Variables theme={null}
  export ARIZE_API_HOST=custom-api.example.com
  export ARIZE_API_SCHEME=https
  export ARIZE_OTLP_HOST=custom-otlp.example.com
  export ARIZE_OTLP_SCHEME=https
  export ARIZE_FLIGHT_HOST=custom-flight.example.com
  export ARIZE_FLIGHT_PORT=8815
  export ARIZE_FLIGHT_SCHEME=grpc+tls
  ```
</CodeGroup>

#### Single Endpoint Override

Use a single host and port for all SDK endpoints (API, OTLP, and Flight). This is a convenience option for environments where all services are behind a single load balancer or proxy.

This option is mutually exclusive with `region` and `base_domain`.

Defaults:

* `single_host` - `""` (not set)
* `single_port` - `0` (not set)

<CodeGroup>
  ```python In Code theme={null}
  from arize import ArizeClient

  client = ArizeClient(
      single_host="proxy.example.com",
      single_port=443,
  )
  ```

  ```bash Environment Variables theme={null}
  export ARIZE_SINGLE_HOST=proxy.example.com
  export ARIZE_SINGLE_PORT=443
  ```
</CodeGroup>

#### Private Connect Override

Use a base domain to automatically generate endpoint hosts for Private Connect setups. When specified, the SDK generates hosts as `api.<base_domain>`, `otlp.<base_domain>`, and `flight.<base_domain>`. This is the recommended approach for Private Connect deployments where all services share a common base domain.

This option is mutually exclusive with `region` and `single_host`/`single_port`.

Defaults:

* `base_domain` - `""` (not set)

<CodeGroup>
  ```python In Code theme={null}
  from arize import ArizeClient

  client = ArizeClient(
      base_domain="private.example.com",
      # Generates:
      # - api_host: api.private.example.com
      # - otlp_host: otlp.private.example.com
      # - flight_host: flight.private.example.com
  )
  ```

  ```bash Environment Variables theme={null}
  export ARIZE_BASE_DOMAIN=private.example.com
  ```
</CodeGroup>
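The host generation described in this section amounts to prefixing the base domain with each service name. A minimal sketch, with `hosts_for_base_domain` as a hypothetical helper rather than an SDK function:

```python
from typing import Dict


def hosts_for_base_domain(base_domain: str) -> Dict[str, str]:
    """Build api/otlp/flight hosts as '<service>.<base_domain>'."""
    return {
        f"{service}_host": f"{service}.{base_domain}"
        for service in ("api", "otlp", "flight")
    }


print(hosts_for_base_domain("private.example.com"))
```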

#### TLS Verification

Control TLS certificate verification for HTTP requests. Disable verification only in trusted development environments with self-signed certificates or when behind corporate proxies with certificate inspection. Always keep verification enabled in production.

Defaults:

* `request_verify` - `True`

<CodeGroup>
  ```python In Code theme={null}
  from arize import ArizeClient

  client = ArizeClient(
      request_verify=True,  # TLS certificate verification
  )
  ```

  ```bash Environment Variables theme={null}
  export ARIZE_REQUEST_VERIFY=true
  ```
</CodeGroup>

#### Payload Limits

Configure maximum payload sizes for HTTP requests and Arrow data processing. Increase these limits if working with very large datasets or reduce them to catch oversized requests earlier.

Defaults:

* `max_http_payload_size_mb` - `100`
* `pyarrow_max_chunksize` - `10000`

<CodeGroup>
  ```python In Code theme={null}
  from arize import ArizeClient

  client = ArizeClient(
      max_http_payload_size_mb=100,  # Max HTTP payload size in MB
      pyarrow_max_chunksize=10000,   # Max Arrow chunk size
  )
  ```

  ```bash Environment Variables theme={null}
  export ARIZE_MAX_HTTP_PAYLOAD_SIZE_MB=100
  export ARIZE_MAX_CHUNKSIZE=10000
  ```
</CodeGroup>
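To illustrate what the two limits guard against, the sketch below checks a serialized payload against a size cap and splits rows into bounded chunks. Both helpers are hypothetical, and two details are assumptions: that a megabyte means 1024 * 1024 bytes, and that the HTTP size is measured on the JSON-encoded body.

```python
import json
from typing import Any, Dict, Iterator, List


def check_payload_size(payload: Dict[str, Any], max_http_payload_size_mb: float = 100) -> int:
    """Return the encoded size in bytes, or raise if it exceeds the limit."""
    size = len(json.dumps(payload).encode("utf-8"))
    limit = int(max_http_payload_size_mb * 1024 * 1024)
    if size > limit:
        raise ValueError(f"Payload is {size} bytes; limit is {limit} bytes")
    return size


def chunk_rows(rows: List[Any], max_chunksize: int = 10000) -> Iterator[List[Any]]:
    """Yield consecutive slices of at most max_chunksize rows."""
    for start in range(0, len(rows), max_chunksize):
        yield rows[start:start + max_chunksize]


print([len(chunk) for chunk in chunk_rows(list(range(25)), max_chunksize=10)])
```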

#### Streaming

Configure concurrent processing for streaming operations like ML model logging. Adjust worker threads and queue size to optimize throughput for your workload.

Defaults:

* `stream_max_workers` - `8`
* `stream_max_queue_bound` - `5000`

<CodeGroup>
  ```python In Code theme={null}
  from arize import ArizeClient

  client = ArizeClient(
      stream_max_workers=8,         # Max worker threads
      stream_max_queue_bound=5000,  # Max queue size
  )
  ```

  ```bash Environment Variables theme={null}
  export ARIZE_STREAM_MAX_WORKERS=8
  export ARIZE_STREAM_MAX_QUEUE_BOUND=5000
  ```
</CodeGroup>
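The two settings correspond to a familiar pattern: a fixed pool of worker threads fed by a bounded queue. The sketch below shows that pattern with the standard library only. It is not the SDK's implementation; `run_stream` is a hypothetical function, and blocking the producer when the queue is full is an assumed policy.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Tuple


def run_stream(
    items: Iterable[Any],
    handle: Callable[[Any], Any],
    max_workers: int = 8,
    max_queue_bound: int = 5000,
) -> Tuple[List[Any], List[Exception]]:
    """Process items on worker threads fed through a bounded queue."""
    work: queue.Queue = queue.Queue(maxsize=max_queue_bound)
    results: List[Any] = []
    errors: List[Exception] = []
    lock = threading.Lock()
    stop = object()  # sentinel telling a worker to exit

    def worker() -> None:
        while True:
            item = work.get()
            if item is stop:
                return
            try:
                outcome = handle(item)
            except Exception as exc:  # keep the worker alive on failures
                with lock:
                    errors.append(exc)
            else:
                with lock:
                    results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(max_workers)]
    for thread in threads:
        thread.start()
    for item in items:
        work.put(item)  # blocks while the queue is full (backpressure)
    for _ in threads:
        work.put(stop)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results, errors
```

More workers raise throughput for I/O-bound sends, while a smaller queue bound caps memory use at the cost of stalling the producer sooner.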

## Transport Options

The SDK automatically selects a transport based on payload size:

* **HTTP/REST**: Default for smaller payloads, compatible with all environments
* **gRPC + Arrow Flight**: Automatically used for large datasets, experiments, and bulk operations

Benefits of Arrow Flight:

* 10-100x faster for large datasets
* Efficient binary serialization
* Minimal memory overhead

Force HTTP transport when needed:

```python theme={null}
client.datasets.create(
    space="your-space-name-or-id",
    name="my-dataset",
    examples=large_examples_list,
    force_http=True,  # Bypass Arrow Flight
)
```
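The selection logic can be pictured as a size threshold with an explicit override. The sketch below is illustrative only: `choose_transport` is a hypothetical function, and the 8 MB threshold is an invented placeholder, since the actual cutoff is not stated here.

```python
def choose_transport(
    payload_bytes: int,
    force_http: bool = False,
    threshold_bytes: int = 8 * 1024 * 1024,  # placeholder value, not the SDK's
) -> str:
    """Pick 'http' for small or forced payloads, otherwise 'flight'."""
    if force_http or payload_bytes <= threshold_bytes:
        return "http"
    return "flight"


print(choose_transport(1_000))
print(choose_transport(50 * 1024 * 1024))
print(choose_transport(50 * 1024 * 1024, force_http=True))
```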

## Response Objects

All SDK API responses are structured Pydantic models that provide type safety, validation, and IDE autocomplete support. Response objects offer convenient methods for data access, conversion, and exploration.

### Response Types

The SDK returns two main types of responses:

**List Responses** - Return collections with pagination metadata:

```python theme={null}
# List operations return collections
resp = client.datasets.list(space="your-space-name-or-id")

# Access the collection
for dataset in resp.datasets:
    print(dataset.id, dataset.name)

# Check pagination
if resp.pagination.has_more:
    next_cursor = resp.pagination.next_cursor
```

**Single Object Responses** - Return individual resources:

```python theme={null}
# Get operations return single objects
dataset = client.datasets.get(dataset="dataset-name-or-id")

print(dataset.id)
print(dataset.name)
print(dataset.created_at)
```

### Field Introspection

Explore available fields on any response object using `model_fields`:

```python theme={null}
resp = client.datasets.list(space="your-space-name-or-id")

# Inspect response structure. Read model_fields from the class:
# instance access is deprecated in recent Pydantic v2 releases.
print(type(resp).model_fields)
# {
#   'datasets': FieldInfo(annotation=List[Dataset], required=True, description='A list of datasets'),
#   'pagination': FieldInfo(annotation=PaginationMetadata, required=True)
# }

# Inspect nested objects
print(type(resp.pagination).model_fields)
# {
#   'next_cursor': FieldInfo(annotation=Union[str, None], required=False, default=None),
#   'has_more': FieldInfo(annotation=bool, required=True)
# }
```

### Data Conversion

Convert response objects to different formats for further processing:

**Dictionary Format** - Access as Python dict:

```python theme={null}
resp = client.datasets.list(space="your-space-name-or-id")
data = resp.to_dict()

# Now a standard Python dictionary
print(data['datasets'][0]['name'])
```

**JSON Format** - Serialize for storage or APIs:

```python theme={null}
resp = client.datasets.list(space="your-space-name-or-id")
json_str = resp.to_json()

# Save to file or send via API
with open('datasets.json', 'w') as f:
    f.write(json_str)
```

**DataFrame Format** - Analyze with pandas:

```python theme={null}
resp = client.datasets.list(space="your-space-name-or-id")
df = resp.to_df()

# Now a pandas DataFrame for analysis
print(df.head())
print(df.describe())
df.to_csv('datasets.csv')
```

### Pagination

List responses include pagination metadata for fetching additional pages:

```python theme={null}
# Get first page
resp = client.datasets.list(space="your-space-name-or-id", limit=50)

print(f"Retrieved {len(resp.datasets)} datasets")
print(f"Has more: {resp.pagination.has_more}")

# Fetch next page if available
if resp.pagination.has_more:
    next_resp = client.datasets.list(
        space="your-space-name-or-id",
        limit=50,
        cursor=resp.pagination.next_cursor,
    )
```
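To walk every page rather than just the next one, the cursor can be followed in a loop. The sketch below is a generic helper that relies only on the response shape shown above (`pagination.has_more` and `pagination.next_cursor`). `iter_pages` and `fake_list` are hypothetical; the stub stands in for a real list call so the example runs without credentials.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterator, Optional


def iter_pages(list_fn: Callable[..., Any], items_attr: str, **kwargs: Any) -> Iterator[Any]:
    """Yield every item across pages by following pagination.next_cursor."""
    cursor: Optional[str] = None
    while True:
        call_kwargs = dict(kwargs)
        if cursor is not None:
            call_kwargs["cursor"] = cursor
        resp = list_fn(**call_kwargs)
        yield from getattr(resp, items_attr)
        if not resp.pagination.has_more:
            break
        cursor = resp.pagination.next_cursor


_NAMES = ["a", "b", "c", "d", "e"]


def fake_list(limit: int = 2, cursor: Optional[str] = None) -> SimpleNamespace:
    """Stub that pages through _NAMES; the cursor is a stringified offset."""
    start = int(cursor) if cursor else 0
    end = start + limit
    has_more = end < len(_NAMES)
    return SimpleNamespace(
        datasets=_NAMES[start:end],
        pagination=SimpleNamespace(
            has_more=has_more,
            next_cursor=str(end) if has_more else None,
        ),
    )


all_names = list(iter_pages(fake_list, "datasets", limit=2))
print(all_names)
```

Against the real client, the same helper would presumably be called as `iter_pages(client.datasets.list, "datasets", space="your-space-name-or-id", limit=50)`.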

## Pre-Release API Warnings

Pre-release APIs (ALPHA and BETA) are actively evolving based on user feedback.
While BETA endpoints are mostly stable with rare breaking changes, ALPHA endpoints
are experimental and breaking changes are expected.

<Note>
  For detailed information about API version stages, stability guarantees, and recommendations,
  see [API Version Stages](/ax/rest-reference#api-version-stages) in the REST API reference.
</Note>

The SDK flags pre-release usage with a one-time warning so you can decide which
features to adopt. The first time your application calls a pre-release endpoint,
the SDK emits a single warning through Python's logging system:

```python theme={null}
from arize import ArizeClient

client = ArizeClient(api_key="your-api-key")

# First call to a beta endpoint
projects = client.projects.list()
# arize.pre_releases | WARNING | [BETA] projects.list is a beta API in Arize...
#    ...SDK v8.0.0a23 and may change without notice.

# Subsequent calls to the same endpoint won't trigger the warning
projects = client.projects.list()  # No warning
```

## Useful Links

<CardGroup cols={2}>
  <Card title="Python SDK API Reference" href="https://arize-client-python.readthedocs.io/en/latest/" icon="book">
    Complete API reference documentation
  </Card>

  <Card title="Tracing Guide" href="https://arize.com/docs/ax/observe/tracing" icon="chart-line">
    Learn about LLM tracing
  </Card>

  <Card title="Datasets & Experiments" href="https://arize.com/docs/ax/develop/datasets-and-experiments" icon="flask">
    Experimentation workflows
  </Card>

  <Card title="GitHub Repository" href="https://github.com/Arize-ai/client_python" icon="github">
    Source code and issues
  </Card>
</CardGroup>

<CardGroup cols={1}>
  <Card title="Changelog" href="https://github.com/Arize-ai/client_python/blob/main/CHANGELOG.md" icon="list">
    View version history and release notes
  </Card>
</CardGroup>
