Version 8 is currently in pre-release. Features and APIs may change before the stable release. Install with pip install --pre arize.

Introduction

The Arize Python SDK v8 is a comprehensive client library for building, evaluating, and monitoring AI applications. Whether you’re developing LLM-powered applications or traditional ML models, the SDK provides the tools you need for complete observability and continuous improvement. The Arize platform includes:
  • Arize AX — Enterprise AI engineering platform with embedded AI Copilot
  • Phoenix — Open-source tracing and evaluation framework
  • OpenInference — Instrumentation for LLM applications
With over 1 trillion inferences and spans, 10 million evaluation runs, and 2 million OSS downloads monthly, Arize powers AI observability at scale.

Key Features

  • Tracing - Trace your LLM application’s runtime using OpenTelemetry-based instrumentation
  • Evaluation - Leverage LLMs to benchmark your application’s performance
  • Datasets - Create versioned datasets for experimentation, evaluation, and fine-tuning
  • Experiments - Track and evaluate changes to prompts, models, and retrieval
  • Playground - Optimize prompts, compare models, and replay traced LLM calls
  • Prompt Management - Manage prompt changes with version control and experimentation

Installation

Install the base package:
pip install --pre arize
For OpenTelemetry auto-instrumentation:
pip install arize-otel

Optional Dependencies

The SDK uses lazy loading, allowing you to install only the features you need:
| Extra | Install Command | What It Provides |
| --- | --- | --- |
| spans | pip install arize[spans] | OpenTelemetry tracing, spans logging, evaluations |
| ml-stream | pip install arize[ml-stream] | Stream logging for ML model predictions |
| ml-batch | pip install arize[ml-batch] | Batch logging with pandas DataFrames and Parquet |
| datasets-experiments | pip install arize[datasets-experiments] | Dataset management and experiment tracking |
| auto-embeddings | pip install arize[auto-embeddings] | Automatic embedding generation for NLP, CV, and structured data |
Install multiple extras:
pip install --pre arize[spans,datasets-experiments,auto-embeddings]

Getting Started

The ArizeClient is the recommended entry point for all SDK operations.
from arize import ArizeClient

client = ArizeClient(api_key="your-api-key")
# Or use environment variable: ARIZE_API_KEY
# More configuration options explained below
The SDK uses a lazy-loading subclient architecture to minimize package size and startup time. The client provides access to specialized sub-clients, loaded on first access, for different operations:
  • client.datasets - Dataset management
  • client.experiments - Experiment tracking and evaluation
  • client.models - Traditional ML model logging
  • client.projects - Project management
  • client.spans - LLM tracing and spans operations
  • etc.
The client provides a unified, discoverable interface following the pattern:
client.<resource>.<action>()
This structured approach makes it easy to explore available operations through IDE autocomplete and discover everything the SDK can do.
from arize import ArizeClient

client = ArizeClient(api_key="your-api-key")

# Example: client.<resource>.<action>()
client.datasets.list(space_id="your-space-id")
client.experiments.run(dataset_id="...", task=my_task)
client.spans.log(space_id="...", project_name="...", spans=[...])

Configuration Options

Configure the client with constructor parameters or environment variables. Each configuration parameter follows this resolution order:
  1. Constructor parameter (highest priority)
  2. Environment variable
  3. Built-in default (lowest priority)
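For example, a constructor argument wins over a value set in the environment. A minimal sketch of the resolution order using the api_key parameter and the ARIZE_API_KEY environment variable mentioned below (assuming the variable is read when the client is constructed):
import os

from arize import ArizeClient

# Environment variable: second priority
os.environ["ARIZE_API_KEY"] = "key-from-environment"

# Constructor parameter: highest priority, overrides the environment variable
client = ArizeClient(api_key="key-from-constructor")

# Without a constructor argument, the environment variable is used instead
client_from_env = ArizeClient()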

Basic Configuration

Authentication

Authenticate using API keys obtained from the Arize Platform. The API key is required for all SDK operations and can be provided via constructor parameter or environment variable. If not provided, the SDK will raise a MissingAPIKeyError. Defaults:
  • api_key - required (no default)
from arize import ArizeClient

client = ArizeClient(api_key="your-api-key")

Region

Specify the Arize region to connect to (e.g., US_CENTRAL_1, EU_WEST_1). When a region is specified, it overrides individual host settings for all endpoints (API, OTLP, and Flight), providing a convenient way to configure all endpoints at once for that region. Defaults:
  • region - Region.UNSPECIFIED (no region-based override)
from arize import ArizeClient
from arize.regions import Region

client = ArizeClient(
    region=Region.US_CENTRAL_1,  # Overrides all host/port settings
)
Available Regions:
| Region | API Host | OTLP Host | Flight Host |
| --- | --- | --- | --- |
| US_CENTRAL_1 | api.us-central-1a.arize.com | otlp.us-central-1a.arize.com | flight.us-central-1a.arize.com |
| EU_WEST_1 | api.eu-west-1a.arize.com | otlp.eu-west-1a.arize.com | flight.eu-west-1a.arize.com |
| CA_CENTRAL_1 | api.ca-central-1a.arize.com | otlp.ca-central-1a.arize.com | flight.ca-central-1a.arize.com |
| US_EAST_1 | api.us-east-1b.arize.com | otlp.us-east-1b.arize.com | flight.us-east-1b.arize.com |

Logging

Control the SDK’s internal logging behavior. Configure the logging level to adjust verbosity, enable structured JSON logs for machine parsing, or disable logging entirely. SDK logs provide visibility into operations like API calls, caching, and error conditions. Defaults:
  • ARIZE_LOG_ENABLE - true
  • ARIZE_LOG_LEVEL - INFO
  • ARIZE_LOG_STRUCTURED - false
from arize.logging import configure_logging
import logging

configure_logging(
    level=logging.DEBUG,
    structured=True,  # Emit JSON logs
)
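The same settings can be supplied through the environment variables listed above; a short sketch, assuming they are read when the SDK initializes its logging:
import os

# Equivalent configuration via environment variables
os.environ["ARIZE_LOG_ENABLE"] = "true"
os.environ["ARIZE_LOG_LEVEL"] = "DEBUG"
os.environ["ARIZE_LOG_STRUCTURED"] = "true"  # emit JSON logs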

Caching

The SDK caches large datasets locally to speed up experiment iteration. When enabled, datasets are stored in Parquet format in the cache directory, reducing download time for repeated access. The arize_directory parameter specifies where the SDK stores cache files, logs, and other persistent data. Cache files are stored in {arize_directory}/cache/. Defaults:
  • enable_caching - True
  • arize_directory - ~/.arize
from arize import ArizeClient

client = ArizeClient(
    enable_caching=True,
    arize_directory="~/.arize",
)

# Clear cache manually if needed
client.clear_cache()

Advanced Configuration

Configure advanced SDK settings for custom deployments, performance tuning, and specific networking requirements.

Custom Endpoints

Override default endpoint locations for custom deployments, on-premise installations, or non-standard environments. The SDK uses three types of endpoints: API (REST operations), OTLP (OpenTelemetry tracing), and Flight (bulk data transfers via gRPC). Defaults:
  • api_host - api.arize.com
  • api_scheme - https
  • otlp_host - otlp.arize.com
  • otlp_scheme - https
  • flight_host - flight.arize.com
  • flight_port - 443
  • flight_scheme - grpc+tls
from arize import ArizeClient

client = ArizeClient(
    api_host="custom-api.example.com",
    api_scheme="https",
    otlp_host="custom-otlp.example.com",
    otlp_scheme="https",
    flight_host="custom-flight.example.com",
    flight_port=8815,
    flight_scheme="grpc+tls",
)

Single Endpoint Override

Use a single host and port for all SDK endpoints (API, OTLP, and Flight). This is a convenience option for environments where all services are behind a single load balancer or proxy. Note that region configuration takes precedence over this setting. Defaults:
  • single_host - "" (not set)
  • single_port - 0 (not set)
from arize import ArizeClient

client = ArizeClient(
    single_host="proxy.example.com",
    single_port=443,
)

SSL Verification

Control SSL certificate verification for HTTP requests. Disable verification only in trusted development environments with self-signed certificates or when behind corporate proxies with certificate inspection. Always keep verification enabled in production. Defaults:
  • request_verify - True
from arize import ArizeClient

client = ArizeClient(
    request_verify=True,  # SSL certificate verification
)

Payload Limits

Configure maximum payload sizes for HTTP requests and Arrow data processing. Increase these limits if working with very large datasets or reduce them to catch oversized requests earlier. Defaults:
  • max_http_payload_size_mb - 100
  • pyarrow_max_chunksize - 10000
from arize import ArizeClient

client = ArizeClient(
    max_http_payload_size_mb=100,  # Max HTTP payload size in MB
    pyarrow_max_chunksize=10000,   # Max Arrow chunk size
)

Streaming

Configure concurrent processing for streaming operations like ML model logging. Adjust worker threads and queue size to optimize throughput for your workload. Defaults:
  • stream_max_workers - 8
  • stream_max_queue_bound - 5000
from arize import ArizeClient

client = ArizeClient(
    stream_max_workers=8,         # Max worker threads
    stream_max_queue_bound=5000,  # Max queue size
)

Transport Options

The SDK intelligently selects the best transport method based on payload size:
  • HTTP/REST: Default for smaller payloads, compatible with all environments
  • gRPC + Arrow Flight: Automatically used for large datasets, experiments, and bulk operations
Benefits of Arrow Flight:
  • 10-100x faster for large datasets
  • Efficient binary serialization
  • Minimal memory overhead
Force HTTP transport when needed:
client.datasets.create(
    space_id="your-space-id",
    name="my-dataset",
    examples=large_examples_list,
    force_http=True,  # Bypass Arrow Flight
)

Response Objects

All SDK API responses are structured Pydantic models that provide type safety, validation, and IDE autocomplete support. Response objects offer convenient methods for data access, conversion, and exploration.

Response Types

The SDK returns two main types of responses.
List Responses - Return collections with pagination metadata:
# List operations return collections
resp = client.datasets.list(space_id="your-space-id")

# Access the collection
for dataset in resp.datasets:
    print(dataset.id, dataset.name)

# Check pagination
if resp.pagination.has_more:
    next_cursor = resp.pagination.next_cursor
Single Object Responses - Return individual resources:
# Get operations return single objects
dataset = client.datasets.get(dataset_id="dataset-id")

print(dataset.id)
print(dataset.name)
print(dataset.created_at)

Field Introspection

Explore available fields on any response object using model_fields:
resp = client.datasets.list(space_id="your-space-id")

# Inspect response structure
print(resp.model_fields)
# {
#   'datasets': FieldInfo(annotation=List[Dataset], required=True, description='A list of datasets'),
#   'pagination': FieldInfo(annotation=PaginationMetadata, required=True)
# }

# Inspect nested objects
print(resp.pagination.model_fields)
# {
#   'next_cursor': FieldInfo(annotation=Union[str, None], required=False, default=None),
#   'has_more': FieldInfo(annotation=bool, required=True)
# }

Data Conversion

Convert response objects to different formats for further processing.
Dictionary Format - Access as a Python dict:
resp = client.datasets.list(space_id="your-space-id")
data = resp.to_dict()

# Now a standard Python dictionary
print(data['datasets'][0]['name'])
JSON Format - Serialize for storage or APIs:
resp = client.datasets.list(space_id="your-space-id")
json_str = resp.to_json()

# Save to file or send via API
with open('datasets.json', 'w') as f:
    f.write(json_str)
DataFrame Format - Analyze with pandas:
resp = client.datasets.list(space_id="your-space-id")
df = resp.to_df()

# Now a pandas DataFrame for analysis
print(df.head())
print(df.describe())
df.to_csv('datasets.csv')

Pagination

List responses include pagination metadata for fetching additional pages:
# Get first page
resp = client.datasets.list(space_id="your-space-id", limit=50)

print(f"Retrieved {len(resp.datasets)} datasets")
print(f"Has more: {resp.pagination.has_more}")

# Fetch next page if available
if resp.pagination.has_more:
    next_resp = client.datasets.list(
        space_id="your-space-id",
        limit=50,
        cursor=resp.pagination.next_cursor,
    )
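To collect every page, loop on has_more and feed next_cursor back into the cursor parameter; a sketch that uses only the fields and parameters shown above:
# Fetch the first page, then keep requesting pages until has_more is False
resp = client.datasets.list(space_id="your-space-id", limit=50)
all_datasets = list(resp.datasets)

while resp.pagination.has_more:
    resp = client.datasets.list(
        space_id="your-space-id",
        limit=50,
        cursor=resp.pagination.next_cursor,
    )
    all_datasets.extend(resp.datasets)

print(f"Fetched {len(all_datasets)} datasets in total")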

Pre-Release API Warnings

Pre-release APIs (ALPHA and BETA) are actively evolving based on user feedback. While BETA endpoints are mostly stable with rare breaking changes, ALPHA endpoints are experimental and breaking changes are expected.
For detailed information about API version stages, stability guarantees, and recommendations, see API Version Stages in the REST API reference.
The SDK keeps you informed when you use pre-release APIs: the first time your application calls a pre-release endpoint, it emits a one-time warning via Python’s logging system, helping you decide which features to adopt:
from arize import ArizeClient

client = ArizeClient(api_key="your-api-key")

# First call to a beta endpoint
projects = client.projects.list()
# arize.pre_releases | WARNING | [BETA] projects.list is a beta API in Arize
#   SDK v8.0.0a23 and may change without notice.

# Subsequent calls to the same endpoint won't trigger the warning
projects = client.projects.list()  # No warning
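To suppress or elevate these warnings, you can adjust the logger named in the sample output through Python’s standard logging module; a sketch, assuming the logger name arize.pre_releases shown above:
import logging

# Hide BETA/ALPHA warnings by raising the logger's threshold,
# or set it to logging.WARNING to keep them visible
logging.getLogger("arize.pre_releases").setLevel(logging.ERROR)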