
SDK and API Reference

Rest API

Python

Python packages

TypeScript

TypeScript Packages

Annotation Configs

Spans

REST API methods for interacting with Phoenix spans

Datasets

REST API methods for interacting with Phoenix datasets

Prompts

REST API methods for interacting with Phoenix prompts

Traces

REST API methods for interacting with Phoenix traces

Annotations

Projects

REST API methods for interacting with Phoenix projects

Experiments

REST API methods for interacting with Phoenix experiments

Overview

Features

  • API - Interact with Phoenix's OpenAPI REST interface

  • Session - Methods for loading, inspecting, and interacting with tracing and evaluation sessions

  • Client - Upload and download data to and from Phoenix

  • Evals - Define and run evaluations across models, traces, or datasets

  • Experiments - Track, evaluate, and compare multiple runs under different conditions

  • Otel - Integrate tracing and metrics using OpenTelemetry for automatic or manual instrumentation

  • Inferences / Schema - Transform spans and evaluations into datasets, define structures for features and predictions, and manage embedding columns

  • Prompt Management - Pull, push, and invoke prompts stored in Phoenix

  • Project Management - Create, read, update, and delete projects in Phoenix

Installation

Install via pip.

pip install -Uq arize-phoenix-client

Usage

from phoenix.client import Client

client = Client(base_url="your-server-url")  # base_url defaults to http://localhost:6006

Authentication (if applicable)

Phoenix API key can be an environment variable...

import os

os.environ["PHOENIX_API_KEY"] = "your-api-key"

...or passed directly to the client.

from phoenix.client import Client

client = Client(api_key="your-api-key")

Custom Headers

By default, the Phoenix client will use the bearer authentication scheme in the HTTP headers, but if you need different headers, e.g. for Phoenix Cloud, they can also be customized via an environment variable...

import os

os.environ["PHOENIX_CLIENT_HEADERS"] = "api-key=your-api-key,"  # use `api-key` for Phoenix Cloud

...or passed directly to the client.

from phoenix.client import Client

client = Client(headers={"api-key": "your-api-key"})  # use `api-key` for Phoenix Cloud

Prompt Management

With the Phoenix client, you can push and pull prompts to and from your Phoenix server.

from phoenix.client import Client
from phoenix.client.types import PromptVersion

# Change base_url to your Phoenix server URL
base_url = "http://localhost:6006"
client = Client(base_url=base_url)

# prompt identifier consists of alphanumeric characters, hyphens or underscores
prompt_identifier = "haiku-writer"

content = "Write a haiku about {{topic}}"
prompt = client.prompts.create(
    name=prompt_identifier,
    version=PromptVersion(
        [{"role": "user", "content": content}],
        model_name="gpt-4o-mini",
    ),
)

The client can retrieve a prompt by its name.

prompt = client.prompts.get(prompt_identifier=prompt_identifier)

The prompt can be used to generate completions.

from openai import OpenAI

variables = {"topic": "programming"}
resp = OpenAI().chat.completions.create(**prompt.format(variables=variables))
print(resp.choices[0].message.content)

To learn more about prompt engineering using Phoenix, see the Phoenix documentation.

Project Management

The Phoenix client provides synchronous and asynchronous interfaces for interacting with Phoenix Projects.

Key Features

  • Get a project by ID or name

  • List all projects

  • Create a new project with optional description

  • Update a project's description (note: names cannot be changed)

  • Delete a project by ID or name

Usage Examples

from phoenix.client import Client

client = Client(base_url="your-server-url")

# List all projects
projects = client.projects.list()

# Get a project by ID or name
project = client.projects.get(project_id="UHJvamVjdDoy")
project = client.projects.get(project_name="My Project")

# Create a project
new_project = client.projects.create(
    name="New Project",
    description="This is a new project",
)

# Update a project
updated_project = client.projects.update(
    project_id="UHJvamVjdDoy",
    description="Updated description",
)

# Delete a project
client.projects.delete(project_name="My Project")

arize-phoenix-client

Phoenix Client is a lightweight package for interacting with the Phoenix server.

The Python client is currently a work in progress.

Features

  • API - Interact with Phoenix's OpenAPI REST interface

  • Prompt Management - Pull, push, and invoke prompts stored in Phoenix

Installation

Install via pip.

pip install -Uq arize-phoenix-client

Usage

from phoenix.client import Client

client = Client(base_url="your-server-url")  # base_url defaults to http://localhost:6006

Authentication (if applicable)

Phoenix API key can be an environment variable...

import os

os.environ["PHOENIX_API_KEY"] = "your-api-key"

...or passed directly to the client.

from phoenix.client import Client

client = Client(api_key="your-api-key")

Custom Headers

By default, the Phoenix client will use the bearer authentication scheme in the HTTP headers, but if you need different headers, e.g. for Phoenix Cloud, they can also be customized via an environment variable...

import os

os.environ["PHOENIX_CLIENT_HEADERS"] = "api-key=your-api-key,"  # use `api-key` for Phoenix Cloud

...or passed directly to the client.

from phoenix.client import Client

client = Client(headers={"api-key": "your-api-key"})  # use `api-key` for Phoenix Cloud
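The PHOENIX_CLIENT_HEADERS value is a comma-separated list of key=value pairs. As an illustration only (this is a sketch of the assumed format, not the client's actual parser):

```python
def parse_client_headers(value: str) -> dict:
    """Parse a comma-separated "key=value" string into a headers dict.

    Assumed format, matching the example above: trailing commas and
    surrounding whitespace are ignored.
    """
    headers = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments, e.g. from a trailing comma
        key, _, val = pair.partition("=")
        headers[key.strip()] = val.strip()
    return headers

print(parse_client_headers("api-key=your-api-key,"))
# → {'api-key': 'your-api-key'}
```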

Prompt Management

With the Phoenix client, you can push and pull prompts to and from your Phoenix server.

from phoenix.client import Client
from phoenix.client.types import PromptVersion

# Change base_url to your Phoenix server URL
base_url = "http://localhost:6006"
client = Client(base_url=base_url)

# prompt identifier consists of alphanumeric characters, hyphens or underscores
prompt_identifier = "haiku-writer"

content = "Write a haiku about {{topic}}"
prompt = client.prompts.create(
    name=prompt_identifier,
    version=PromptVersion(
        [{"role": "user", "content": content}],
        model_name="gpt-4o-mini",
    ),
)

The client can retrieve a prompt by its name.

prompt = client.prompts.get(prompt_identifier=prompt_identifier)

The prompt can be used to generate completions.

from openai import OpenAI

variables = {"topic": "programming"}
resp = OpenAI().chat.completions.create(**prompt.format(variables=variables))
print(resp.choices[0].message.content)

To learn more about prompt engineering using Phoenix, see the Phoenix documentation.

@arizeai/phoenix-mcp

MCP server implementation for Arize Phoenix providing unified interface to Phoenix's capabilities.

Phoenix MCP Server is an implementation of the Model Context Protocol for the Arize Phoenix platform. It provides a unified interface to Phoenix's capabilities.

You can use Phoenix MCP Server for:

  • Prompts Management: Create, list, update, and iterate on prompts

  • Datasets: Explore datasets, and synthesize new examples

  • Experiments: Pull experiment results and visualize them with the help of an LLM

Don't see a use-case covered? @arizeai/phoenix-mcp is open-source! Issues and PRs welcome.

Installation

npm i @arizeai/phoenix-mcp

This MCP server can be used using npx and can be directly integrated with clients like Claude Desktop, Cursor, and more.

{
  "mcpServers": {
    "phoenix": {
      "command": "npx",
      "args": [
        "-y",
        "@arizeai/phoenix-mcp@latest",
        "--baseUrl",
        "https://my-phoenix.com",
        "--apiKey",
        "your-api-key"
      ]
    }
  }
}

Development

Install

This package is managed via a pnpm workspace.

// From the /js/ directory
pnpm install
pnpm build

This only needs to be repeated if dependencies change or there is a change to the phoenix-client.

Building

To build the project:

pnpm build

Development Mode

To run in development mode:

pnpm dev

Debugging

You can build and run the MCP inspector using the following:

pnpm inspect

Environment Variables

When developing, the server requires the following environment variables:

  • PHOENIX_API_KEY: Your Phoenix API key

  • PHOENIX_BASE_URL: The base URL for Phoenix

Make sure to set these in a .env file. See .env.example.

License

Apache 2.0

Overview

This package provides a TypeScript client for the Arize Phoenix API.

Installation

# or yarn, pnpm, bun, etc...
npm install @arizeai/phoenix-client

Configuration

The client will automatically read environment variables from your environment, if available.

The following environment variables are used:

  • PHOENIX_HOST - The base URL of the Phoenix API.

  • PHOENIX_API_KEY - The API key to use for authentication.

  • PHOENIX_CLIENT_HEADERS - Custom headers to add to all requests. A JSON stringified object.

PHOENIX_HOST='http://localhost:12345' PHOENIX_API_KEY='xxxxxx' pnpx tsx examples/list_datasets.ts
# emits the following request:
# GET http://localhost:12345/v1/datasets
# headers: {
#   "Authorization": "Bearer xxxxxx",
# }

Alternatively, you can pass configuration options to the client directly, and they will be prioritized over environment variables and default values.

const phoenix = createClient({
  options: {
    baseUrl: "http://localhost:6006",
    headers: {
      Authorization: "Bearer xxxxxx",
    },
  },
});

Prompts

@arizeai/phoenix-client provides a prompts export that exposes utilities for working with prompts for LLMs.

Creating a prompt and pushing it to Phoenix

The createPrompt function can be used to create a prompt in Phoenix for version control and reuse.

import { createPrompt, promptVersion } from "@arizeai/phoenix-client/prompts";

const version = createPrompt({
  name: "my-prompt",
  description: "test-description",
  version: promptVersion({
    description: "version description here",
    modelProvider: "OPENAI",
    modelName: "gpt-3.5-turbo",
    template: [
      {
        role: "user",
        content: "{{ question }}",
      },
    ],
    invocationParameters: {
      temperature: 0.8,
    },
  }),
});

Prompts that are pushed to Phoenix are versioned and can be tagged.

Pulling a Prompt from Phoenix

The getPrompt function can be used to pull a prompt from Phoenix based on some Prompt Identifier and returns it in the Phoenix SDK Prompt type.

import { getPrompt } from "@arizeai/phoenix-client/prompts";

const prompt = await getPrompt({ name: "my-prompt" });
// ^ you now have a strongly-typed prompt object, in the Phoenix SDK Prompt type

const promptByTag = await getPrompt({ tag: "production", name: "my-prompt" });
// ^ you can optionally specify a tag to filter by

const promptByVersionId = await getPrompt({
  versionId: "1234567890",
});
// ^ you can optionally specify a prompt version Id to filter by

Using a Phoenix Prompt with an LLM Provider SDK

The toSDK helper function can be used to convert a Phoenix Prompt to the format expected by an LLM provider SDK. You can then use the LLM provider SDK as normal, with your prompt.

If your Prompt is saved in Phoenix as openai, you can use the toSDK function to convert the prompt to the format expected by OpenAI, or even Anthropic and the Vercel AI SDK. We will do a best-effort conversion to your LLM provider SDK of choice.

The following LLM provider SDKs are supported:

  • Vercel AI SDK: ai

  • OpenAI: openai

  • Anthropic: @anthropic-ai/sdk

import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { getPrompt, toSDK } from "@arizeai/phoenix-client/prompts";

const prompt = await getPrompt({ name: "my-prompt" });
const promptAsAI = toSDK({
  sdk: "ai",
  // ^ the SDK you want to convert the prompt to, supported SDKs are listed above
  variables: {
    "my-variable": "my-value",
  },
  // ^ you can format the prompt with variables, if the prompt has any variables in its template
  //   the format (Mustache, F-string, etc.) is specified in the Prompt itself
  prompt,
});
// ^ promptAsAI is now in the format expected by the Vercel AI SDK generateText function

const response = await generateText({
  model: openai(prompt.model_name),
  // ^ the model adapter provided by the Vercel AI SDK can be swapped out for any other model
  //   adapter supported by the Vercel AI SDK. Take care to use the correct model name for the
  //   LLM provider you are using.
  ...promptAsAI,
});

REST Endpoints

The client provides a REST API for all endpoints defined in the Phoenix OpenAPI spec.

Endpoints are accessible via strongly-typed string literals and TypeScript auto-completion inside of the client object.

import { createClient } from "@arizeai/phoenix-client";

const phoenix = createClient();

// Get all datasets
const datasets = await phoenix.GET("/v1/datasets");

// Get specific prompt
const prompt = await phoenix.GET("/v1/prompts/{prompt_identifier}/latest", {
  params: {
    path: {
      prompt_identifier: "my-prompt",
    },
  },
});

A comprehensive overview of the available endpoints and their parameters is available in the OpenAPI viewer within Phoenix, or in the Phoenix OpenAPI spec.

Examples

To run examples, install dependencies using pnpm and run:

pnpm install
pnpx tsx examples/list_datasets.ts
# change the file name to run other examples

Compatibility

This package utilizes openapi-ts to generate the types from the Phoenix OpenAPI spec.

Because of this, this package only works with the arize-phoenix server 8.0.0 and above.

Compatibility Table:

Phoenix Client Version    Phoenix Server Version
^1.0.0                    ^8.0.0

arize-phoenix-evals

Tooling to evaluate LLM applications including RAG relevance, answer relevance, and more.

Phoenix's approach to LLM evals is notable for the following reasons:

  • Includes pre-tested templates and convenience functions for a set of common Eval “tasks”

  • Data science rigor applied to the testing of model and template combinations

  • Designed to run as fast as possible on batches of data

  • Includes benchmark datasets and tests for each eval function

Installation

Install the arize-phoenix sub-package via pip

pip install arize-phoenix-evals

Note you will also have to install the LLM vendor SDK you would like to use with LLM Evals. For example, to use OpenAI's GPT-4, you will need to install the OpenAI Python SDK:

pip install 'openai>=1.0.0'

Usage

Here is an example of running the RAG relevance eval on a dataset of Wikipedia questions and answers:

import os
from phoenix.evals import (
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    RAG_RELEVANCY_PROMPT_RAILS_MAP,
    OpenAIModel,
    download_benchmark_dataset,
    llm_classify,
)
from sklearn.metrics import precision_recall_fscore_support, confusion_matrix, ConfusionMatrixDisplay

os.environ["OPENAI_API_KEY"] = "<your-openai-key>"

# Download the benchmark golden dataset
df = download_benchmark_dataset(
    task="binary-relevance-classification", dataset_name="wiki_qa-train"
)
# Sample and re-name the columns to match the template
df = df.sample(100)
df = df.rename(
    columns={
        "query_text": "input",
        "document_text": "reference",
    },
)
model = OpenAIModel(
    model="gpt-4",
    temperature=0.0,
)


rails = list(RAG_RELEVANCY_PROMPT_RAILS_MAP.values())
df[["eval_relevance"]] = llm_classify(df, model, RAG_RELEVANCY_PROMPT_TEMPLATE, rails)
# The golden dataset maps True/False to "relevant"/"irrelevant",
# so we can compare it with scikit-learn against the template output (same format)
y_true = df["relevant"].map({True: "relevant", False: "irrelevant"})
y_pred = df["eval_relevance"]

# Compute Per-Class Precision, Recall, F1 Score, Support
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)

To learn more about LLM Evals, see the LLM Evals documentation.

@arizeai/phoenix-client

TypeScript client for the Arize Phoenix API. This package is still under active development and is subject to change.

Installation

npm install @arizeai/phoenix-client

Configuration

The client automatically reads environment variables from your environment, if available.

Environment Variables:

  • PHOENIX_HOST - The base URL of the Phoenix API

  • PHOENIX_API_KEY - The API key to use for authentication

  • PHOENIX_CLIENT_HEADERS - Custom headers to add to all requests (JSON stringified object)

PHOENIX_HOST='http://localhost:6006' PHOENIX_API_KEY='xxxxxx' npx tsx examples/list_datasets.ts
# emits the following request:
# GET http://localhost:6006/v1/datasets
# headers: {
#   "Authorization": "Bearer xxxxxx",
# }

You can also pass configuration options directly to the client, which take priority over environment variables:

const phoenix = createClient({
  options: {
    baseUrl: "http://localhost:6006",
    headers: {
      Authorization: "Bearer xxxxxx",
    },
  },
});

Prompts

The prompts export provides utilities for working with prompts for LLMs, including version control and reuse.

Creating a Prompt

Use createPrompt to create a prompt in Phoenix for version control and reuse:

import { createPrompt, promptVersion } from "@arizeai/phoenix-client/prompts";

const version = createPrompt({
  name: "my-prompt",
  description: "test-description",
  version: promptVersion({
    description: "version description here",
    modelProvider: "OPENAI",
    modelName: "gpt-3.5-turbo",
    template: [
      {
        role: "user",
        content: "{{ question }}",
      },
    ],
    invocationParameters: {
      temperature: 0.8,
    },
  }),
});

Prompts pushed to Phoenix are versioned and can be tagged.

Retrieving a Prompt

Use getPrompt to pull a prompt from Phoenix:

import { getPrompt } from "@arizeai/phoenix-client/prompts";

const prompt = await getPrompt({ name: "my-prompt" });
// Returns a strongly-typed prompt object

const promptByTag = await getPrompt({ tag: "production", name: "my-prompt" });
// Filter by tag

const promptByVersionId = await getPrompt({
  versionId: "1234567890",
});
// Filter by prompt version ID

Using Prompts with LLM Provider SDKs

The toSDK helper converts a Phoenix Prompt to the format expected by LLM provider SDKs.

Supported SDKs:

  • Vercel AI SDK: ai

  • OpenAI: openai

  • Anthropic: anthropic

import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { getPrompt, toSDK } from "@arizeai/phoenix-client/prompts";

const prompt = await getPrompt({ name: "my-prompt" });
const promptAsAI = toSDK({
  sdk: "ai",
  variables: {
    "my-variable": "my-value",
  },
  prompt,
});

const response = await generateText({
  model: openai(prompt.model_name),
  ...promptAsAI,
});

REST Endpoints

The client provides a REST API for all endpoints defined in the Phoenix OpenAPI spec.

Endpoints are accessible via strongly-typed string literals with TypeScript auto-completion:

import { createClient } from "@arizeai/phoenix-client";

const phoenix = createClient();

// Get all datasets
const datasets = await phoenix.GET("/v1/datasets");

// Get specific prompt
const prompt = await phoenix.GET("/v1/prompts/{prompt_identifier}/latest", {
  params: {
    path: {
      prompt_identifier: "my-prompt",
    },
  },
});

Datasets

Create and manage datasets, which are collections of examples used for experiments and evaluation.

Creating a Dataset

import { createDataset } from "@arizeai/phoenix-client/datasets";

const { datasetId } = await createDataset({
  name: "questions",
  description: "a simple dataset of questions",
  examples: [
    {
      input: { question: "What is the capital of France" },
      output: { answer: "Paris" },
      metadata: {},
    },
    {
      input: { question: "What is the capital of the USA" },
      output: { answer: "Washington D.C." },
      metadata: {},
    },
  ],
});

Experiments

Run and evaluate tasks on datasets for benchmarking models, evaluating outputs, and tracking experiment results.

Running an Experiment

import { createDataset } from "@arizeai/phoenix-client/datasets";
import { asEvaluator, runExperiment } from "@arizeai/phoenix-client/experiments";

// 1. Create a dataset
const { datasetId } = await createDataset({
  name: "names-dataset",
  description: "a simple dataset of names",
  examples: [
    {
      input: { name: "John" },
      output: { text: "Hello, John!" },
      metadata: {},
    },
    {
      input: { name: "Jane" },
      output: { text: "Hello, Jane!" },
      metadata: {},
    },
  ],
});

// 2. Define a task to run on each example
const task = async (example) => `hello ${example.input.name}`;

// 3. Define evaluators
const evaluators = [
  asEvaluator({
    name: "matches",
    kind: "CODE",
    evaluate: async ({ output, expected }) => {
      const matches = output === expected?.text;
      return {
        label: matches ? "matches" : "does not match",
        score: matches ? 1 : 0,
        explanation: matches
          ? "output matches expected"
          : "output does not match expected",
        metadata: {},
      };
    },
  }),
  asEvaluator({
    name: "contains-hello",
    kind: "CODE",
    evaluate: async ({ output }) => {
      const matches = typeof output === "string" && output.includes("hello");
      return {
        label: matches ? "contains hello" : "does not contain hello",
        score: matches ? 1 : 0,
        explanation: matches
          ? "output contains hello"
          : "output does not contain hello",
        metadata: {},
      };
    },
  }),
];

// 4. Run the experiment
const experiment = await runExperiment({
  dataset: { datasetId },
  task,
  evaluators,
});

Note: Tasks and evaluators are instrumented using OpenTelemetry. You can view detailed traces of experiment runs and evaluations directly in the Phoenix UI for debugging and performance analysis.

Compatibility

This package utilizes openapi-ts to generate types from the Phoenix OpenAPI spec.

Compatibility Table:

Phoenix Client Version    Phoenix Server Version
^2.0.0                    ^9.0.0
^1.0.0                    ^8.0.0

Requirements: This package only works with arize-phoenix server 8.0.0 and above.

@arizeai/phoenix-evals

TypeScript evaluation library for LLM applications. This package is vendor agnostic and can be used independently of any framework or platform.

Note: This package is in alpha and subject to change.

Installation

npm install @arizeai/phoenix-evals

Usage

Creating Custom Classifiers

Create custom evaluators for tasks like hallucination detection, relevance scoring, or any binary/multi-class classification:

import { createClassifier } from "@arizeai/phoenix-evals/llm";
import { openai } from "@ai-sdk/openai";

const model = openai("gpt-4o-mini");

const promptTemplate = `
In this task, you will be presented with a query, a reference text and an answer. The answer is
generated to the question based on the reference text. The answer may contain false information. You
must use the reference text to determine if the answer to the question contains false information,
if the answer is a hallucination of facts. Your objective is to determine whether the answer text
contains factual information and is not a hallucination. A 'hallucination' refers to
an answer that is not based on the reference text or assumes information that is not available in
the reference text. Your response should be a single word: either "factual" or "hallucinated", and
it should not include any other text or characters.

    [BEGIN DATA]
    ************
    [Query]: {{input}}
    ************
    [Reference text]: {{reference}}
    ************
    [Answer]: {{output}}
    ************
    [END DATA]

Is the answer above factual or hallucinated based on the query and reference text?
`;

// Create the classifier
const evaluator = await createClassifier({
  model,
  choices: { factual: 1, hallucinated: 0 },
  promptTemplate: promptTemplate,
});

// Use the classifier
const result = await evaluator({
  output: "Arize is not open source.",
  input: "Is Arize Phoenix Open Source?",
  reference: "Arize Phoenix is a platform for building and deploying AI applications. It is open source.",
});

console.log(result);
// Output: { label: "hallucinated", score: 0 }

Pre-Built Evaluators

The library includes several pre-built evaluators for common evaluation tasks. These evaluators come with optimized prompts and can be used directly with any AI SDK model.

import { createHallucinationEvaluator } from "@arizeai/phoenix-evals/llm";
import { openai } from "@ai-sdk/openai";
import { anthropic } from "@ai-sdk/anthropic";

const model = openai("gpt-4o-mini");
// or use any other AI SDK provider
// const model = anthropic("claude-3-haiku-20240307");

// Hallucination Detection
const hallucinationEvaluator = createHallucinationEvaluator({
  model,
});


// Use the evaluators
const result = await hallucinationEvaluator({
  input: "What is the capital of France?",
  context: "France is a country in Europe. Paris is its capital city.",
  output: "The capital of France is London.",
});

console.log(result);
// Output: { label: "hallucinated", score: 0, explanation: "..." }

Experimentation with Phoenix

This package works seamlessly with @arizeai/phoenix-client to enable experimentation workflows. You can create datasets, run experiments, and trace evaluation calls for analysis and debugging.

Running Experiments

npm install @arizeai/phoenix-client
import { createHallucinationEvaluator } from "@arizeai/phoenix-evals/llm";
import { openai } from "@ai-sdk/openai";
import { createDataset } from "@arizeai/phoenix-client/datasets";
import { asEvaluator, runExperiment } from "@arizeai/phoenix-client/experiments";

// Create your evaluator
const hallucinationEvaluator = createHallucinationEvaluator({
  model: openai("gpt-4o-mini"),
});

// Create a dataset for your experiment
const dataset = await createDataset({
  name: "hallucination-eval",
  description: "Evaluate the hallucination of the model",
  examples: [
    {
      input: {
        question: "Is Phoenix Open-Source?",
        context: "Phoenix is Open-Source.",
      },
    },
    // ... more examples
  ],
});

// Define your experimental task
const task = async (example) => {
  // Your AI system's response to the question
  return "Phoenix is not Open-Source";
};

// Create a custom evaluator to validate results
const hallucinationCheck = asEvaluator({
  name: "hallucination",
  kind: "LLM",
  evaluate: async ({ input, output }) => {
    // Use the hallucination evaluator from phoenix-evals
    const result = await hallucinationEvaluator({
      input: input.question,
      context: input.context, // Note: uses 'context' not 'reference'
      output: output,
    });
    
    return result; // Return the evaluation result
  },
});

// Run the experiment with automatic tracing
runExperiment({
  experimentName: "hallucination-eval",
  experimentDescription: "Evaluate the hallucination of the model",
  dataset: dataset,
  task,
  evaluators: [hallucinationCheck],
});


List annotation configurations

get

Retrieve a paginated list of all annotation configurations in the system.

Query parameters
cursor · string | null · Optional

Cursor for pagination (base64-encoded annotation config ID)

limit · integer · Optional

Maximum number of configs to return

Default: 100
Responses
200
A list of annotation configurations with pagination information
application/json
403
Forbidden
text/plain
422
Validation Error
application/json
get
GET /v1/annotation_configs HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "name": "text",
      "type": "text",
      "description": "text",
      "optimization_direction": "MINIMIZE",
      "values": [
        {
          "label": "text",
          "score": 1
        }
      ],
      "id": "text"
    }
  ],
  "next_cursor": "text"
}
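The next_cursor field is null on the last page, so a client can follow it until it disappears. A minimal sketch of cursor-based pagination against this endpoint, using only the Python standard library (the base URL, API key, and function name are placeholders, not part of any Phoenix SDK):

```python
import json
import urllib.parse
import urllib.request

def list_annotation_configs(base_url, api_key=None, limit=100):
    """Yield every annotation config, following next_cursor pagination."""
    params = {"limit": str(limit)}
    while True:
        url = f"{base_url}/v1/annotation_configs?{urllib.parse.urlencode(params)}"
        headers = {"Accept": "application/json"}
        if api_key:
            headers["Authorization"] = f"Bearer {api_key}"
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        yield from page["data"]
        cursor = page.get("next_cursor")
        if cursor is None:  # null cursor means this was the last page
            return
        params["cursor"] = cursor

# Usage (assumes a Phoenix server at localhost:6006):
# for config in list_annotation_configs("http://localhost:6006"):
#     print(config["name"])
```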

Create an annotation configuration

post
Body
one of · Optional
or
or
Responses
200
Successful Response
application/json
403
Forbidden
text/plain
422
Validation Error
application/json
post
POST /v1/annotation_configs HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 124

{
  "name": "text",
  "type": "text",
  "description": "text",
  "optimization_direction": "MINIMIZE",
  "values": [
    {
      "label": "text",
      "score": 1
    }
  ]
}
{
  "data": {
    "name": "text",
    "type": "text",
    "description": "text",
    "optimization_direction": "MINIMIZE",
    "values": [
      {
        "label": "text",
        "score": 1
      }
    ],
    "id": "text"
  }
}

Get an annotation configuration by ID or name

get
Path parameters
config_identifier · string · Required

ID or name of the annotation configuration

Responses
200
Successful Response
application/json
403
Forbidden
text/plain
422
Validation Error
application/json
get
GET /v1/annotation_configs/{config_identifier} HTTP/1.1
Host: 
Accept: */*
{
  "data": {
    "name": "text",
    "type": "text",
    "description": "text",
    "optimization_direction": "MINIMIZE",
    "values": [
      {
        "label": "text",
        "score": 1
      }
    ],
    "id": "text"
  }
}
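Because the config_identifier travels in the URL path, a name containing spaces or slashes must be percent-encoded. A minimal stdlib sketch (the helper names are illustrative, not part of any Phoenix SDK):

```python
import json
import urllib.parse
import urllib.request

def annotation_config_url(base_url, config_identifier):
    """Build the URL for GET /v1/annotation_configs/{config_identifier}.

    The identifier is percent-encoded so a name like "My Config"
    survives the trip into the path segment.
    """
    quoted = urllib.parse.quote(config_identifier, safe="")
    return f"{base_url}/v1/annotation_configs/{quoted}"

def get_annotation_config(base_url, config_identifier, api_key=None):
    headers = {"Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(
        annotation_config_url(base_url, config_identifier), headers=headers
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]

print(annotation_config_url("http://localhost:6006", "My Config"))
# → http://localhost:6006/v1/annotation_configs/My%20Config
```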

Update an annotation configuration

put
Path parameters
config_id · string · Required

ID of the annotation configuration

Body
one of · Optional
or
or
Responses
200
Successful Response
application/json
403
Forbidden
text/plain
422
Validation Error
application/json
put
PUT /v1/annotation_configs/{config_id} HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 124

{
  "name": "text",
  "type": "text",
  "description": "text",
  "optimization_direction": "MINIMIZE",
  "values": [
    {
      "label": "text",
      "score": 1
    }
  ]
}
{
  "data": {
    "name": "text",
    "type": "text",
    "description": "text",
    "optimization_direction": "MINIMIZE",
    "values": [
      {
        "label": "text",
        "score": 1
      }
    ],
    "id": "text"
  }
}

Delete an annotation configuration

delete
Path parameters
config_id · string · Required

ID of the annotation configuration

Responses
200
Successful Response
application/json
403
Forbidden
text/plain
422
Validation Error
application/json
delete
DELETE /v1/annotation_configs/{config_id} HTTP/1.1
Host: 
Accept: */*
{
  "data": {
    "name": "text",
    "type": "text",
    "description": "text",
    "optimization_direction": "MINIMIZE",
    "values": [
      {
        "label": "text",
        "score": 1
      }
    ],
    "id": "text"
  }
}

Create span annotations

post
Query parameters
sync · boolean · Optional

If true, fulfill request synchronously.

Default: false
Body
Responses
200
Span annotations inserted successfully
application/json
403
Forbidden
text/plain
404
Span not found
text/plain
422
Validation Error
application/json
post
POST /v1/span_annotations HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 189

{
  "data": [
    {
      "span_id": "text",
      "name": "text",
      "annotator_kind": "LLM",
      "result": {
        "label": "text",
        "score": 1,
        "explanation": "text"
      },
      "metadata": {
        "ANY_ADDITIONAL_PROPERTY": "anything"
      },
      "identifier": ""
    }
  ]
}
{
  "data": [
    {
      "id": "text"
    }
  ]
}
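The request body wraps a list of annotation objects under a "data" key, mirroring the example above. A stdlib sketch of building and posting that payload (the helper names are hypothetical, not part of any Phoenix SDK):

```python
import json
import urllib.request

def build_span_annotation(span_id, name, label=None, score=None,
                          explanation=None, annotator_kind="HUMAN",
                          metadata=None, identifier=""):
    """Build one entry for the POST /v1/span_annotations request body."""
    return {
        "span_id": span_id,
        "name": name,
        "annotator_kind": annotator_kind,  # e.g. "LLM", as in the example above
        "result": {"label": label, "score": score, "explanation": explanation},
        "metadata": metadata or {},
        "identifier": identifier,
    }

def post_span_annotations(base_url, annotations, api_key=None, sync=False):
    url = f"{base_url}/v1/span_annotations" + ("?sync=true" if sync else "")
    body = json.dumps({"data": annotations}).encode()
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    request = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]  # list of inserted annotation IDs
```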

List spans with simple filters (no DSL)

get

Return spans within a project filtered by time range. Supports cursor-based pagination.

Path parameters
project_identifier · string · Required

The project identifier: either project ID or project name. If using a project name, it cannot contain slash (/), question mark (?), or pound sign (#) characters.

Query parameters
cursorany ofOptional

Pagination cursor (Span Global ID)

stringOptional
or
nullOptional
limitinteger · max: 1000Optional

Maximum number of spans to return

Default: 100
start_timeany ofOptional

Inclusive lower bound time

string · date-timeOptional
or
nullOptional
end_timeany ofOptional

Exclusive upper bound time

string · date-timeOptional
or
nullOptional
Responses
200
Successful Response
application/json
403
Forbidden
text/plain
404
Not Found
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/projects/{project_identifier}/spans HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "id": "",
      "name": "text",
      "context": {
        "trace_id": "text",
        "span_id": "text"
      },
      "span_kind": "text",
      "parent_id": "text",
      "start_time": "2025-07-21T19:48:02.994Z",
      "end_time": "2025-07-21T19:48:02.994Z",
      "status_code": "text",
      "status_message": "",
      "attributes": {
        "ANY_ADDITIONAL_PROPERTY": "anything"
      },
      "events": [
        {
          "name": "text",
          "timestamp": "2025-07-21T19:48:02.994Z",
          "attributes": {
            "ANY_ADDITIONAL_PROPERTY": "anything"
          }
        }
      ]
    }
  ],
  "next_cursor": "text"
}
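The cursor-based pagination described above can be driven with a small helper that rebuilds the request for each page. A sketch, assuming a local server and the `default` project name:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request  # urlopen(req) would send it

BASE_URL = "http://localhost:6006"  # assumption: local Phoenix server
PROJECT = "default"                 # placeholder project name

def spans_request(cursor=None, limit=100):
    """Build the GET request for one page of spans."""
    params = {"limit": limit}
    if cursor is not None:
        params["cursor"] = cursor  # the next_cursor from the previous page
    return Request(
        f"{BASE_URL}/v1/projects/{PROJECT}/spans?{urlencode(params)}",
        headers={"Accept": "application/json"},
        method="GET",
    )

first = spans_request()
# Pagination loop: keep requesting with next_cursor until it comes back null.
# page = json.loads(urlopen(first).read())
# while page["next_cursor"]:
#     page = json.loads(urlopen(spans_request(page["next_cursor"])).read())
```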

Search spans with simple filters (no DSL)

get

Return spans within a project filtered by time range, serialized in OTLP (OpenTelemetry Protocol) format. Supports cursor-based pagination.

Path parameters
project_identifierstringRequired

The project identifier: either project ID or project name. If using a project name, it cannot contain slash (/), question mark (?), or pound sign (#) characters.

Query parameters
cursorany ofOptional

Pagination cursor (Span Global ID)

stringOptional
or
nullOptional
limitinteger · max: 1000Optional

Maximum number of spans to return

Default: 100
start_timeany ofOptional

Inclusive lower bound time

string · date-timeOptional
or
nullOptional
end_timeany ofOptional

Exclusive upper bound time

string · date-timeOptional
or
nullOptional
Responses
200
Successful Response
application/json
403
Forbidden
text/plain
404
Not Found
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/projects/{project_identifier}/spans/otlpv1 HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "attributes": [
        {
          "key": "text",
          "value": "[Circular Reference]"
        }
      ],
      "dropped_attributes_count": 1,
      "dropped_events_count": 1,
      "dropped_links_count": 1,
      "end_time_unix_nano": 1,
      "events": [
        {
          "attributes": "[Circular Reference]",
          "dropped_attributes_count": 1,
          "name": "text",
          "time_unix_nano": 1
        }
      ],
      "flags": 1,
      "kind": "SPAN_KIND_INTERNAL",
      "links": null,
      "name": "text",
      "parent_span_id": "text",
      "span_id": "text",
      "start_time_unix_nano": 1,
      "status": {
        "code": 1,
        "message": "text"
      },
      "trace_id": "text",
      "trace_state": "text"
    }
  ],
  "next_cursor": "text"
}

List datasets

get
Query parameters
cursorany ofOptional

Cursor for pagination

stringOptional
or
nullOptional
nameany ofOptional

An optional dataset name to filter by

stringOptional
or
nullOptional
limitintegerOptional

The max number of datasets to return at a time.

Default: 10
Responses
200
Successful Response
application/json
403
Forbidden
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/datasets HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "id": "text",
      "name": "text",
      "description": "text",
      "metadata": {
        "ANY_ADDITIONAL_PROPERTY": "anything"
      },
      "created_at": "2025-07-21T19:48:02.994Z",
      "updated_at": "2025-07-21T19:48:02.994Z"
    }
  ],
  "next_cursor": "text"
}
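The optional `name` filter narrows the listing to a single dataset, which is handy for resolving a dataset ID from its name. A sketch with placeholder values:

```python
from urllib.parse import urlencode
from urllib.request import Request  # urlopen(req) would send it

BASE_URL = "http://localhost:6006"  # assumption: local Phoenix server
params = urlencode({"name": "my-dataset", "limit": 10})  # hypothetical dataset name
req = Request(f"{BASE_URL}/v1/datasets?{params}", method="GET")
```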

Get dataset by ID

get
Path parameters
idstringRequired

The ID of the dataset

Responses
200
Successful Response
application/json
403
Forbidden
text/plain
404
Not Found
text/plain
422
Validation Error
application/json
get
GET /v1/datasets/{id} HTTP/1.1
Host: 
Accept: */*
{
  "data": {
    "id": "text",
    "name": "text",
    "description": "text",
    "metadata": {
      "ANY_ADDITIONAL_PROPERTY": "anything"
    },
    "created_at": "2025-07-21T19:48:02.994Z",
    "updated_at": "2025-07-21T19:48:02.994Z",
    "example_count": 1
  }
}

Delete dataset by ID

delete
Path parameters
idstringRequired

The ID of the dataset to delete.

Responses
204
Successful Response
403
Forbidden
text/plain
404
Dataset not found
text/plain
422
Invalid dataset ID
text/plain
delete
DELETE /v1/datasets/{id} HTTP/1.1
Host: 
Accept: */*

No content

List dataset versions

get
Path parameters
idstringRequired

The ID of the dataset

Query parameters
cursorany ofOptional

Cursor for pagination

stringOptional
or
nullOptional
limitintegerOptional

The max number of dataset versions to return at a time

Default: 10
Responses
200
Successful Response
application/json
403
Forbidden
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/datasets/{id}/versions HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "version_id": "text",
      "description": "text",
      "metadata": {
        "ANY_ADDITIONAL_PROPERTY": "anything"
      },
      "created_at": "2025-07-21T19:48:02.994Z"
    }
  ],
  "next_cursor": "text"
}

Upload dataset from JSON, CSV, or PyArrow

post
Query parameters
syncbooleanOptional

If true, fulfill request synchronously and return JSON containing dataset_id.

Default: false
Body
actionstring · enumOptionalPossible values:
namestringRequired
descriptionstringOptional
inputsobject[]Required
outputsobject[]Optional
metadataobject[]Optional
Responses
200
Successful Response
application/json
Responseany of
or
nullOptional
403
Forbidden
text/plain
409
Dataset of the same name already exists
text/plain
422
Invalid request body
text/plain
post
POST /v1/datasets/upload HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 99

{
  "action": "create",
  "name": "text",
  "description": "text",
  "inputs": [
    {}
  ],
  "outputs": [
    {}
  ],
  "metadata": [
    {}
  ]
}
{
  "data": {
    "dataset_id": "text",
    "version_id": "text"
  }
}
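An upload body can be built as below; `inputs`, `outputs`, and `metadata` appear to be parallel lists, one object per example. `BASE_URL` and the field contents are placeholders, and `sync=true` is set so the response includes the new `dataset_id`.

```python
import json
from urllib.request import Request  # urlopen(req) would send it

BASE_URL = "http://localhost:6006"  # assumption: local Phoenix server
payload = {
    "action": "create",
    "name": "qa-pairs",  # hypothetical dataset name; 409 if it already exists
    "description": "question/answer pairs",
    "inputs": [{"question": "What is Phoenix?"}],
    "outputs": [{"answer": "An observability platform."}],
    "metadata": [{"source": "docs"}],
}
req = Request(
    f"{BASE_URL}/v1/datasets/upload?sync=true",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```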

Get examples from a dataset

get
Path parameters
idstringRequired

The ID of the dataset

Query parameters
version_idany ofOptional

The ID of the dataset version (if omitted, returns data from the latest version)

stringOptional
or
nullOptional
Responses
200
Successful Response
application/json
403
Forbidden
text/plain
404
Not Found
text/plain
422
Validation Error
application/json
get
GET /v1/datasets/{id}/examples HTTP/1.1
Host: 
Accept: */*
{
  "data": {
    "dataset_id": "text",
    "version_id": "text",
    "examples": [
      {
        "id": "text",
        "input": {
          "ANY_ADDITIONAL_PROPERTY": "anything"
        },
        "output": {
          "ANY_ADDITIONAL_PROPERTY": "anything"
        },
        "metadata": {
          "ANY_ADDITIONAL_PROPERTY": "anything"
        },
        "updated_at": "2025-07-21T19:48:02.994Z"
      }
    ]
  }
}

Download dataset examples as CSV file

get
Path parameters
idstringRequired

The ID of the dataset

Query parameters
version_idany ofOptional

The ID of the dataset version (if omitted, returns data from the latest version)

stringOptional
or
nullOptional
Responses
200
Successful Response
text/csv
Responsestring
403
Forbidden
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/datasets/{id}/csv HTTP/1.1
Host: 
Accept: */*
text
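Since this endpoint streams a CSV file rather than JSON, the typical pattern is to write the response body straight to disk. A sketch with placeholder server and dataset ID:

```python
from urllib.request import Request  # urlopen(req) would send it

BASE_URL = "http://localhost:6006"    # assumption: local Phoenix server
DATASET_ID = "your-dataset-id"        # placeholder

req = Request(
    f"{BASE_URL}/v1/datasets/{DATASET_ID}/csv",
    headers={"Accept": "text/csv"},
    method="GET",
)
# with urlopen(req) as resp, open("examples.csv", "wb") as f:
#     f.write(resp.read())
```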

Download dataset examples as OpenAI fine-tuning JSONL file

get
Path parameters
idstringRequired

The ID of the dataset

Query parameters
version_idany ofOptional

The ID of the dataset version (if omitted, returns data from the latest version)

stringOptional
or
nullOptional
Responses
200
Successful Response
text/plain
Responsestring
403
Forbidden
text/plain
422
Invalid dataset or version ID
text/plain
get
GET /v1/datasets/{id}/jsonl/openai_ft HTTP/1.1
Host: 
Accept: */*
text

Download dataset examples as OpenAI evals JSONL file

get
Path parameters
idstringRequired

The ID of the dataset

Query parameters
version_idany ofOptional

The ID of the dataset version (if omitted, returns data from the latest version)

stringOptional
or
nullOptional
Responses
200
Successful Response
text/plain
Responsestring
403
Forbidden
text/plain
422
Invalid dataset or version ID
text/plain
get
GET /v1/datasets/{id}/jsonl/openai_evals HTTP/1.1
Host: 
Accept: */*
text

List all prompts

get

Retrieve a paginated list of all prompts in the system. A prompt can have multiple versions.

Query parameters
cursorany ofOptional

Cursor for pagination (base64-encoded prompt ID)

stringOptional
or
nullOptional
limitintegerOptional

The max number of prompts to return at a time.

Default: 100
Responses
200
A list of prompts with pagination information
application/json
403
Forbidden
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/prompts HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "name": "text",
      "description": "text",
      "source_prompt_id": "text",
      "id": "text"
    }
  ],
  "next_cursor": "text"
}

Create a new prompt

post

Create a new prompt and its initial version. A prompt can have multiple versions.

Body
Responses
200
The newly created prompt version
application/json
403
Forbidden
text/plain
422
Unprocessable Entity
text/plain
post
POST /v1/prompts HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 847

{
  "prompt": {
    "name": "text",
    "description": "text",
    "source_prompt_id": "text"
  },
  "version": {
    "description": "text",
    "model_provider": "OPENAI",
    "model_name": "text",
    "template": {
      "type": "text",
      "messages": [
        {
          "role": "user",
          "content": "text"
        }
      ]
    },
    "template_type": "STR",
    "template_format": "MUSTACHE",
    "invocation_parameters": {
      "type": "text",
      "openai": {
        "temperature": 1,
        "max_tokens": 1,
        "max_completion_tokens": 1,
        "frequency_penalty": 1,
        "presence_penalty": 1,
        "top_p": 1,
        "seed": 1,
        "reasoning_effort": "low"
      }
    },
    "tools": {
      "type": "text",
      "tools": [
        {
          "type": "text",
          "function": {
            "name": "text",
            "description": "text",
            "parameters": {
              "ANY_ADDITIONAL_PROPERTY": "anything"
            },
            "strict": true
          }
        }
      ],
      "tool_choice": {
        "type": "text"
      },
      "disable_parallel_tool_calls": true
    },
    "response_format": {
      "type": "text",
      "json_schema": {
        "name": "text",
        "description": "text",
        "schema": {
          "ANY_ADDITIONAL_PROPERTY": "anything"
        },
        "strict": true
      }
    }
  }
}
{
  "data": {
    "description": "text",
    "model_provider": "OPENAI",
    "model_name": "text",
    "template": {
      "type": "text",
      "messages": [
        {
          "role": "user",
          "content": "text"
        }
      ]
    },
    "template_type": "STR",
    "template_format": "MUSTACHE",
    "invocation_parameters": {
      "type": "text",
      "openai": {
        "temperature": 1,
        "max_tokens": 1,
        "max_completion_tokens": 1,
        "frequency_penalty": 1,
        "presence_penalty": 1,
        "top_p": 1,
        "seed": 1,
        "reasoning_effort": "low"
      }
    },
    "tools": {
      "type": "text",
      "tools": [
        {
          "type": "text",
          "function": {
            "name": "text",
            "description": "text",
            "parameters": {
              "ANY_ADDITIONAL_PROPERTY": "anything"
            },
            "strict": true
          }
        }
      ],
      "tool_choice": {
        "type": "text"
      },
      "disable_parallel_tool_calls": true
    },
    "response_format": {
      "type": "text",
      "json_schema": {
        "name": "text",
        "description": "text",
        "schema": {
          "ANY_ADDITIONAL_PROPERTY": "anything"
        },
        "strict": true
      }
    },
    "id": "text"
  }
}
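Most of the fields in the sample body above are optional; a minimal create request needs a prompt name plus a version with a model and a template. This sketch mirrors the sample's enum values (`STR`, `MUSTACHE`), with the name and model as placeholders:

```python
import json
from urllib.request import Request  # urlopen(req) would send it

BASE_URL = "http://localhost:6006"  # assumption: local Phoenix server
body = {
    "prompt": {"name": "summarizer", "description": "Summarizes articles"},  # hypothetical
    "version": {
        "model_provider": "OPENAI",
        "model_name": "gpt-4o",          # placeholder model name
        "template_type": "STR",
        "template_format": "MUSTACHE",   # {{...}} variables in the template
        "template": {
            "type": "text",
            "messages": [{"role": "user", "content": "Summarize: {{article}}"}],
        },
    },
}
req = Request(
    f"{BASE_URL}/v1/prompts",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```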

List prompt versions

get

Retrieve all versions of a specific prompt with pagination support. Each prompt can have multiple versions with different configurations.

Path parameters
prompt_identifierstringRequired

The identifier of the prompt, i.e. name or ID.

Query parameters
cursorany ofOptional

Cursor for pagination (base64-encoded promptVersion ID)

stringOptional
or
nullOptional
limitintegerOptional

The max number of prompt versions to return at a time.

Default: 100
Responses
200
A list of prompt versions with pagination information
application/json
403
Forbidden
text/plain
404
Not Found
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/prompts/{prompt_identifier}/versions HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "description": "text",
      "model_provider": "OPENAI",
      "model_name": "text",
      "template": {
        "type": "text",
        "messages": [
          {
            "role": "user",
            "content": "text"
          }
        ]
      },
      "template_type": "STR",
      "template_format": "MUSTACHE",
      "invocation_parameters": {
        "type": "text",
        "openai": {
          "temperature": 1,
          "max_tokens": 1,
          "max_completion_tokens": 1,
          "frequency_penalty": 1,
          "presence_penalty": 1,
          "top_p": 1,
          "seed": 1,
          "reasoning_effort": "low"
        }
      },
      "tools": {
        "type": "text",
        "tools": [
          {
            "type": "text",
            "function": {
              "name": "text",
              "description": "text",
              "parameters": {
                "ANY_ADDITIONAL_PROPERTY": "anything"
              },
              "strict": true
            }
          }
        ],
        "tool_choice": {
          "type": "text"
        },
        "disable_parallel_tool_calls": true
      },
      "response_format": {
        "type": "text",
        "json_schema": {
          "name": "text",
          "description": "text",
          "schema": {
            "ANY_ADDITIONAL_PROPERTY": "anything"
          },
          "strict": true
        }
      },
      "id": "text"
    }
  ],
  "next_cursor": "text"
}

Get prompt version by ID

get

Retrieve a specific prompt version using its unique identifier. A prompt version contains the actual template and configuration.

Path parameters
prompt_version_idstringRequired

The ID of the prompt version.

Responses
200
The requested prompt version
application/json
403
Forbidden
text/plain
404
Not Found
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/prompt_versions/{prompt_version_id} HTTP/1.1
Host: 
Accept: */*
{
  "data": {
    "description": "text",
    "model_provider": "OPENAI",
    "model_name": "text",
    "template": {
      "type": "text",
      "messages": [
        {
          "role": "user",
          "content": "text"
        }
      ]
    },
    "template_type": "STR",
    "template_format": "MUSTACHE",
    "invocation_parameters": {
      "type": "text",
      "openai": {
        "temperature": 1,
        "max_tokens": 1,
        "max_completion_tokens": 1,
        "frequency_penalty": 1,
        "presence_penalty": 1,
        "top_p": 1,
        "seed": 1,
        "reasoning_effort": "low"
      }
    },
    "tools": {
      "type": "text",
      "tools": [
        {
          "type": "text",
          "function": {
            "name": "text",
            "description": "text",
            "parameters": {
              "ANY_ADDITIONAL_PROPERTY": "anything"
            },
            "strict": true
          }
        }
      ],
      "tool_choice": {
        "type": "text"
      },
      "disable_parallel_tool_calls": true
    },
    "response_format": {
      "type": "text",
      "json_schema": {
        "name": "text",
        "description": "text",
        "schema": {
          "ANY_ADDITIONAL_PROPERTY": "anything"
        },
        "strict": true
      }
    },
    "id": "text"
  }
}

Get prompt version by tag

get

Retrieve a specific prompt version using its tag name. Tags are used to identify specific versions of a prompt.

Path parameters
prompt_identifierstringRequired

The identifier of the prompt, i.e. name or ID.

tag_namestringRequired

The tag of the prompt version

Responses
200
The prompt version with the specified tag
application/json
403
Forbidden
text/plain
404
Not Found
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/prompts/{prompt_identifier}/tags/{tag_name} HTTP/1.1
Host: 
Accept: */*
{
  "data": {
    "description": "text",
    "model_provider": "OPENAI",
    "model_name": "text",
    "template": {
      "type": "text",
      "messages": [
        {
          "role": "user",
          "content": "text"
        }
      ]
    },
    "template_type": "STR",
    "template_format": "MUSTACHE",
    "invocation_parameters": {
      "type": "text",
      "openai": {
        "temperature": 1,
        "max_tokens": 1,
        "max_completion_tokens": 1,
        "frequency_penalty": 1,
        "presence_penalty": 1,
        "top_p": 1,
        "seed": 1,
        "reasoning_effort": "low"
      }
    },
    "tools": {
      "type": "text",
      "tools": [
        {
          "type": "text",
          "function": {
            "name": "text",
            "description": "text",
            "parameters": {
              "ANY_ADDITIONAL_PROPERTY": "anything"
            },
            "strict": true
          }
        }
      ],
      "tool_choice": {
        "type": "text"
      },
      "disable_parallel_tool_calls": true
    },
    "response_format": {
      "type": "text",
      "json_schema": {
        "name": "text",
        "description": "text",
        "schema": {
          "ANY_ADDITIONAL_PROPERTY": "anything"
        },
        "strict": true
      }
    },
    "id": "text"
  }
}

Get latest prompt version

get

Retrieve the most recent version of a specific prompt.

Path parameters
prompt_identifierstringRequired

The identifier of the prompt, i.e. name or ID.

Responses
200
The latest version of the specified prompt
application/json
403
Forbidden
text/plain
404
Not Found
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/prompts/{prompt_identifier}/latest HTTP/1.1
Host: 
Accept: */*
{
  "data": {
    "description": "text",
    "model_provider": "OPENAI",
    "model_name": "text",
    "template": {
      "type": "text",
      "messages": [
        {
          "role": "user",
          "content": "text"
        }
      ]
    },
    "template_type": "STR",
    "template_format": "MUSTACHE",
    "invocation_parameters": {
      "type": "text",
      "openai": {
        "temperature": 1,
        "max_tokens": 1,
        "max_completion_tokens": 1,
        "frequency_penalty": 1,
        "presence_penalty": 1,
        "top_p": 1,
        "seed": 1,
        "reasoning_effort": "low"
      }
    },
    "tools": {
      "type": "text",
      "tools": [
        {
          "type": "text",
          "function": {
            "name": "text",
            "description": "text",
            "parameters": {
              "ANY_ADDITIONAL_PROPERTY": "anything"
            },
            "strict": true
          }
        }
      ],
      "tool_choice": {
        "type": "text"
      },
      "disable_parallel_tool_calls": true
    },
    "response_format": {
      "type": "text",
      "json_schema": {
        "name": "text",
        "description": "text",
        "schema": {
          "ANY_ADDITIONAL_PROPERTY": "anything"
        },
        "strict": true
      }
    },
    "id": "text"
  }
}

Add tag to prompt version

post

Add a new tag to a specific prompt version. Tags help identify and categorize different versions of a prompt.

Path parameters
prompt_version_idstringRequired

The ID of the prompt version.

Body
namestringRequiredPattern: ^[a-z0-9]([_a-z0-9-]*[a-z0-9])?$
descriptionany ofOptional
stringOptional
or
nullOptional
Responses
204
No content returned on successful tag creation
403
Forbidden
text/plain
404
Not Found
text/plain
422
Unprocessable Entity
text/plain
post
POST /v1/prompt_versions/{prompt_version_id}/tags HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 36

{
  "name": "text",
  "description": "text"
}

No content
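Tag names must match the pattern `^[a-z0-9]([_a-z0-9-]*[a-z0-9])?$` (lowercase alphanumerics with internal underscores or hyphens). A sketch that validates the name client-side before posting; the server URL and version ID are placeholders:

```python
import json
import re
from urllib.request import Request  # urlopen(req) would send it

BASE_URL = "http://localhost:6006"     # assumption: local Phoenix server
VERSION_ID = "your-prompt-version-id"  # placeholder

tag = {"name": "production", "description": "Version serving traffic"}
# Check the documented name pattern before sending to avoid a 422.
assert re.fullmatch(r"[a-z0-9]([_a-z0-9-]*[a-z0-9])?", tag["name"])

req = Request(
    f"{BASE_URL}/v1/prompt_versions/{VERSION_ID}/tags",
    data=json.dumps(tag).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```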

List prompt version tags

get

Retrieve all tags associated with a specific prompt version. Tags are used to identify and categorize different versions of a prompt.

Path parameters
prompt_version_idstringRequired

The ID of the prompt version.

Query parameters
cursorany ofOptional

Cursor for pagination (base64-encoded promptVersionTag ID)

stringOptional
or
nullOptional
limitintegerOptional

The max number of tags to return at a time.

Default: 100
Responses
200
A list of tags associated with the prompt version
application/json
403
Forbidden
text/plain
404
Not Found
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/prompt_versions/{prompt_version_id}/tags HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "name": "text",
      "description": "text",
      "id": "text"
    }
  ],
  "next_cursor": "text"
}

Add span, trace, or document evaluations

post
Header parameters
content-typeany ofOptional
stringOptional
or
nullOptional
content-encodingany ofOptional
stringOptional
or
nullOptional
Body
string · binaryOptional
Responses
204
Successful Response
403
Forbidden
text/plain
415
Unsupported content type, only gzipped protobuf and pandas-arrow are supported
text/plain
422
Unprocessable Entity
text/plain
post
POST /v1/evaluations HTTP/1.1
Host: 
Content-Type: application/x-protobuf
Accept: */*
Content-Length: 8

"binary"

No content

Get span, trace, or document evaluations from a project

get
Query parameters
project_nameany ofOptional

The name of the project to get evaluations from (if omitted, evaluations will be drawn from the default project)

stringOptional
or
nullOptional
Responses
200
Successful Response
application/json
Responseany
403
Forbidden
text/plain
404
Not Found
text/plain
422
Validation Error
application/json
get
GET /v1/evaluations HTTP/1.1
Host: 
Accept: */*

No content

Get span annotations for a list of span_ids

get
Path parameters
project_identifierstringRequired

The project identifier: either project ID or project name. If using a project name as the identifier, it cannot contain slash (/), question mark (?), or pound sign (#) characters.

Query parameters
span_idsstring[] · min: 1Required

One or more span IDs to fetch annotations for

include_annotation_namesany ofOptional

Optional list of annotation names to include. If provided, only annotations with these names will be returned. 'note' annotations are excluded by default unless explicitly included in this list.

string[]Optional
or
nullOptional
exclude_annotation_namesany ofOptional

Optional list of annotation names to exclude from results.

string[]Optional
or
nullOptional
cursorany ofOptional

A cursor for pagination

stringOptional
or
nullOptional
limitinteger · max: 10000Optional

The maximum number of annotations to return in a single request

Default: 10
Responses
200
Successful Response
application/json
403
Forbidden
text/plain
404
Project or spans not found
text/plain
422
Invalid parameters
text/plain
get
GET /v1/projects/{project_identifier}/span_annotations?span_ids=text HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "span_id": "text",
      "name": "text",
      "annotator_kind": "LLM",
      "result": {
        "label": "text",
        "score": 1,
        "explanation": "text"
      },
      "metadata": {
        "ANY_ADDITIONAL_PROPERTY": "anything"
      },
      "identifier": "",
      "id": "text",
      "created_at": "2025-07-21T19:48:02.994Z",
      "updated_at": "2025-07-21T19:48:02.994Z",
      "source": "API",
      "user_id": "text"
    }
  ],
  "next_cursor": "text"
}
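The `span_ids` parameter is repeated once per ID in the query string, as in the `?span_ids=text` sample above. With the standard library, `urlencode(..., doseq=True)` expands a list into repeated parameters; IDs and server are placeholders:

```python
from urllib.parse import urlencode
from urllib.request import Request  # urlopen(req) would send it

BASE_URL = "http://localhost:6006"  # assumption: local Phoenix server
PROJECT = "default"                 # placeholder project name

# doseq=True turns the list into span_ids=...&span_ids=...
params = urlencode(
    {"span_ids": ["span-1", "span-2"], "limit": 10},  # placeholder span IDs
    doseq=True,
)
req = Request(
    f"{BASE_URL}/v1/projects/{PROJECT}/span_annotations?{params}",
    method="GET",
)
```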

Create a new project

post

Create a new project with the specified configuration.

Body
namestring · min: 1Required
descriptionany ofOptional
stringOptional
or
nullOptional
Responses
200
The newly created project
application/json
403
Forbidden
text/plain
422
Unprocessable Entity
text/plain
post
POST /v1/projects HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 36

{
  "name": "text",
  "description": "text"
}
{
  "data": {
    "name": "text",
    "description": "text",
    "id": "text"
  }
}
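A create-project request is a small JSON body. Note that if you plan to address the project by name later, the name cannot contain `/`, `?`, or `#`. A sketch with placeholder values:

```python
import json
from urllib.request import Request  # urlopen(req) would send it

BASE_URL = "http://localhost:6006"  # assumption: local Phoenix server
body = {
    "name": "staging-traces",  # placeholder; avoid /, ?, # for name-based lookups
    "description": "Traces from the staging environment",
}
req = Request(
    f"{BASE_URL}/v1/projects",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```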

List all projects

get

Retrieve a paginated list of all projects in the system.

Query parameters
cursorany ofOptional

Cursor for pagination (project ID)

stringOptional
or
nullOptional
limitintegerOptional

The max number of projects to return at a time.

Default: 100
include_experiment_projectsbooleanOptional

Include experiment projects in the response. Experiment projects are created from running experiments.

Default: false
Responses
200
A list of projects with pagination information
application/json
403
Forbidden
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/projects HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "name": "text",
      "description": "text",
      "id": "text"
    }
  ],
  "next_cursor": "text"
}

Get project by ID or name

get

Retrieve a specific project using its unique identifier: either project ID or project name. Note: When using a project name as the identifier, it cannot contain slash (/), question mark (?), or pound sign (#) characters.

Path parameters
project_identifierstringRequired

The project identifier: either project ID or project name. If using a project name, it cannot contain slash (/), question mark (?), or pound sign (#) characters.

Responses
200
The requested project
application/json
403
Forbidden
text/plain
404
Not Found
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/projects/{project_identifier} HTTP/1.1
Host: 
Accept: */*
{
  "data": {
    "name": "text",
    "description": "text",
    "id": "text"
  }
}

Update a project by ID or name

put

Update an existing project with new configuration. Project names cannot be changed. The project identifier is either project ID or project name. Note: When using a project name as the identifier, it cannot contain slash (/), question mark (?), or pound sign (#) characters.

Path parameters
project_identifierstringRequired

The project identifier: either project ID or project name. If using a project name, it cannot contain slash (/), question mark (?), or pound sign (#) characters.

Body
descriptionany ofOptional
stringOptional
or
nullOptional
Responses
200
The updated project
application/json
403
Forbidden
text/plain
404
Not Found
text/plain
422
Unprocessable Entity
text/plain
put
PUT /v1/projects/{project_identifier} HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 22

{
  "description": "text"
}
{
  "data": {
    "name": "text",
    "description": "text",
    "id": "text"
  }
}

Delete a project by ID or name

delete

Delete an existing project and all its associated data. The project identifier is either project ID or project name. The default project cannot be deleted. Note: When using a project name as the identifier, it cannot contain slash (/), question mark (?), or pound sign (#) characters.

Path parameters
project_identifierstringRequired

The project identifier: either project ID or project name. If using a project name, it cannot contain slash (/), question mark (?), or pound sign (#) characters.

Responses
204
No content returned on successful deletion
403
Forbidden
text/plain
404
Not Found
text/plain
422
Unprocessable Entity
text/plain
delete
DELETE /v1/projects/{project_identifier} HTTP/1.1
Host: 
Accept: */*

No content

List experiments by dataset

get
Path parameters
dataset_idstringRequired
Responses
200
Experiments retrieved successfully
application/json
403
Forbidden
text/plain
422
Validation Error
application/json
get
GET /v1/datasets/{dataset_id}/experiments HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "id": "text",
      "dataset_id": "text",
      "dataset_version_id": "text",
      "repetitions": 1,
      "metadata": {
        "ANY_ADDITIONAL_PROPERTY": "anything"
      },
      "project_name": "text",
      "created_at": "2025-07-21T19:48:02.994Z",
      "updated_at": "2025-07-21T19:48:02.994Z"
    }
  ]
}

Create experiment on a dataset

post
Path parameters
dataset_idstringRequired
Body

Details of the experiment to be created

nameany ofOptional

Name of the experiment (if omitted, a random name will be generated)

stringOptional
or
nullOptional
descriptionany ofOptional

An optional description of the experiment

stringOptional
or
nullOptional
metadataany ofOptional

Metadata for the experiment

or
nullOptional
version_idany ofOptional

ID of the dataset version over which the experiment will be run (if omitted, the latest version will be used)

stringOptional
or
nullOptional
repetitionsintegerOptional

Number of times the experiment should be repeated for each example

Default: 1
Responses
200
Experiment created successfully
application/json
403
Forbidden
text/plain
404
Dataset or DatasetVersion not found
text/plain
422
Validation Error
application/json
post
POST /v1/datasets/{dataset_id}/experiments HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 122

{
  "name": "text",
  "description": "text",
  "metadata": {
    "ANY_ADDITIONAL_PROPERTY": "anything"
  },
  "version_id": "text",
  "repetitions": 1
}
{
  "data": {
    "id": "text",
    "dataset_id": "text",
    "dataset_version_id": "text",
    "repetitions": 1,
    "metadata": {
      "ANY_ADDITIONAL_PROPERTY": "anything"
    },
    "project_name": "text",
    "created_at": "2025-07-21T19:48:02.994Z",
    "updated_at": "2025-07-21T19:48:02.994Z"
  }
}
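All body fields are optional here: omitting `name` yields a generated name, omitting `version_id` targets the latest dataset version, and `repetitions` defaults to 1. A sketch that runs each example three times, with server and dataset ID as placeholders:

```python
import json
from urllib.request import Request  # urlopen(req) would send it

BASE_URL = "http://localhost:6006"  # assumption: local Phoenix server
DATASET_ID = "your-dataset-id"      # placeholder

body = {
    "name": "baseline-run",  # omit to let the server generate a name
    "repetitions": 3,        # run the task 3x per example
}
req = Request(
    f"{BASE_URL}/v1/datasets/{DATASET_ID}/experiments",
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```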

Get experiment by ID

get
Path parameters
experiment_idstringRequired
Responses
200
Experiment retrieved successfully
application/json
403
Forbidden
text/plain
404
Experiment not found
text/plain
422
Validation Error
application/json
get
GET /v1/experiments/{experiment_id} HTTP/1.1
Host: 
Accept: */*
{
  "data": {
    "id": "text",
    "dataset_id": "text",
    "dataset_version_id": "text",
    "repetitions": 1,
    "metadata": {
      "ANY_ADDITIONAL_PROPERTY": "anything"
    },
    "project_name": "text",
    "created_at": "2025-07-21T19:48:02.994Z",
    "updated_at": "2025-07-21T19:48:02.994Z"
  }
}

Download experiment runs as a JSON file

get
Path parameters
experiment_idstringRequired
Responses
200
Successful Response
text/plain
Responsestring
403
Forbidden
text/plain
404
Experiment not found
text/plain
422
Validation Error
application/json
get
GET /v1/experiments/{experiment_id}/json HTTP/1.1
Host: 
Accept: */*
text

Download experiment runs as a CSV file

get
Path parameters
experiment_idstringRequired
Responses
200
Successful Response
Responseany
403
Forbidden
text/plain
422
Validation Error
application/json
get
GET /v1/experiments/{experiment_id}/csv HTTP/1.1
Host: 
Accept: */*

No content

List runs for an experiment

get
Path parameters
experiment_idstringRequired
Responses
200
Experiment runs retrieved successfully
application/json
403
Forbidden
text/plain
404
Experiment not found
text/plain
422
Validation Error
application/json
get
GET /v1/experiments/{experiment_id}/runs HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "dataset_example_id": "text",
      "output": null,
      "repetition_number": 1,
      "start_time": "2025-07-21T19:48:02.994Z",
      "end_time": "2025-07-21T19:48:02.994Z",
      "trace_id": "text",
      "error": "text",
      "id": "text",
      "experiment_id": "text"
    }
  ]
}

Create run for an experiment

post
Path parameters
experiment_id string · Required
Body
dataset_example_id string · Required

The ID of the dataset example used in the experiment run

output any · Required

The output of the experiment task

repetition_number integer · Required

The repetition number of the experiment run

start_time string · date-time · Required

The start time of the experiment run

end_time string · date-time · Required

The end time of the experiment run

trace_id any of · Optional

The ID of the corresponding trace (if one exists)

string · Optional
or
null · Optional
error any of · Optional

Optional error message if the experiment run encountered an error

string · Optional
or
null · Optional
Responses
200
Experiment run created successfully
application/json
403
Forbidden
text/plain
404
Experiment or dataset example not found
text/plain
409
This experiment run has already been submitted
text/plain
422
Validation Error
application/json
post
POST /v1/experiments/{experiment_id}/runs HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 176

{
  "dataset_example_id": "text",
  "output": null,
  "repetition_number": 1,
  "start_time": "2025-07-21T19:48:02.994Z",
  "end_time": "2025-07-21T19:48:02.994Z",
  "trace_id": "text",
  "error": "text"
}
{
  "data": {
    "id": "text"
  }
}
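A minimal sketch of assembling the run payload above in Python before POSTing it to /v1/experiments/{experiment_id}/runs; the IDs and output value here are placeholders:

```python
import json
from datetime import datetime, timezone

# Placeholder values -- substitute IDs from your own dataset and experiment.
run_payload = {
    "dataset_example_id": "RGF0YXNldEV4YW1wbGU6MQ==",
    "output": {"answer": "42"},                      # any JSON-serializable task output
    "repetition_number": 1,
    "start_time": datetime.now(timezone.utc).isoformat(),
    "end_time": datetime.now(timezone.utc).isoformat(),
    "trace_id": None,                                # optional: link a trace if one exists
    "error": None,                                   # optional: set if the task failed
}

# The request body is plain JSON; required keys match the schema above.
body = json.dumps(run_payload)
```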

Create or update evaluation for an experiment run

post
Body
experiment_run_id string · Required

The ID of the experiment run being evaluated

name string · Required

The name of the evaluation

annotator_kind string · enum · Required

The kind of annotator used for the evaluation

Possible values:
start_time string · date-time · Required

The start time of the evaluation in ISO format

end_time string · date-time · Required

The end time of the evaluation in ISO format

error any of · Optional

Optional error message if the evaluation encountered an error

string · Optional
or
null · Optional
metadata any of · Optional

Metadata for the evaluation

or
null · Optional
trace_id any of · Optional

Optional trace ID for tracking

string · Optional
or
null · Optional
Responses
200
Successful Response
application/json
403
Forbidden
text/plain
404
Experiment run not found
text/plain
422
Validation Error
application/json
post
POST /v1/experiment_evaluations HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 283

{
  "experiment_run_id": "text",
  "name": "text",
  "annotator_kind": "LLM",
  "start_time": "2025-07-21T19:48:02.994Z",
  "end_time": "2025-07-21T19:48:02.994Z",
  "result": {
    "label": "text",
    "score": 1,
    "explanation": "text"
  },
  "error": "text",
  "metadata": {
    "ANY_ADDITIONAL_PROPERTY": "anything"
  },
  "trace_id": "text"
}
{
  "data": {
    "id": "text"
  }
}
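The evaluation body above can be built the same way; a minimal sketch with placeholder values, following the request example's shape (including the result object with label, score, and explanation):

```python
from datetime import datetime, timezone

# Placeholder values -- substitute a real experiment run ID.
evaluation = {
    "experiment_run_id": "RXhwZXJpbWVudFJ1bjox",
    "name": "correctness",
    "annotator_kind": "LLM",             # enum value, as in the request example above
    "start_time": datetime.now(timezone.utc).isoformat(),
    "end_time": datetime.now(timezone.utc).isoformat(),
    "result": {                          # the evaluation outcome
        "label": "correct",
        "score": 1.0,
        "explanation": "Matches the reference answer.",
    },
}
```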

List all users

get

Retrieve a paginated list of all users in the system.

Query parameters
cursor string · Optional

Cursor for pagination (base64-encoded user ID)

limit integer · Optional

The max number of users to return at a time.

Default: 100
Responses
200
A list of users.
application/json
403
Forbidden
text/plain
422
Unprocessable Entity
text/plain
get
GET /v1/users HTTP/1.1
Host: 
Accept: */*
{
  "data": [
    {
      "id": "text",
      "created_at": "2025-07-21T19:48:02.994Z",
      "updated_at": "2025-07-21T19:48:02.994Z",
      "email": "text",
      "username": "text",
      "role": "SYSTEM",
      "auth_method": "text",
      "password": "text",
      "password_needs_reset": true
    }
  ],
  "next_cursor": "text"
}
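The cursor pagination above can be driven with a simple loop. In this sketch, fetch_users_page is a hypothetical stand-in for the HTTP GET to /v1/users (stubbed here with canned pages); a null next_cursor signals the last page:

```python
def fetch_users_page(cursor=None, limit=100):
    # Stand-in for GET /v1/users?cursor=...&limit=... -- returns a dict
    # shaped like the response above. Real code would make an HTTP call.
    pages = {
        None: {"data": [{"id": "VXNlcjox"}, {"id": "VXNlcjoy"}], "next_cursor": "VXNlcjoy"},
        "VXNlcjoy": {"data": [{"id": "VXNlcjoz"}], "next_cursor": None},
    }
    return pages[cursor]

def list_all_users():
    users, cursor = [], None
    while True:
        page = fetch_users_page(cursor=cursor)
        users.extend(page["data"])
        cursor = page.get("next_cursor")
        if not cursor:          # null next_cursor means no more pages
            return users

all_users = list_all_users()
```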

Delete a user by ID

delete

Delete an existing user by their unique GlobalID.

Path parameters
user_id string · Required

The GlobalID of the user (e.g. 'VXNlcjox').

Responses
204
No content returned on successful deletion.
403
Cannot delete the default admin or system user
text/plain
404
User not found.
text/plain
422
Unprocessable Entity
text/plain
delete
DELETE /v1/users/{user_id} HTTP/1.1
Host: 
Accept: */*

No content

Create a new user

post

Create a new user with the specified configuration.

Body
user one of · Required
or
send_welcome_email boolean · Optional · Default: true
Responses
201
The newly created user.
application/json
400
Role not found.
text/plain
403
Forbidden
text/plain
409
Username or email already exists.
text/plain
422
Unprocessable Entity
text/plain
post
POST /v1/users HTTP/1.1
Host: 
Content-Type: application/json
Accept: */*
Content-Length: 124

{
  "user": {
    "email": "text",
    "username": "text",
    "role": "SYSTEM",
    "auth_method": "text",
    "password": "text"
  },
  "send_welcome_email": true
}
{
  "data": {
    "id": "text",
    "created_at": "2025-07-21T19:48:02.994Z",
    "updated_at": "2025-07-21T19:48:02.994Z",
    "email": "text",
    "username": "text",
    "role": "SYSTEM",
    "auth_method": "text",
    "password": "text",
    "password_needs_reset": true
  }
}

arize-phoenix-otel

Provides a lightweight wrapper around OpenTelemetry primitives with Phoenix-aware defaults. Phoenix Otel also gives you access to tracing decorators for common GenAI patterns.

These defaults are aware of environment variables you may have set to configure Phoenix:

  • PHOENIX_COLLECTOR_ENDPOINT

  • PHOENIX_PROJECT_NAME

  • PHOENIX_CLIENT_HEADERS

  • PHOENIX_API_KEY

  • PHOENIX_GRPC_PORT
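These variables can also be set from Python before register is called; the values below are placeholders:

```python
import os

# Assumption: set these before importing/calling phoenix.otel.register,
# so the Phoenix-aware defaults can pick them up.
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://your-phoenix.com:6006"
os.environ["PHOENIX_PROJECT_NAME"] = "my-project"
```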

Installation

Install via pip.

pip install -Uq arize-phoenix-otel

Examples

The phoenix.otel module provides a high-level register function to configure OpenTelemetry tracing by setting a global TracerProvider. The register function can also configure headers and whether to process spans individually or in batches.

Quickstart

from phoenix.otel import register
tracer_provider = register()

This is all you need to get started using OTel with Phoenix! register defaults to sending spans to an endpoint at http://localhost using gRPC.

Phoenix Authentication

If the PHOENIX_API_KEY environment variable is set, register will automatically add an authorization header to each span payload.

Configuring the collector endpoint

There are two ways to configure the collector endpoint:

  • Using environment variables

  • Using the endpoint keyword argument

Using environment variables

If you're setting the PHOENIX_COLLECTOR_ENDPOINT environment variable, register will automatically try to send spans to your Phoenix server using gRPC.

# export PHOENIX_COLLECTOR_ENDPOINT=https://your-phoenix.com:6006

from phoenix.otel import register
tracer_provider = register()

Specifying the endpoint directly

When passing in the endpoint argument, you must specify the fully qualified endpoint. For example, in order to export spans via HTTP to localhost, use Phoenix's HTTP collector endpoint: http://localhost:6006/v1/traces. The default gRPC endpoint is different: http://localhost:4317. If the PHOENIX_GRPC_PORT environment variable is set, it will override the default gRPC port.

from phoenix.otel import register
tracer_provider = register(endpoint="http://localhost:6006/v1/traces")

Additionally, the protocol argument can be used to enforce the OTLP transport protocol regardless of the endpoint specified. This can be useful when, for example, the gRPC endpoint is bound to a different port than the default (4317). The valid protocols are "http/protobuf" and "grpc".

from phoenix.otel import register
tracer_provider = register(endpoint="http://localhost:9999", protocol="grpc")

Additional configuration

register can be configured with different keyword arguments:

  • project_name: The Phoenix project name (or PHOENIX_PROJECT_NAME env. var)

  • headers: Headers to send along with each span payload (or PHOENIX_CLIENT_HEADERS env. var)

  • batch: Whether or not to process spans in batch

from phoenix.otel import register
tracer_provider = register(
    project_name="otel-test", headers={"Authorization": "Bearer TOKEN"}, batch=True
)

A drop-in replacement for OTel primitives

For more granular tracing configuration, these wrappers can be used as drop-in replacements for OTel primitives:

from opentelemetry import trace as trace_api
from phoenix.otel import HTTPSpanExporter, TracerProvider, SimpleSpanProcessor

tracer_provider = TracerProvider()
span_exporter = HTTPSpanExporter(endpoint="http://localhost:6006/v1/traces")
span_processor = SimpleSpanProcessor(span_exporter=span_exporter)
tracer_provider.add_span_processor(span_processor)
trace_api.set_tracer_provider(tracer_provider)

Wrappers have Phoenix-aware defaults that greatly simplify the OTel configuration process. A special endpoint keyword argument can be passed to a TracerProvider, SimpleSpanProcessor, or BatchSpanProcessor in order to automatically infer which SpanExporter to use, simplifying setup.

Using environment variables

# export PHOENIX_COLLECTOR_ENDPOINT=http://localhost:6006

from opentelemetry import trace as trace_api
from phoenix.otel import TracerProvider

tracer_provider = TracerProvider()
trace_api.set_tracer_provider(tracer_provider)

Specifying the endpoint directly

from opentelemetry import trace as trace_api
from phoenix.otel import TracerProvider

tracer_provider = TracerProvider(endpoint="http://localhost:4317")
trace_api.set_tracer_provider(tracer_provider)

Further examples

Users can gradually add OTel components as desired:

Configuring resources

# export PHOENIX_COLLECTOR_ENDPOINT=http://localhost:6006

from opentelemetry import trace as trace_api
from phoenix.otel import Resource, PROJECT_NAME, TracerProvider

tracer_provider = TracerProvider(resource=Resource({PROJECT_NAME: "my-project"}))
trace_api.set_tracer_provider(tracer_provider)

Using a BatchSpanProcessor

# export PHOENIX_COLLECTOR_ENDPOINT=http://localhost:6006

from opentelemetry import trace as trace_api
from phoenix.otel import TracerProvider, BatchSpanProcessor

tracer_provider = TracerProvider()
batch_processor = BatchSpanProcessor()
tracer_provider.add_span_processor(batch_processor)
trace_api.set_tracer_provider(tracer_provider)

Specifying a custom GRPC endpoint

from opentelemetry import trace as trace_api
from phoenix.otel import TracerProvider, BatchSpanProcessor, GRPCSpanExporter

tracer_provider = TracerProvider()
batch_processor = BatchSpanProcessor(
    span_exporter=GRPCSpanExporter(endpoint="http://custom-endpoint.com:6789")
)
tracer_provider.add_span_processor(batch_processor)
trace_api.set_tracer_provider(tracer_provider)

Passing TracerProvider kwargs

Both register() and TracerProvider accept all the same keyword arguments as the standard OpenTelemetry TracerProvider, allowing you to configure advanced features like custom ID generators, sampling, and span limits.

from opentelemetry.sdk.extension.aws.trace import AwsXRayIdGenerator
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased
from phoenix.otel import register, TracerProvider

# Configure directly with register()
tracer_provider = register(
    project_name="my-app",
    id_generator=AwsXRayIdGenerator(),  # AWS X-Ray compatible IDs
    sampler=TraceIdRatioBased(0.1),     # Sample 10% of traces
)

# Or configure TracerProvider directly
tracer_provider = TracerProvider(
    project_name="my-app",
    id_generator=AwsXRayIdGenerator(),
    sampler=TraceIdRatioBased(0.5)
)