Instrument LLM calls to AWS Bedrock via the boto3 client using the BedrockInstrumentor
boto3 provides Python bindings to AWS services, including Bedrock, which provides access to a number of foundation models. Calls to these models can be instrumented using OpenInference, enabling OpenTelemetry-compliant observability of applications built using these models. Traces collected using OpenInference can be viewed in Phoenix.
OpenInference Traces collect telemetry data about the execution of your LLM application. Consider using this instrumentation to understand how Bedrock-managed models are being called inside a complex system and to troubleshoot issues such as extraction and response synthesis.
pip install openinference-instrumentation-bedrock opentelemetry-exporter-otlp
Connect to your Phoenix instance using the register function.
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
After connecting to your Phoenix server, instrument boto3 prior to initializing a bedrock-runtime client. All clients created after instrumentation will send traces on all calls to invoke_model.
import boto3
session = boto3.session.Session()
client = session.client("bedrock-runtime")
From here you can run Bedrock as normal
import json

prompt = (
b'{"prompt": "Human: Hello there, how are you? Assistant:", "max_tokens_to_sample": 1024}'
)
response = client.invoke_model(modelId="anthropic.claude-v2", body=prompt)
response_body = json.loads(response.get("body").read())
print(response_body["completion"])
Now that you have tracing setup, all calls to invoke_model will be streamed to your running Phoenix for observability and evaluation.
Anthropic is a leading provider for state-of-the-art LLMs. The Anthropic SDK can be instrumented using the openinference-instrumentation-anthropic package.
pip install openinference-instrumentation-anthropic anthropic
Use the register function to connect your application to Phoenix:
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
A simple Anthropic application that is now instrumented
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-3-5-sonnet-20240620",
max_tokens=1000,
temperature=0,
messages=[
{
"role": "user",
"content": [
{
"type": "text",
"text": "Why is the ocean salty?"
}
]
}
]
)
print(message.content)
Now that you have tracing setup, all Anthropic API calls will be streamed to your running Phoenix for observability and evaluation.
Configure and run Bedrock for evals
class BedrockModel:
model_id: str = "anthropic.claude-v2"
"""The model name to use."""
temperature: float = 0.0
"""What sampling temperature to use."""
max_tokens: int = 256
"""The maximum number of tokens to generate in the completion."""
top_p: float = 1
"""Total probability mass of tokens to consider at each step."""
top_k: int = 256
"""The cutoff where the model no longer selects the words"""
stop_sequences: List[str] = field(default_factory=list)
"""If the model encounters a stop sequence, it stops generating further tokens. """
session: Any = None
"""A bedrock session. If provided, a new bedrock client will be created using this session."""
client = None
"""The bedrock session client. If unset, a new one is created with boto3."""
max_content_size: Optional[int] = None
"""If you're using a fine-tuned model, set this to the maximum content size"""
extra_parameters: Dict[str, Any] = field(default_factory=dict)
"""Any extra parameters to add to the request body (e.g., countPenalty for a21 models)"""
To authenticate, instantiate a boto3 session as shown below and use that session with Phoenix Evals.
import boto3
# Create a Boto3 session
session = boto3.session.Session(
aws_access_key_id='ACCESS_KEY',
aws_secret_access_key='SECRET_KEY',
region_name='us-east-1' # change to your preferred AWS region
)
#If you need to assume a role
# Creating an STS client
sts_client = session.client('sts')
# (optional - if needed) Assuming a role
response = sts_client.assume_role(
RoleArn="arn:aws:iam::......",
RoleSessionName="AssumeRoleSession1",
#(optional) if MFA Required
SerialNumber='arn:aws:iam::...',
#Insert current token, needs to be run within x seconds of generation
TokenCode='PERIODIC_TOKEN'
)
# Your temporary credentials will be available in the response dictionary
temporary_credentials = response['Credentials']
# Creating a new Boto3 session with the temporary credentials
assumed_role_session = boto3.Session(
aws_access_key_id=temporary_credentials['AccessKeyId'],
aws_secret_access_key=temporary_credentials['SecretAccessKey'],
aws_session_token=temporary_credentials['SessionToken'],
region_name='us-east-1'
)
client_bedrock = assumed_role_session.client("bedrock-runtime")
# Arize Model Object - Bedrock ClaudV2 by default
model = BedrockModel(client=client_bedrock)
model("Hello there, how are you?")
# Output: "As an artificial intelligence, I don't have feelings,
# but I'm here and ready to assist you. How can I help you today?"
Instrument LLM calls to AWS Bedrock Agents via the boto3 client using the BedrockInstrumentor
Amazon Bedrock Agents allow you to easily define, deploy, and manage agents on your AWS infrastructure. Traces on invocations of these agents can be captured using OpenInference and viewed in Phoenix.
This instrumentation will capture data on LLM calls, action group invocations (as tools), knowledge base lookups, and more.
pip install openinference-instrumentation-bedrock
Connect to your Phoenix instance using the register function.
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
After connecting to your Phoenix server, instrument boto3 prior to initializing a bedrock-runtime client. All clients created after instrumentation will send traces on all calls to invoke_model, invoke_agent, and their streaming variations.
import boto3
session = boto3.session.Session()
client = session.client("bedrock-runtime")
From here you can run Bedrock as normal
session_id = f"default-session1_{int(time.time())}"
attributes = dict(
inputText=input_text,
agentId=AGENT_ID,
agentAliasId=AGENT_ALIAS_ID,
sessionId=session_id,
enableTrace=True,
)
response = client.invoke_agent(**attributes)
Now that you have tracing setup, all calls will be streamed to your running Phoenix for observability and evaluation.
Mistral AI develops open-weight large language models, focusing on efficiency, customization, and cost-effective AI solutions.
Website:
Agno is an open-source Python framework for building lightweight, model-agnostic AI agents with built-in memory, knowledge, tools, and reasoning capabilities
Website:
BeeAI is an open-source platform that enables developers to discover, run, and compose AI agents from any framework, facilitating the creation of interoperable multi-agent systems
Website:
Dify lets you visually build, orchestrate, and deploy AI-native apps using LLMs, with low-code workflows and agent frameworks for fast deployment.
Website:
Flowise is a low-code platform for building customized chatflows and agentflows.
Website:
Langflow is an open-source visual framework that enables developers to rapidly design, prototype, and deploy custom applications powered by large language models (LLMs)
Website:
Navigate to the Langflow GitHub repo and pull the project down
Navigate to the repo and create a .env file with all the Arize Phoenix variables. You can use the .env.example file as a template.
Add the following environment variable to the .env file
Note: This Langflow integration is for
Start Docker Desktop, build the images, and run the container (this will take around 10 minutes the first time). In your terminal, navigate to the Langflow directory and run the following commands.
For this tutorial, we'll use the Simple Agent example.
Add your OpenAI Key to the Agent component in Langflow
Go into the Playground and run the Agent
Navigate to your project name (it should match the name of your Langflow Agent).
The AgentExecutor trace is Arize Phoenix instrumentation capturing what's happening in the LangChain run underneath the Langflow components.
The other UUID trace is the native Langflow tracing.
Mastra is an open-source TypeScript AI agent framework designed for building production-ready AI applications with agents, workflows, RAG, and observability
Website:
class GeminiModel:
project: Optional[str] = None
location: Optional[str] = None
credentials: Optional["Credentials"] = None
model: str = "gemini-pro"
default_concurrency: int = 5
temperature: float = 0.0
max_tokens: int = 256
top_p: float = 1
top_k: int = 32
project = "my-project-id"
location = "us-central1" # as an example
model = GeminiModel(project=project, location=location)
model("Hello there, this is a tesst if you are working?")
# Output: "Hello world, I am working!"
Configure and run Anthropic for evals
class AnthropicModel(BaseModel):
model: str = "claude-2.1"
"""The model name to use."""
temperature: float = 0.0
"""What sampling temperature to use."""
max_tokens: int = 256
"""The maximum number of tokens to generate in the completion."""
top_p: float = 1
"""Total probability mass of tokens to consider at each step."""
top_k: int = 256
"""The cutoff where the model no longer selects the words."""
stop_sequences: List[str] = field(default_factory=list)
"""If the model encounters a stop sequence, it stops generating further tokens."""
extra_parameters: Dict[str, Any] = field(default_factory=dict)
"""Any extra parameters to add to the request body (e.g., countPenalty for a21 models)"""
max_content_size: Optional[int] = None
"""If you're using a fine-tuned model, set this to the maximum content size"""
In this section, we will showcase the methods and properties of our EvalModels. First, instantiate your model. Once you've instantiated your model, you can get responses from the LLM by simply calling the model and passing a text string.
# model = Instantiate your Anthropic model here
model("Hello there, how are you?")
# Output: "As an artificial intelligence, I don't have feelings,
# but I'm here and ready to assist you. How can I help you today?"
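For example, assuming your ANTHROPIC_API_KEY environment variable is set, the placeholder above might be filled in like this (the model name is illustrative):
from phoenix.evals import AnthropicModel

# Instantiate the Phoenix evals Anthropic model and run a quick sanity check
model = AnthropicModel(model="claude-3-5-sonnet-20240620", temperature=0.0)
print(model("Hello there, how are you?"))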
Configure and run MistralAI for evals
class MistralAIModel(BaseModel):
model: str = "mistral-large-latest"
temperature: float = 0
top_p: Optional[float] = None
random_seed: Optional[int] = None
response_format: Optional[Dict[str, str]] = None
safe_mode: bool = False
safe_prompt: bool = False
# model = Instantiate your MistralAIModel here
model("Hello there, how are you?")
# Output: "As an artificial intelligence, I don't have feelings,
# but I'm here and ready to assist you. How can I help you today?"
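For instance, assuming MISTRAL_API_KEY is set in your environment, instantiation might look like:
from phoenix.evals import MistralAIModel

# Instantiate the Phoenix evals Mistral model with deterministic sampling
model = MistralAIModel(model="mistral-large-latest", temperature=0)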
Configure and run VertexAI for evals
class VertexAIModel:
project: Optional[str] = None
location: Optional[str] = None
credentials: Optional["Credentials"] = None
model: str = "text-bison"
tuned_model: Optional[str] = None
temperature: float = 0.0
max_tokens: int = 256
top_p: float = 0.95
top_k: int = 40
To authenticate with VertexAI, you must pass either your credentials or a project, location pair. In the following example, we quickly instantiate the VertexAI model as follows:
project = "my-project-id"
location = "us-central1" # as an example
model = VertexAIModel(project=project, location=location)
model("Hello there, this is a tesst if you are working?")
# Output: "Hello world, I am working!"
# Arize Phoenix Env Variables
PHOENIX_API_KEY="YOUR_PHOENIX_KEY_HERE"
docker compose -f docker/dev.docker-compose.yml down || true
docker compose -f docker/dev.docker-compose.yml up --remove-orphans
Google GenAI is a suite of AI tools and models from Google Cloud, designed to help businesses build, deploy, and scale AI applications.
Instrument LLM calls made using the Google Gen AI Python SDK
pip install openinference-instrumentation-google-genai google-genai
Set the GEMINI_API_KEY environment variable. To use the Gen AI SDK with Vertex AI instead of the Developer API, refer to Google's guide on setting the required environment variables.
export GEMINI_API_KEY=[your_key_here]
Use the register function to connect your application to Phoenix.
from phoenix.otel import register
# Configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
Now that you have tracing setup, all Gen AI SDK requests will be streamed to Phoenix for observability and evaluation.
import os
from google import genai
def send_message_multi_turn() -> tuple[str, str]:
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
chat = client.chats.create(model="gemini-2.0-flash-001")
response1 = chat.send_message("What is the capital of France?")
response2 = chat.send_message("Why is the sky blue?")
return response1.text or "", response2.text or ""
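For example, invoking the helper above (with GEMINI_API_KEY set) emits one traced call per chat turn:
# Each send_message call is captured as a span and streamed to Phoenix
first, second = send_message_multi_turn()
print(first)
print(second)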
LiteLLM is an open-source platform that provides a unified interface to manage and access over 100 LLMs from various providers.
Configure and run LiteLLM for evals
class LiteLLMModel(BaseEvalModel):
model: str = "gpt-3.5-turbo"
"""The model name to use."""
temperature: float = 0.0
"""What sampling temperature to use."""
max_tokens: int = 256
"""The maximum number of tokens to generate in the completion."""
top_p: float = 1
"""Total probability mass of tokens to consider at each step."""
num_retries: int = 6
"""Maximum number to retry a model if an RateLimitError, OpenAIError, or
ServiceUnavailableError occurs."""
request_timeout: int = 60
"""Maximum number of seconds to wait when retrying."""
model_kwargs: Dict[str, Any] = field(default_factory=dict)
"""Model specific params"""
You can choose among multiple models supported by LiteLLM. Make sure you have the right environment variables set prior to initializing the model. For additional information about the environment variables for specific model providers, visit: LiteLLM provider specific params
Here is an example of how to initialize LiteLLMModel for llama3 using ollama.
import os
from phoenix.evals import LiteLLMModel
os.environ["OLLAMA_API_BASE"] = "http://localhost:11434"
model = LiteLLMModel(model="ollama/llama3")
Instrument LLM calls made using MistralAI's SDK via the MistralAIInstrumentor
MistralAI is a leading provider for state-of-the-art LLMs. The MistralAI SDK can be instrumented using the openinference-instrumentation-mistralai package.
pip install openinference-instrumentation-mistralai mistralai
Set the MISTRAL_API_KEY environment variable to authenticate calls made using the SDK.
export MISTRAL_API_KEY=[your_key_here]
Connect to your Phoenix instance using the register function.
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
import os
from mistralai import Mistral
from mistralai.models import UserMessage
api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-tiny"
client = Mistral(api_key=api_key)
chat_response = client.chat.complete(
model=model,
messages=[UserMessage(content="What is the best French cheese?")],
)
print(chat_response.choices[0].message.content)
Now that you have tracing setup, all invocations of Mistral (completions, chat completions, embeddings) will be streamed to your running Phoenix for observability and evaluation.
OpenAI provides state-of-the-art LLMs for natural language understanding and generation.
Vertex AI is a fully managed platform by Google Cloud for building, deploying, and scaling machine learning models.
Instrument LLM calls made using VertexAI's SDK via the VertexAIInstrumentor
The VertexAI SDK can be instrumented using the openinference-instrumentation-vertexai package.
pip install openinference-instrumentation-vertexai vertexai
See Google's guide on setting up your environment for the Google Cloud AI Platform. You can also store your Project ID in the CLOUD_ML_PROJECT_ID environment variable.
Use the register function to connect your application to Phoenix:
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
import vertexai
from vertexai.generative_models import GenerativeModel
vertexai.init(location="us-central1")
model = GenerativeModel("gemini-1.5-flash")
print(model.generate_content("Why is sky blue?").text)
Now that you have tracing setup, all invocations of Vertex models will be streamed to your running Phoenix for observability and evaluation.
AutoGen is an open-source Python framework for orchestrating multi-agent LLM interactions with shared memory and tool integrations to build scalable AI workflows
AutoGen is an agent framework from Microsoft that allows for complex Agent creation. It is unique in its ability to create multiple agents that work together.
The AutoGen Agent framework allows creation of multiple agents and connection of those agents to work together to accomplish tasks.
Phoenix instruments Autogen by instrumenting the underlying model library it's using. If your agents are set up to call OpenAI, use our OpenAI instrumentor per the example below.
If your agents are using a different model, be sure to instrument that model instead by installing its respective OpenInference library.
pip install openinference-instrumentation-openai arize-phoenix-otel arize-phoenix
Connect to your Phoenix instance using the register function.
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
From here you can use Autogen as normal, and Phoenix will automatically trace any model calls made.
The Phoenix support is simple in its first incarnation but allows for capturing all of the prompt and responses that occur under the framework between each agent.
The individual prompt and responses are captured directly through OpenAI calls. If you're using a different underlying model provider than OpenAI, instrument your application using the respective instrumentor instead.
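As a minimal sketch (assuming the classic pyautogen API and an OPENAI_API_KEY in your environment), a simple two-agent chat whose underlying OpenAI calls are traced might look like:
import os

import autogen

config_list = [{"model": "gpt-4o-mini", "api_key": os.environ["OPENAI_API_KEY"]}]

assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = autogen.UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",     # fully automated for this example
    code_execution_config=False,  # no local code execution
)

# Each OpenAI call made by the agents is captured as a span by the OpenAI instrumentor
user_proxy.initiate_chat(assistant, message="What is the capital of France?")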
Configure your Dify application to view traces in Phoenix
The fastest way to get started with Phoenix is by signing up for a free Phoenix Cloud account. If you prefer, you can also self-host Phoenix or run it locally.
Go to the settings page in your Phoenix instance to find your endpoint and API key.
To configure Phoenix tracing in your Dify application:
Open the Dify application you want to monitor.
In the left sidebar, navigate to Monitoring.
On the Monitoring page, select Phoenix in the Tracing drop down to begin setup.
Enter your Phoenix credentials and save. You can verify the monitoring status on the current page.
View Dify traces in Phoenix. Get rich details into tool calls, session data, workflow steps, and more.
Learn more details about the tracing data captured in the Dify documentation
DSPy is an open-source Python framework for declaratively programming modular LLM pipelines and automatically optimizing prompts and model weights
Analyzing and troubleshooting what happens under the hood can be challenging without proper insights. By integrating your Flowise application with Phoenix, you can monitor traces and gain robust observability into your chatflows and agentflows.
Access Configurations: Navigate to settings in your chatflow or agentflow and find configurations.
Connect to Phoenix: Go to the Analyze Chatflow tab and configure your application with Phoenix. Get your API key from your Phoenix instance to create your credentials. Be sure to name your project and confirm that the Phoenix toggle is enabled before saving.
Note: If you are using an environment that is not Phoenix Cloud, you may need to modify the Endpoint field.
View Traces: In Phoenix, you will find your project under the Projects tab. Click into this to view and analyze traces as you test your application.
Store and Experiment: Optionally, you can also filter traces, store traces in a dataset to run experiments, analyze patterns, and optimize your workflows over time.
You can also reference Flowise documentation here.
Instrument LLM applications that use the Guardrails AI framework
In this example we will instrument a small program that uses the Guardrails AI framework to protect its LLM calls.
pip install openinference-instrumentation-guardrails guardrails-ai
Connect to your Phoenix instance using the register function.
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
From here, you can run Guardrails as normal:
from guardrails import Guard
from guardrails.hub import TwoWords
import openai
guard = Guard().use(
TwoWords(),
)
response = guard(
llm_api=openai.chat.completions.create,
prompt="What is another name for America?",
model="gpt-3.5-turbo",
max_tokens=1024,
)
print(response)
Now that you have tracing setup, all invocations of underlying models used by Guardrails (completions, chat completions, embeddings) will be streamed to your running Phoenix for observability and evaluation. Additionally, Guards will be present as a new span kind in Phoenix.
pip install openinference-instrumentation-instructor instructor
Be sure you also install the OpenInference library for the underlying model you're using along with Instructor. For example, if you're using OpenAI calls directly, you would also add: openinference-instrumentation-openai
Connect to your Phoenix instance using the register function.
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
From here you can use instructor as normal.
import instructor
from pydantic import BaseModel
from openai import OpenAI
# Define your desired output structure
class UserInfo(BaseModel):
name: str
age: int
# Patch the OpenAI client
client = instructor.from_openai(OpenAI())
# Extract structured data from natural language
user_info = client.chat.completions.create(
model="gpt-3.5-turbo",
response_model=UserInfo,
messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user_info.name)
#> John Doe
print(user_info.age)
#> 30
Now that you have tracing setup, all invocations of your underlying model (completions, chat completions, embeddings) and instructor triggers will be streamed to your running Phoenix for observability and evaluation.
LangChain is an open-source framework for building language model applications with prompt chaining, memory, and external integrations
LangGraph is an open-source framework for building graph-based LLM pipelines with modular nodes and seamless data integrations
Phoenix has first-class support for LangGraph applications.
pip install openinference-instrumentation-langchain
Install the OpenInference Langchain library before your application code. Our LangChainInstrumentor works for both standard LangChain applications and for LangGraph agents.
Use the register function to connect your application to Phoenix:
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
By instrumenting LangGraph, spans will be created whenever an agent is invoked and will be sent to the Phoenix server for collection.
Now that you have tracing setup, all invocations of chains will be streamed to your running Phoenix for observability and evaluation.
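For example, a minimal LangGraph ReAct agent (a sketch, assuming langgraph and langchain-openai are installed and OPENAI_API_KEY is set) is traced like any other LangChain application:
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

@tool
def get_weather(city: str) -> str:
    """Return a canned weather report for a city (illustrative tool)."""
    return f"It is always sunny in {city}."

# Build a prebuilt ReAct agent; each invocation produces an agent span in Phoenix
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), [get_weather])
result = agent.invoke({"messages": [("user", "What's the weather in Paris?")]})
print(result["messages"][-1].content)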
Portkey is an AI Gateway and observability platform that provides routing, guardrails, caching, and monitoring for 200+ LLMs with enterprise-grade security and reliability features.
Website: https://portkey.ai/
Milvus is an open-source vector database built for GenAI applications.
Website: milvus.io
Phoenix can be used to trace and evaluate applications that use Zilliz or Milvus as a vector database.
Instrument LLM applications built with Groq
Groq provides low latency and lightning-fast inference for AI models. Arize supports instrumenting Groq API calls, including role types such as system, user, and assistant messages, as well as tool use. You can create a free GroqCloud account to get started.
Connect to your Phoenix instance using the register function.
A simple Groq application that is now instrumented
Now that you have tracing setup, all Groq API calls will be streamed to your running Phoenix for observability and evaluation.
LiteLLM allows developers to call all LLM APIs using the OpenAI format. LiteLLM Proxy is a proxy server to call 100+ LLMs in OpenAI format. Both are supported by this auto-instrumentation.
Any calls made to the following functions will be automatically captured by this integration:
completion()
acompletion()
completion_with_retries()
embedding()
aembedding()
image_generation()
aimage_generation()
Use the register function to connect your application to Phoenix:
Add any API keys needed by the models you are using with LiteLLM.
You can now use LiteLLM as normal and calls will be traced in Phoenix.
Traces should now be visible in Phoenix!
Phoenix provides auto-instrumentation for the OpenAI Python library.
We have several code samples below on different ways to integrate with OpenAI, based on how you want to use Phoenix.
Add your OpenAI API key as an environment variable:
Use the register function to connect your application to Phoenix:
Now that you have tracing setup, all invocations of OpenAI (completions, chat completions, embeddings) will be streamed to your running Phoenix for observability and evaluation.
Instrument and observe BeeAI agents
Phoenix provides seamless observability and tracing for BeeAI agents through the openinference-instrumentation-beeai package.
Connect to your Phoenix instance using the register function.
Sample agent built using BeeAI with automatic tracing:
Phoenix provides visibility into your BeeAI agent operations by automatically tracing all interactions.
How to use the SmolagentsInstrumentor to trace smolagents by Hugging Face
smolagents is a minimalist AI agent framework developed by Hugging Face, designed to simplify the creation and deployment of powerful agents with just a few lines of code. It focuses on simplicity and efficiency, making it easy for developers to leverage large language models (LLMs) for various applications.
Phoenix provides auto-instrumentation, allowing you to track and visualize every step and call made by your agent.
We have several code samples below on different ways to integrate with smolagents, based on how you want to use Phoenix.
Add your HF_TOKEN as an environment variable:
Connect to your Phoenix instance using the register function.
Create your Hugging Face Model, and at every run, traces will be sent to Phoenix.
Now that you have tracing setup, all invocations and steps of your Agent will be streamed to your running Phoenix for observability and evaluation.
How to use the python LlamaIndexInstrumentor to trace LlamaIndex Workflows
LlamaIndex Workflows are a subset of the LlamaIndex package specifically designed to support agent development.
Our LlamaIndexInstrumentor automatically captures traces for LlamaIndex Workflows agents. If you've already enabled that instrumentor, you do not need to complete the steps below.
Initialize the LlamaIndexInstrumentor before your application code. This instrumentor will trace both LlamaIndex Workflows calls, as well as calls to the general LlamaIndex package.
By instrumenting LlamaIndex, spans will be created whenever an agent is invoked and will be sent to the Phoenix server for collection.
Now that you have tracing setup, all invocations of chains will be streamed to your running Phoenix for observability and evaluation.
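For example, a minimal single-step workflow (a sketch, assuming llama-index-core is installed) produces a trace each time it runs:
import asyncio

from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class EchoWorkflow(Workflow):
    """A single-step workflow; each run shows up as a trace in Phoenix."""

    @step
    async def echo(self, ev: StartEvent) -> StopEvent:
        # Keyword arguments passed to run() are available on the StartEvent
        return StopEvent(result=f"echo: {ev.message}")

async def main() -> None:
    result = await EchoWorkflow(timeout=10).run(message="Hello, Phoenix!")
    print(result)

asyncio.run(main())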
How to use the python LangChainInstrumentor to trace LangChain
Phoenix has first-class support for LangChain applications.
Use the register function to connect your application to Phoenix:
By instrumenting LangChain, spans will be created whenever a chain is run and will be sent to the Phoenix server for collection.
Now that you have tracing setup, all invocations of chains will be streamed to your running Phoenix for observability and evaluation.
Anthropic's Model Context Protocol is a standard for connecting AI assistants to the systems where data lives, including content repositories, business tools, and development environments.
Website:
Vercel is a cloud platform that simplifies building, deploying, and scaling modern web applications with features like serverless functions, edge caching, and seamless Git integration
Website:
PydanticAI is a Python agent framework designed to make it less painful to build production-grade applications with Generative AI, built by the team behind Pydantic with type-safe structured outputs
PromptFlow is a framework for designing, orchestrating, testing, and monitoring end-to-end LLM prompt workflows with built-in versioning and analytics
Website:
Don't see an integration you were looking for? We'd love to hear from you!
Phoenix has a wide range of integrations. Generally these fall into a few categories:
Tracing integrations - where Phoenix will capture traces of applications built using a specific library.
Eval Model integrations - where Phoenix's eval Python package will make calls to a specific model.
Eval Library integrations - where Phoenix traces can be evaluated using an outside eval library, instead of Phoenix's eval library, and visualized in Phoenix.
Each partner listing in this section contains integration docs and relevant tutorials.
RAG with MongoDB and LlamaIndex
Learn to build and evaluate a RAG system powered by MongoDB Atlas
RAG with Zilliz and LlamaIndex
Learn to build and evaluate a RAG system powered by Zilliz
pip install openinference-instrumentation-groq groq
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
import os
from groq import Groq
client = Groq(
# This is the default and can be omitted
api_key=os.environ.get("GROQ_API_KEY"),
)
chat_completion = client.chat.completions.create(
messages=[
{
"role": "user",
"content": "Explain the importance of low latency LLMs",
}
],
model="mixtral-8x7b-32768",
)
print(chat_completion.choices[0].message.content)
pip install openinference-instrumentation-litellm litellm
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
import os
os.environ["OPENAI_API_KEY"] = "PASTE_YOUR_API_KEY_HERE"
import litellm
completion_response = litellm.completion(model="gpt-3.5-turbo",
messages=[{"content": "What's the capital of China?", "role": "user"}])
print(completion_response)
pip install openinference-instrumentation-openai openai
export OPENAI_API_KEY=[your_key_here]
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed dependencies
)
import openai
client = openai.OpenAI()
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Write a haiku."}],
)
print(response.choices[0].message.content)
pip install openinference-instrumentation-beeai beeai-framework
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="beeai-agent", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
import asyncio
from beeai_framework.agents.react import ReActAgent
from beeai_framework.agents.types import AgentExecutionConfig
from beeai_framework.backend.chat import ChatModel
from beeai_framework.backend.types import ChatModelParameters
from beeai_framework.memory import TokenMemory
from beeai_framework.tools.search.duckduckgo import DuckDuckGoSearchTool
from beeai_framework.tools.search.wikipedia import WikipediaTool
from beeai_framework.tools.tool import AnyTool
from beeai_framework.tools.weather.openmeteo import OpenMeteoTool
llm = ChatModel.from_name(
"ollama:granite3.1-dense:8b",
ChatModelParameters(temperature=0),
)
tools: list[AnyTool] = [
WikipediaTool(),
OpenMeteoTool(),
DuckDuckGoSearchTool(),
]
agent = ReActAgent(llm=llm, tools=tools, memory=TokenMemory(llm))
prompt = "What's the current weather in Las Vegas?"
async def main() -> None:
response = await agent.run(
prompt=prompt,
execution=AgentExecutionConfig(
max_retries_per_step=3, total_max_retries=10, max_iterations=20
),
)
print("Agent 🤖 : ", response.result.text)
asyncio.run(main())
pip install openinference-instrumentation-smolagents smolagents
os.environ["HF_TOKEN"] = "<your_hf_token_value>"
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
from smolagents import (
CodeAgent,
InferenceClientModel,
ToolCallingAgent,
VisitWebpageTool,
WebSearchTool,
)
model = InferenceClientModel()
managed_agent = ToolCallingAgent(
    tools=[WebSearchTool(), VisitWebpageTool()],
    model=model,
    name="managed_agent",
    description="This is an agent that can do web search.",
)
manager_agent = CodeAgent(
    tools=[],
    model=model,
    managed_agents=[managed_agent],
)
manager_agent.run("Based on the latest news, what is happening in extraterrestrial life?")
pip install openinference-instrumentation-llama_index
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from phoenix.otel import register
tracer_provider = register()
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
pip install openinference-instrumentation-langchain langchain_openai
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
prompt = ChatPromptTemplate.from_template("{x} {y} {z}?").partial(x="why is", z="blue")
chain = prompt | ChatOpenAI(model_name="gpt-3.5-turbo")
chain.invoke(dict(y="sky"))
Instrument LLM calls made using the Google ADK Python SDK
Launch Phoenix
pip install openinference-instrumentation-google-adk google-adk arize-phoenix-otel
Set the GOOGLE_API_KEY environment variable. Refer to Google's ADK documentation for more details on authentication and environment variables.
export GOOGLE_API_KEY=[your_key_here]
Use the register function to connect your application to Phoenix.
from phoenix.otel import register
# Configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
Now that you have tracing setup, all Google ADK SDK requests will be streamed to Phoenix for observability and evaluation.
import asyncio
from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner
from google.genai import types
def get_weather(city: str) -> dict:
"""Retrieves the current weather report for a specified city.
Args:
city (str): The name of the city for which to retrieve the weather report.
Returns:
dict: status and result or error msg.
"""
if city.lower() == "new york":
return {
"status": "success",
"report": (
"The weather in New York is sunny with a temperature of 25 degrees"
" Celsius (77 degrees Fahrenheit)."
),
}
else:
return {
"status": "error",
"error_message": f"Weather information for '{city}' is not available.",
}
agent = Agent(
name="test_agent",
model="gemini-2.0-flash-exp",
description="Agent to answer questions using tools.",
instruction="You must use the available tools to find an answer.",
tools=[get_weather]
)
async def main():
app_name = "test_instrumentation"
user_id = "test_user"
session_id = "test_session"
runner = InMemoryRunner(agent=agent, app_name=app_name)
session_service = runner.session_service
await session_service.create_session(
app_name=app_name,
user_id=user_id,
session_id=session_id
)
async for event in runner.run_async(
user_id=user_id,
session_id=session_id,
new_message=types.Content(role="user", parts=[
types.Part(text="What is the weather in New York?")]
)
):
if event.is_final_response():
print(event.content.parts[0].text.strip())
if __name__ == "__main__":
asyncio.run(main())
Phoenix provides seamless observability and tracing for Agno agents through the OpenInference instrumentation package. This integration automatically captures agent interactions, tool usage, reasoning steps, and multi-agent conversations, giving you complete visibility into your Agno applications. Monitor performance, debug issues, and evaluate agent behavior in real-time as your agents execute complex workflows and collaborate in teams.
Agno is a lightweight, high-performance Python framework for building AI agents with tools, memory, and reasoning capabilities. It enables developers to create autonomous agents that can perform complex tasks, access knowledge bases, and collaborate in multi-agent teams. With support for 23+ model providers and lightning-fast performance (~3μs instantiation), Agno is designed for production-ready AI applications.
Model Agnostic: Connect to OpenAI, Anthropic, Google, and 20+ other providers
Lightning Fast: Agents instantiate in ~3μs with minimal memory footprint
Built-in Reasoning: First-class support for chain-of-thought and reasoning models
Multi-Modal: Native support for text, image, audio, and video processing
Agentic RAG: Advanced retrieval-augmented generation with hybrid search
Multi-Agent Teams: Coordinate multiple agents for complex workflows
Production Ready: Pre-built FastAPI routes and monitoring capabilities
pip install openinference-instrumentation-agno agno
Use the register function to connect your application to Phoenix:
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.duckduckgo import DuckDuckGoTools
agent = Agent(
model=OpenAIChat(id="gpt-4o-mini"),
tools=[DuckDuckGoTools()],
markdown=True,
debug_mode=True,
)
agent.run("What is currently trending on Twitter?")
Now that you have tracing setup, all invocations of Agno agents will be streamed to Phoenix for observability and evaluation.
Auto-instrument your AgentChat application for seamless observability
AutoGen AgentChat is the framework within Microsoft's AutoGen that enables robust multi-agent applications.
pip install openinference-instrumentation-autogen-agentchat autogen-agentchat autogen_ext
Connect to your Phoenix instance using the register function.
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="agentchat-agent", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
We’re going to run an AgentChat example using a multi-agent team. To get started, install the required packages to use your LLMs with AgentChat. In this example, we’ll use OpenAI as the LLM provider.
pip install autogen_ext openai
import asyncio
import os
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient
os.environ["OPENAI_API_KEY"] = "your-api-key"
async def main():
model_client = OpenAIChatCompletionClient(
model="gpt-4",
)
# Create two agents: a primary and a critic
primary_agent = AssistantAgent(
"primary",
model_client=model_client,
system_message="You are a helpful AI assistant.",
)
critic_agent = AssistantAgent(
"critic",
model_client=model_client,
system_message="""
Provide constructive feedback.
Respond with 'APPROVE' when your feedbacks are addressed.
""",
)
# Termination condition: stop when the critic says "APPROVE"
text_termination = TextMentionTermination("APPROVE")
# Create a team with both agents
team = RoundRobinGroupChat(
[primary_agent, critic_agent],
termination_condition=text_termination
)
# Run the team on a task
result = await team.run(task="Write a short poem about the fall season.")
await model_client.close()
print(result)
if __name__ == "__main__":
asyncio.run(main())
Phoenix provides visibility into your AgentChat operations by automatically tracing all interactions.
This module provides automatic instrumentation for LangChain.js, more specifically the @langchain/core module, which may be used in conjunction with @opentelemetry/sdk-trace-node.
npm install --save @arizeai/openinference-instrumentation-langchain
To load the LangChain instrumentation, manually instrument the @langchain/core/callbacks/manager module. The callbacks manager must be manually instrumented due to the non-traditional module structure in @langchain/core. Additional instrumentations can be registered as usual in the registerInstrumentations function.
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import {
LangChainInstrumentation
} from "@arizeai/openinference-instrumentation-langchain";
import * as CallbackManagerModule from "@langchain/core/callbacks/manager";
const provider = new NodeTracerProvider();
provider.register();
const lcInstrumentation = new LangChainInstrumentation();
// LangChain must be manually instrumented as it doesn't have
// a traditional module structure
lcInstrumentation.manuallyInstrument(CallbackManagerModule);
Instrumentation version >1.0.0 supports both attribute masking and context attribute propagation to spans.
Version | Attribute masking | Context attribute propagation | Tracing
>1.0.0  | ✅ | ✅ | ✅
>0.2.0  | ❌ | ✅ | ✅
>0.1.0  | ❌ | ❌ | ✅
You can specify a custom tracer provider for LangChain instrumentation in multiple ways:
const lcInstrumentation = new LangChainInstrumentation({
tracerProvider: customTracerProvider,
});
lcInstrumentation.manuallyInstrument(CallbackManagerModule);
const lcInstrumentation = new LangChainInstrumentation();
lcInstrumentation.setTracerProvider(customTracerProvider);
lcInstrumentation.manuallyInstrument(CallbackManagerModule);
const lcInstrumentation = new LangChainInstrumentation();
lcInstrumentation.manuallyInstrument(CallbackManagerModule);
registerInstrumentations({
instrumentations: [lcInstrumentation],
tracerProvider: customTracerProvider,
});
Configure and run OpenAI for evals
To authenticate with OpenAI you will need, at a minimum, an API key. The model class will look for it in your environment, or you can pass it in via the api_key argument. In addition, you can choose the specific name of the model you want to use and its configuration parameters. The default values specified above are common default values from OpenAI. Quickly instantiate your model as follows:
The code snippet below shows how to initialize OpenAIModel
for Azure:
Azure OpenAI supports specific options:
For full details on Azure OpenAI, check out the OpenAI documentation.
Instrument multi-agent applications using CrewAI
CrewAI uses either Langchain or LiteLLM under the hood to call models, depending on the version.
If you're using CrewAI < 0.63.0, we recommend installing our openinference-instrumentation-langchain library to get visibility of LLM calls.
If you're using CrewAI >= 0.63.0, we recommend instead adding our openinference-instrumentation-litellm library to get visibility of LLM calls.
Connect to your Phoenix instance using the register function.
From here, you can run CrewAI as normal
Now that you have tracing setup, all calls to your Crew will be streamed to your running Phoenix for observability and evaluation.
Phoenix MCP Server is an implementation of the Model Context Protocol for the Arize Phoenix platform. It provides a unified interface to Phoenix's capabilities.
Phoenix MCP Server supports:
Prompts Management: Create, list, update, and iterate on prompts
Datasets: Explore datasets, and synthesize new examples
Experiments: Pull experiment results and visualize them with the help of an LLM
From the Cursor Settings page, navigate to the MCP section, and click "Add new global MCP server"
Add the following code to your MCP config file:
Replacing:
https://my-phoenix.com with your Phoenix collector endpoint
your-api-key with your Phoenix API key
After saving your config file, you should see the Phoenix server enabled:
You can access Phoenix prompts, experiments, and datasets through Cursor!
From the Claude Desktop settings window, navigate to the Developer Section, and click "Edit Config"
Open your config file and add the following code:
Replacing:
https://my-phoenix.com with your Phoenix collector endpoint
your-api-key with your Phoenix API key
Save your file and relaunch Claude Desktop. You should now see your new tools ready for use in Claude!
Add the following code to your MCP config file:
Replacing:
https://my-phoenix.com with your Phoenix collector endpoint
your-api-key with your Phoenix API key
The MCP server can be used to interact with Prompts, Experiments, and Datasets. It can be used to retrieve information about each item, and can create and update Prompts.
Some good questions to try:
What prompts do I have in Phoenix?
Create a new prompt in Phoenix that classifies user intent
Update my classification prompt in Phoenix with these new options
Summarize the Phoenix experiments run on my agent inputs dataset
Visualize the results of my jailbreak dataset experiments in Phoenix
@arizeai/phoenix-mcp is open source! Issues and PRs welcome.
Instrument LLM applications built with Haystack
Phoenix provides auto-instrumentation for Haystack.
Use the register function to connect your application to Phoenix:
From here, you can set up your Haystack app as normal:
Now that you have tracing setup, all invocations of pipelines will be streamed to your running Phoenix for observability and evaluation.
Instructor is a library that helps you define structured output formats for LLMs.
Website:
Instructor tracing
Instrument and observe OpenAI calls
This module provides automatic instrumentation for the OpenAI Node.js SDK, which may be used in conjunction with @opentelemetry/sdk-trace-node.
npm install --save @arizeai/openinference-instrumentation-openai openai
npm install --save @opentelemetry/api @opentelemetry/sdk-trace-node \
@opentelemetry/sdk-trace-base \
@opentelemetry/resources \
@opentelemetry/semantic-conventions \
@opentelemetry/instrumentation \
@opentelemetry/exporter-trace-otlp-proto \
@arizeai/openinference-semantic-conventions
To instrument your application, import and enable OpenAIInstrumentation. Create the instrumentation.js file:
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";
import { resourceFromAttributes } from "@opentelemetry/resources";
import { SimpleSpanProcessor } from "@opentelemetry/sdk-trace-base";
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions";
import { SEMRESATTRS_PROJECT_NAME } from "@arizeai/openinference-semantic-conventions";
import { registerInstrumentations } from "@opentelemetry/instrumentation";
// OpenAI instrumentation
import OpenAI from "openai";
import { OpenAIInstrumentation } from "@arizeai/openinference-instrumentation-openai";
const COLLECTOR_ENDPOINT = "your-phoenix-collector-endpoint";
const SERVICE_NAME = "openai-app";
const provider = new NodeTracerProvider({
resource: resourceFromAttributes({
[ATTR_SERVICE_NAME]: SERVICE_NAME,
[SEMRESATTRS_PROJECT_NAME]: SERVICE_NAME,
}),
spanProcessors: [
new SimpleSpanProcessor(
new OTLPTraceExporter({
url: `${COLLECTOR_ENDPOINT}/v1/traces`,
// (optional) if connecting to Phoenix with Authentication enabled
headers: { Authorization: `Bearer ${process.env.PHOENIX_API_KEY}` },
})
),
],
});
provider.register();
console.log("Provider registered");
const instrumentation = new OpenAIInstrumentation();
instrumentation.manuallyInstrument(OpenAI);
registerInstrumentations({
instrumentations: [instrumentation],
});
console.log("OpenAI instrumentation registered");
Import the instrumentation.js file first, then use OpenAI as usual.
import "./instrumentation.js";
import OpenAI from "openai";
// set OPENAI_API_KEY in environment, or pass it in arguments
const openai = new OpenAI({
apiKey: 'your-openai-api-key'
});
openai.chat.completions
.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Write a haiku."}],
})
.then((response) => {
console.log(response.choices[0].message.content);
});
After setting up instrumentation and running your OpenAI application, traces will appear in the Phoenix UI for visualization and analysis.
You can specify a custom tracer provider for OpenAI instrumentation in multiple ways:
const instrumentation = new OpenAIInstrumentation({
tracerProvider: customTracerProvider,
});
instrumentation.manuallyInstrument(OpenAI);
const instrumentation = new OpenAIInstrumentation();
instrumentation.setTracerProvider(customTracerProvider);
instrumentation.manuallyInstrument(OpenAI);
const instrumentation = new OpenAIInstrumentation();
instrumentation.manuallyInstrument(OpenAI);
registerInstrumentations({
instrumentations: [instrumentation],
tracerProvider: customTracerProvider,
});
class OpenAIModel:
api_key: Optional[str] = field(repr=False, default=None)
"""Your OpenAI key. If not provided, will be read from the environment variable"""
organization: Optional[str] = field(repr=False, default=None)
"""
The organization to use for the OpenAI API. If not provided, will default
to what's configured in OpenAI
"""
base_url: Optional[str] = field(repr=False, default=None)
"""
An optional base URL to use for the OpenAI API. If not provided, will default
to what's configured in OpenAI
"""
model: str = "gpt-4"
"""Model name to use. In of azure, this is the deployment name such as gpt-35-instant"""
temperature: float = 0.0
"""What sampling temperature to use."""
max_tokens: int = 256
"""The maximum number of tokens to generate in the completion.
-1 returns as many tokens as possible given the prompt and
the model's maximal context size."""
top_p: float = 1
"""Total probability mass of tokens to consider at each step."""
frequency_penalty: float = 0
"""Penalizes repeated tokens according to frequency."""
presence_penalty: float = 0
"""Penalizes repeated tokens."""
n: int = 1
"""How many completions to generate for each prompt."""
model_kwargs: Dict[str, Any] = field(default_factory=dict)
"""Holds any model parameters valid for `create` call not explicitly specified."""
batch_size: int = 20
"""Batch size to use when passing multiple documents to generate."""
request_timeout: Optional[Union[float, Tuple[float, float]]] = None
"""Timeout for requests to OpenAI completion API. Default is 600 seconds."""
model = OpenAIModel()
model("Hello there, this is a test if you are working?")
# Output: "Hello! I'm working perfectly. How can I assist you today?"
model = OpenAIModel(
model="gpt-35-turbo-16k",
azure_endpoint="https://arize-internal-llm.openai.azure.com/",
api_version="2023-09-15-preview",
)
api_version: str = field(default=None)
"""
The version of the API that is provisioned
https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#rest-api-versioning
"""
azure_endpoint: Optional[str] = field(default=None)
"""
The endpoint to use for azure openai. Available in the azure portal.
https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal#create-a-resource
"""
azure_deployment: Optional[str] = field(default=None)
azure_ad_token: Optional[str] = field(default=None)
azure_ad_token_provider: Optional[Callable[[], str]] = field(default=None)
pip install openinference-instrumentation-crewai crewai crewai-tools
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
import os
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
os.environ["SERPER_API_KEY"] = "YOUR_SERPER_API_KEY"
search_tool = SerperDevTool()
# Define your agents with roles and goals
researcher = Agent(
role='Senior Research Analyst',
goal='Uncover cutting-edge developments in AI and data science',
backstory="""You work at a leading tech think tank.
Your expertise lies in identifying emerging trends.
You have a knack for dissecting complex data and presenting actionable insights.""",
verbose=True,
allow_delegation=False,
# You can pass an optional llm attribute specifying what model you wanna use.
# llm=ChatOpenAI(model_name="gpt-3.5", temperature=0.7),
tools=[search_tool]
)
writer = Agent(
role='Tech Content Strategist',
goal='Craft compelling content on tech advancements',
backstory="""You are a renowned Content Strategist, known for your insightful and engaging articles.
You transform complex concepts into compelling narratives.""",
verbose=True,
allow_delegation=True
)
# Create tasks for your agents
task1 = Task(
description="""Conduct a comprehensive analysis of the latest advancements in AI in 2024.
Identify key trends, breakthrough technologies, and potential industry impacts.""",
expected_output="Full analysis report in bullet points",
agent=researcher
)
task2 = Task(
description="""Using the insights provided, develop an engaging blog
post that highlights the most significant AI advancements.
Your post should be informative yet accessible, catering to a tech-savvy audience.
Make it sound cool, avoid complex words so it doesn't sound like AI.""",
expected_output="Full blog post of at least 4 paragraphs",
agent=writer
)
# Instantiate your crew with a sequential process
crew = Crew(
agents=[researcher, writer],
tasks=[task1, task2],
verbose=2, # You can set it to 1 or 2 for different logging levels
process = Process.sequential
)
# Get your crew to work!
result = crew.kickoff()
print("######################")
print(result)
{
"mcpServers": {
"phoenix": {
"command": "npx",
"args": [
"-y",
"@arizeai/phoenix-mcp@latest",
"--baseUrl",
"https://my-phoenix.com",
"--apiKey",
"your-api-key"
]
}
}
{
"mcpServers": {
"phoenix": {
"command": "npx",
"args": [
"-y",
"@arizeai/phoenix-mcp@latest",
"--baseUrl",
"https://my-phoenix.com",
"--apiKey",
"your-api-key"
]
}
}
{
"mcpServers": {
"phoenix": {
"command": "npx",
"args": [
"-y",
"@arizeai/phoenix-mcp@latest",
"--baseUrl",
"https://my-phoenix.com",
"--apiKey",
"your-api-key"
]
}
}
pip install openinference-instrumentation-haystack haystack-ai
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
from haystack import Pipeline
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders.prompt_builder import PromptBuilder
prompt_template = """
Answer the following question.
Question: {{question}}
Answer:
"""
# Initialize the pipeline
pipeline = Pipeline()
# Initialize the OpenAI generator component
llm = OpenAIGenerator(model="gpt-3.5-turbo")
prompt_builder = PromptBuilder(template=prompt_template)
# Add the generator component to the pipeline
pipeline.add_component("prompt_builder", prompt_builder)
pipeline.add_component("llm", llm)
pipeline.connect("prompt_builder", "llm")
# Define the question
question = "What is the location of the Hanging Gardens of Babylon?"
How to trace Portkey AI Gateway requests with Phoenix for comprehensive LLM observability
Phoenix provides seamless integration with Portkey, the AI Gateway and observability platform that routes to 200+ LLMs with enterprise-grade features including guardrails, caching, and load balancing.
pip install openinference-instrumentation-portkey portkey-ai
Use the register function to connect your application to Phoenix:
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-portkey-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
By instrumenting Portkey, spans will be created whenever requests are made through the AI Gateway and will be sent to the Phoenix server for collection.
import os
from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
# Set up your API keys
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
os.environ["PORTKEY_API_KEY"] = "your-portkey-api-key" # Optional for self-hosted
client = OpenAI(
api_key=os.environ.get("OPENAI_API_KEY"),
base_url=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
provider="openai",
api_key=os.environ.get("PORTKEY_API_KEY")
)
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "What is artificial intelligence?"}]
)
print(response.choices[0].message.content)
from portkey_ai import Portkey
# Initialize Portkey client
portkey = Portkey(
api_key="your-portkey-api-key", # Optional for self-hosted
virtual_key="your-openai-virtual-key" # Or use provider-specific virtual keys
)
response = portkey.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Explain machine learning"}]
)
print(response.choices[0].message.content)
Now that you have tracing set up, all requests through Portkey's AI Gateway will be streamed to your running Phoenix instance for observability and evaluation. You'll be able to see:
Request/Response Traces: Complete visibility into LLM interactions
Routing Decisions: Which provider was selected and why
Fallback Events: When and why fallbacks were triggered
Cache Performance: Hit/miss rates and response times
Cost Tracking: Token usage and costs across providers
Latency Metrics: Response times for each provider and route
Ensuring the reliability and accuracy of LLM-generated responses is a critical challenge for production AI systems. Poor-quality training data, ambiguous labels, and untrustworthy outputs can degrade model performance and lead to unreliable results.
Cleanlab TLM is a tool that estimates the trustworthiness of an LLM response. It provides a confidence score that helps detect hallucinations, ambiguous responses, and potential misinterpretations. This enables teams to flag unreliable outputs and improve the robustness of their AI systems.
This guide demonstrates how to integrate Cleanlab’s Trustworthy Language Model (TLM) with Phoenix to systematically identify and improve low-quality LLM responses. By leveraging TLM for automated data quality assessment and Phoenix for response analysis, you can build more robust and trustworthy AI applications.
Specifically, this tutorial will walk through:
Evaluating LLM-generated responses for trustworthiness.
Using Cleanlab TLM to score and flag untrustworthy responses.
Leveraging Phoenix for tracing and visualizing response evaluations.
Install Dependencies, Set up API Keys, Obtain LLM Responses + Trace in Phoenix
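This step installs Cleanlab TLM alongside Phoenix and sets the required API keys before generating and tracing LLM responses. A minimal sketch follows; the package and environment-variable names are assumptions based on the imports used later in this guide.
pip install cleanlab-tlm arize-phoenix openai openinference-instrumentation-openai
import os
# Assumed environment variable names; check your Cleanlab and OpenAI account settings
os.environ["CLEANLAB_TLM_API_KEY"] = "YOUR_CLEANLAB_TLM_API_KEY"
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"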
Download Trace Dataset
import phoenix as px
spans_df = px.Client().get_spans_dataframe(project_name="your-project-name")  # replace with your Phoenix project name
spans_df.head()
Prep data from the trace dataset
import json
# Create a new DataFrame with input and output columns
eval_df = spans_df[["context.span_id", "attributes.input.value", "attributes.output.value"]].copy()
eval_df.set_index("context.span_id", inplace=True)
# Combine system and user prompts from the traces
def get_prompt(input_value):
if isinstance(input_value, str):
input_value = json.loads(input_value)
system_prompt = input_value["messages"][0]["content"]
user_prompt = input_value["messages"][1]["content"]
return system_prompt + "\n" + user_prompt
# Get the responses from the traces
def get_response(output_value):
if isinstance(output_value, str):
output_value = json.loads(output_value)
return output_value["choices"][0]["message"]["content"]
# Create a list of prompts and associated responses
prompts = [get_prompt(input_value) for input_value in eval_df["attributes.input.value"]]
responses = [get_response(output_value) for output_value in eval_df["attributes.output.value"]]
eval_df["prompt"] = prompts
eval_df["response"] = responses
Setup TLM & Evaluate each pair
from cleanlab_tlm import TLM
tlm = TLM(options={"log": ["explanation"]})
# Evaluate each of the prompt, response pairs using TLM
evaluations = tlm.get_trustworthiness_score(prompts, responses)
# Extract the trustworthiness scores and explanations from the evaluations
trust_scores = [entry["trustworthiness_score"] for entry in evaluations]
explanations = [entry["log"]["explanation"] for entry in evaluations]
# Add the trust scores and explanations to the DataFrame
eval_df["score"] = trust_scores
eval_df["explanation"] = explanations
Upload Evals to Phoenix
from phoenix.trace import SpanEvaluations
eval_df["score"] = eval_df["score"].astype(float)
eval_df["explanation"] = eval_df["explanation"].astype(str)
px.Client().log_evaluations(SpanEvaluations(eval_name="Trustworthiness", dataframe=eval_df))
Check out the full tutorial here:
Ragas is a library that provides robust evaluation metrics for LLM applications, making it easy to assess quality. When integrated with Phoenix, it enriches your experiments with metrics like goal accuracy and tool call accuracy—helping you evaluate performance more effectively and track improvements over time.
This guide will walk you through the process of creating and evaluating agents using Ragas and Arize Phoenix. We'll cover the following steps:
Build a customer support agent with the OpenAI Agents SDK
Trace agent activity to monitor interactions
Generate a benchmark dataset for performance analysis
Evaluate agent performance using Ragas
We will walk through the key steps in the documentation below. Check out the full tutorial here:
Here we've set up a basic agent that can solve math problems. We have a function tool that can solve math equations, and an agent that can use this tool. We'll use the Runner class to run the agent and get the final output.
from agents import Runner, function_tool
@function_tool
def solve_equation(equation: str) -> str:
"""Use python to evaluate the math equation, instead of thinking about it yourself.
Args:
equation: string to pass into eval() in python
"""
return str(eval(equation))
from agents import Agent
agent = Agent(
name="Math Solver",
instructions="You solve math problems by evaluating them with python and returning the result",
tools=[solve_equation],
)
Agents can go awry for a variety of reasons. We can use Ragas to evaluate whether the agent responded correctly. Two Ragas measurements help with this:
Tool Call Accuracy - Did our agent choose the right tool with the right arguments?
Agent Goal Accuracy - Did our agent accomplish the stated goal and get to the right outcome?
We'll import both metrics from Ragas and use multi_turn_ascore(sample) to get the results. The AgentGoalAccuracyWithReference metric compares the final output to the reference to see if the goal was accomplished. The ToolCallAccuracy metric compares the agent's tool calls to the reference tool calls to see if the tools were called correctly.
In the notebook, we also define the helper function conversation_to_ragas_sample
which converts the agent messages into a format that Ragas can use.
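As a rough illustration of what that helper does, here is a minimal, hypothetical sketch that maps user and assistant messages into a Ragas MultiTurnSample; it assumes Ragas' multi-turn message classes and omits the tool-call extraction the real notebook performs.
from ragas.dataset_schema import MultiTurnSample
from ragas.messages import AIMessage, HumanMessage, ToolCall

def conversation_to_ragas_sample(messages, reference_equation=None, reference_answer=None):
    # Map the agent's message history into Ragas message objects
    ragas_messages = []
    for msg in messages:
        role = msg.get("role")
        if role == "user":
            ragas_messages.append(HumanMessage(content=str(msg.get("content", ""))))
        elif role == "assistant":
            ragas_messages.append(AIMessage(content=str(msg.get("content", ""))))
    # If a reference equation is supplied, describe the tool call we expect the agent to make
    reference_tool_calls = None
    if reference_equation is not None:
        reference_tool_calls = [ToolCall(name="solve_equation", args={"equation": reference_equation})]
    return MultiTurnSample(
        user_input=ragas_messages,
        reference=reference_answer,
        reference_tool_calls=reference_tool_calls,
    )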
The following code snippets define our task function and evaluators.
import asyncio
from agents import Runner
async def solve_math_problem(input):
if isinstance(input, dict):
input = next(iter(input.values()))
result = await Runner.run(agent, input)
return {"final_output": result.final_output, "messages": result.to_input_list()}
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import AgentGoalAccuracyWithReference, ToolCallAccuracy
async def tool_call_evaluator(input, output):
sample = conversation_to_ragas_sample(output["messages"], reference_equation=input["question"])
tool_call_accuracy = ToolCallAccuracy()
return await tool_call_accuracy.multi_turn_ascore(sample)
async def goal_evaluator(input, output):
sample = conversation_to_ragas_sample(
output["messages"], reference_answer=output["final_output"]
)
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
goal_accuracy = AgentGoalAccuracyWithReference(llm=evaluator_llm)
return await goal_accuracy.multi_turn_ascore(sample)
Once we've generated a dataset of questions, we can use the experiments feature to track changes across models, prompts, and parameters for the agent.
import pandas as pd
import phoenix as px
dataset_df = pd.DataFrame(
{
"question": [conv["question"] for conv in conversations],
"final_output": [conv["final_output"] for conv in conversations],
}
)
dataset = px.Client().upload_dataset(
dataframe=dataset_df,
dataset_name="math-questions",
input_keys=["question"],
output_keys=["final_output"],
)
Finally, we run our experiment and view the results in Phoenix.
from phoenix.experiments import run_experiment
experiment = run_experiment(
dataset, task=solve_math_problem, evaluators=[goal_evaluator, tool_call_evaluator]
)
Install packages:
npm install @arizeai/openinference-mastra
Initialize OpenTelemetry tracing for your Mastra application:
import { Mastra } from "@mastra/core";
import {
OpenInferenceOTLPTraceExporter,
isOpenInferenceSpan,
} from "@arizeai/openinference-mastra";
export const mastra = new Mastra({
// ... other config
telemetry: {
serviceName: "openinference-mastra-agent", // you can rename this to whatever you want to appear in the Phoenix UI
enabled: true,
export: {
type: "custom",
exporter: new OpenInferenceOTLPTraceExporter({
url: process.env.PHOENIX_COLLECTOR_ENDPOINT,
headers: {
Authorization: `Bearer ${process.env.PHOENIX_API_KEY}`, // if you're self-hosting Phoenix without auth, you can remove this header
},
// optional: filter out http, and other node service specific spans
// they will still be exported to Mastra, but not to the target of
// this exporter
spanFilter: isOpenInferenceSpan,
}),
},
},
});
From here you can use Mastra as normal. All agents, workflows, and tool calls will be automatically traced.
Here is a full project example to get you started:
The rest of this tutorial will assume you are running Phoenix locally on the default localhost:6006
port.
npm create mastra@latest
# answer the prompts, include agent, tools, and the example when asked
cd chosen-project-name
npm install --save @arizeai/openinference-mastra
Add the OpenInference telemetry code to your index.ts file. The complete file should now look like this:
// chosen-project-name/src/index.ts
import { Mastra } from "@mastra/core/mastra";
import { createLogger } from "@mastra/core/logger";
import { LibSQLStore } from "@mastra/libsql";
import {
isOpenInferenceSpan,
OpenInferenceOTLPTraceExporter,
} from "@arizeai/openinference-mastra";
import { weatherAgent } from "./agents/weather-agent";
export const mastra = new Mastra({
agents: { weatherAgent },
storage: new LibSQLStore({
url: ":memory:",
}),
logger: createLogger({
name: "Mastra",
level: "info",
}),
telemetry: {
enabled: true,
serviceName: "weather-agent",
export: {
type: "custom",
exporter: new OpenInferenceOTLPTraceExporter({
url: process.env.PHOENIX_COLLECTOR_ENDPOINT,
headers: {
Authorization: `Bearer ${process.env.PHOENIX_API_KEY}`,
},
spanFilter: isOpenInferenceSpan,
}),
},
},
});
npm run dev
The Mastra instrumentation automatically captures:
Agent Executions: Complete agent runs including instructions, model calls, and responses
Workflow Steps: Individual step executions within workflows, including inputs, outputs, and timing
Tool Calls: Function calls made by agents, including parameters and results
LLM Interactions: All model calls with prompts, responses, token usage, and metadata
RAG Operations: Vector searches, document retrievals, and embedding generations
Memory Operations: Agent memory reads and writes
Error Handling: Exceptions and error states in your AI pipeline
Phoenix will capture detailed attributes for each trace:
Agent Information: Agent name, instructions, model configuration
Workflow Context: Workflow name, step IDs, execution flow
Tool Metadata: Tool names, parameters, execution results
Model Details: Model name, provider, token usage, response metadata
Performance Metrics: Execution time, token counts, costs
User Context: Session IDs, user information (if provided)
You can view all of this information in the Phoenix UI to debug issues, optimize performance, and understand your application's behavior.
Create flows using Microsoft PromptFlow and send their traces to Phoenix
This integration will allow you to trace Microsoft PromptFlow flows and send their traces into arize-phoenix.
pip install promptflow
Set up the OpenTelemetry endpoint to point to Phoenix and use Prompt flow's setup_exporter_from_environ
to start tracing any further flows and LLM calls.
import os
from opentelemetry.sdk.environment_variables import OTEL_EXPORTER_OTLP_ENDPOINT
from promptflow.tracing._start_trace import setup_exporter_from_environ
endpoint = f"{os.environ["PHOENIX_COLLECTOR_ENDPOINT]}/v1/traces" # replace with your Phoenix endpoint if self-hosting
os.environ[OTEL_EXPORTER_OTLP_ENDPOINT] = endpoint
setup_exporter_from_environ()
Proceed with creating Prompt flow flows as usual. See this example notebook for inspiration.
You should see the spans render in Phoenix as shown in the screenshots below.
Phoenix provides tracing for MCP clients and servers through OpenInference. This includes the unique capability to trace client to server interactions under a single trace in the correct hierarchy.
The openinference-instrumentation-mcp
instrumentor is unique compared to other OpenInference instrumentors. It does not generate any of its own telemetry. Instead, it enables context propagation between MCP clients and servers to unify traces. You still need to generate OpenTelemetry traces in both the client and the server to see a unified trace.
Because the MCP instrumentor does not generate its own telemetry, you must use it alongside other instrumentation code to see traces.
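If you prefer explicit instrumentation over register(auto_instrument=True), a minimal sketch looks like the following; it assumes the MCPInstrumentor entry point exposed by the openinference-instrumentation-mcp package and a hypothetical project name.
from phoenix.otel import register
from openinference.instrumentation.mcp import MCPInstrumentor

tracer_provider = register(project_name="mcp-demo")  # hypothetical project name
MCPInstrumentor().instrument(tracer_provider=tracer_provider)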
The example code below uses OpenAI agents, which you can instrument using the openinference-instrumentation-openai-agents package (see the installation commands further below).
Now that you have tracing set up, all invocations of your client and server will be streamed to Phoenix for observability and evaluation, and connected in the platform.
Use Phoenix and OpenAI Agents SDK for powerful multi-agent tracing
We have several code samples below on different ways to integrate with OpenAI, based on how you want to use Phoenix.
Sign up for Phoenix:
Sign up for an Arize Phoenix account at https://app.phoenix.arize.com/login
Install packages:
Set your Phoenix endpoint and API Key:
Your Phoenix API key can be found in the Settings page of your Phoenix Space.
Launch your local Phoenix instance:
For details on customizing a local terminal deployment, see Terminal Setup.
Install packages:
Set your Phoenix endpoint:
See Terminal for more details
Pull latest Phoenix image from Docker Hub:
Run your containerized instance:
This will expose Phoenix on localhost:6006
Install packages:
Set your Phoenix endpoint:
For more info on using Phoenix with Docker, see Docker.
Install packages:
Launch Phoenix:
Add your OpenAI API key as an environment variable:
Use the register function to connect your application to Phoenix:
Run your agents code.
View your traces in Phoenix.
Sign up for Phoenix:
Sign up for an Arize Phoenix account at https://app.phoenix.arize.com/login
Click Create Space
, then follow the prompts to create and launch your space.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint and API Key:
From your new Phoenix Space
Create your API key from the Settings page
Copy your Hostname
from the Settings page
In your code, set your endpoint and API key:
import os
os.environ["PHOENIX_API_KEY"] = "ADD YOUR PHOENIX API KEY"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "ADD YOUR PHOENIX HOSTNAME"
# If you created your Phoenix Cloud instance before June 24th, 2025,
# you also need to set the API key as a header:
# os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"
Launch your local Phoenix instance:
pip install arize-phoenix
phoenix serve
For details on customizing a local terminal deployment, see Terminal Setup.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
See Terminal for more details.
Pull latest Phoenix image from Docker Hub:
docker pull arizephoenix/phoenix:latest
Run your containerized instance:
docker run -p 6006:6006 arizephoenix/phoenix:latest
This will expose Phoenix on localhost:6006
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
For more info on using Phoenix with Docker, see Docker.
Install packages:
pip install arize-phoenix
Launch Phoenix:
import phoenix as px
px.launch_app()
pip install openinference-instrumentation-mcp
pip install openinference-instrumentation-openai-agents
import asyncio
from agents import Agent, Runner
from agents.mcp import MCPServer, MCPServerStdio
from dotenv import load_dotenv
from phoenix.otel import register
load_dotenv()
# Connect to your Phoenix instance
tracer_provider = register(auto_instrument=True)
async def run(mcp_server: MCPServer):
agent = Agent(
name="Assistant",
instructions="Use the tools to answer the users question.",
mcp_servers=[mcp_server],
)
while True:
message = input("\n\nEnter your question (or 'exit' to quit): ")
if message.lower() == "exit" or message.lower() == "q":
break
print(f"\n\nRunning: {message}")
result = await Runner.run(starting_agent=agent, input=message)
print(result.final_output)
async def main():
async with MCPServerStdio(
name="Financial Analysis Server",
params={
"command": "fastmcp",
"args": ["run", "./server.py"],
},
client_session_timeout_seconds=30,
) as server:
await run(server)
if __name__ == "__main__":
asyncio.run(main())
import json
import os
from datetime import datetime, timedelta
import openai
from dotenv import load_dotenv
from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel
from phoenix.otel import register
load_dotenv()
# You must also connect your MCP server to Phoenix
tracer_provider = register(auto_instrument=True)
# Get a tracer to add additional instrumentation
tracer = tracer_provider.get_tracer("financial-analysis-server")
# Configure OpenAI client
client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
MODEL = "gpt-4-turbo"
# Create MCP server
mcp = FastMCP("Financial Analysis Server")
class StockAnalysisRequest(BaseModel):
ticker: str
time_period: str = "short-term" # short-term, medium-term, long-term
@mcp.tool()
@tracer.tool(name="MCP.analyze_stock") # this OpenInference call adds tracing to this method
def analyze_stock(request: StockAnalysisRequest) -> dict:
"""Analyzes a stock based on its ticker symbol and provides investment recommendations."""
# Make LLM API call to analyze the stock
prompt = f"""
Provide a detailed financial analysis for the stock ticker: {request.ticker}
Time horizon: {request.time_period}
Please include:
1. Company overview
2. Recent financial performance
3. Key metrics (P/E ratio, market cap, etc.)
4. Risk assessment
5. Investment recommendation
Format your response as a JSON object with the following structure:
{{
"ticker": "{request.ticker}",
"company_name": "Full company name",
"overview": "Brief company description",
"financial_performance": "Analysis of recent performance",
"key_metrics": {{
"market_cap": "Value in billions",
"pe_ratio": "Current P/E ratio",
"dividend_yield": "Current yield percentage",
"52_week_high": "Value",
"52_week_low": "Value"
}},
"risk_assessment": "Analysis of risks",
"recommendation": "Buy/Hold/Sell recommendation with explanation",
"time_horizon": "{request.time_period}"
}}
"""
response = client.chat.completions.create(
model=MODEL,
messages=[{"role": "user", "content": prompt}],
response_format={"type": "json_object"},
)
analysis = json.loads(response.choices[0].message.content)
return analysis
# ... define any additional MCP tools you wish
if __name__ == "__main__":
mcp.run()
Sign up for Phoenix:
Sign up for an Arize Phoenix account at https://app.phoenix.arize.com/login
Click Create Space
, then follow the prompts to create and launch your space.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint and API Key:
From your new Phoenix Space
Create your API key from the Settings page
Copy your Hostname
from the Settings page
In your code, set your endpoint and API key:
import os
os.environ["PHOENIX_API_KEY"] = "ADD YOUR PHOENIX API KEY"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "ADD YOUR PHOENIX HOSTNAME"
# If you created your Phoenix Cloud instance before June 24th, 2025,
# you also need to set the API key as a header:
# os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"
Launch your local Phoenix instance:
pip install arize-phoenix
phoenix serve
For details on customizing a local terminal deployment, see Terminal Setup.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
See Terminal for more details.
Pull latest Phoenix image from Docker Hub:
docker pull arizephoenix/phoenix:latest
Run your containerized instance:
docker run -p 6006:6006 arizephoenix/phoenix:latest
This will expose Phoenix on localhost:6006
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
For more info on using Phoenix with Docker, see Docker.
Install packages:
pip install arize-phoenix
Launch Phoenix:
import phoenix as px
px.launch_app()
pip install openinference-instrumentation-openai-agents openai-agents
export OPENAI_API_KEY=[your_key_here]
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="agents", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed dependencies
)
from agents import Agent, Runner
agent = Agent(name="Assistant", instructions="You are a helpful assistant")
result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")
print(result.final_output)
Sign up for Phoenix:
Sign up for an Arize Phoenix account at https://app.phoenix.arize.com/login
Click Create Space
, then follow the prompts to create and launch your space.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint and API Key:
From your new Phoenix Space
Create your API key from the Settings page
Copy your Hostname
from the Settings page
In your code, set your endpoint and API key:
import os
os.environ["PHOENIX_API_KEY"] = "ADD YOUR PHOENIX API KEY"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "ADD YOUR PHOENIX HOSTNAME"
# If you created your Phoenix Cloud instance before June 24th, 2025,
# you also need to set the API key as a header:
# os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"
Launch your local Phoenix instance:
pip install arize-phoenix
phoenix serve
For details on customizing a local terminal deployment, see Terminal Setup.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
See Terminal for more details.
Pull latest Phoenix image from Docker Hub:
docker pull arizephoenix/phoenix:latest
Run your containerized instance:
docker run -p 6006:6006 arizephoenix/phoenix:latest
This will expose Phoenix on localhost:6006
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
For more info on using Phoenix with Docker, see Docker.
Install packages:
pip install arize-phoenix
Launch Phoenix:
import phoenix as px
px.launch_app()
Evaluate multi-agent systems using Arize Phoenix, Google Evals, and CrewAI
This guide demonstrates how to evaluate multi-agent systems using Arize Phoenix, Google Gen AI Evals, and CrewAI. It shows how to:
Set up a multi-agent system using CrewAI for collaborative AI agents
Instrument the agents with Phoenix for tracing and monitoring
Evaluate agent performance and interactions using Google GenAI
Analyze the results using Arize Phoenix's observability platform
CrewAI: For orchestrating multi-agent systems
Arize Phoenix: For observability and tracing
Google Cloud Vertex AI: For model hosting and execution
OpenAI: For agent LLM capabilities
We will walk through the key steps in the documentation below. Check out the full tutorial here:
This crew consists of specialized agents working together to analyze and report on a given topic.
from crewai import Agent, Crew, Process, Task
# Define agents here (see the full tutorial)
# Create tasks for your agents with explicit context
conduct_analysis_task = Task(
description=f"""Conduct a comprehensive analysis of the latest developments in {topic}.
Identify key trends, breakthrough technologies, and potential industry impacts.
Focus on both research breakthroughs and commercial applications.""",
expected_output="Full analysis report in bullet points with citations to sources",
agent=researcher,
context=[], # Explicitly set empty context
)
fact_checking_task = Task(
description=f"""Review the research findings and verify the accuracy of claims about {topic}.
Identify any potential ethical concerns or societal implications.
Highlight areas where hype may exceed reality and provide a balanced assessment.
Suggest frameworks that should be considered for each major advancement.""",
expected_output="Fact-checking report with verification status for each major claim",
agent=fact_checker,
context=[conduct_analysis_task], # Set context to previous task
)
# Instantiate your crew with a sequential process
crew = Crew(
agents=[researcher, fact_checker, writer],
tasks=[conduct_analysis_task, fact_checking_task, writer_task],
verbose=False,
process=Process.sequential,
)
return crew
Next, you'll build an experiment to test your CrewAI Crew with Phoenix and Google Gen AI evals.
When run, an Experiment will send each row of your dataset through your task, then apply each of your evaluators to the result.
All traces and metrics will then be stored in Phoenix for reference and comparison.
import phoenix as px
phoenix_client = px.Client()
try:
dataset = phoenix_client.get_dataset(name="crewai-researcher-test-topics")
except ValueError:
dataset = phoenix_client.upload_dataset(
dataframe=df,
dataset_name="crewai-researcher-test-topics",
input_keys=["topic"],
output_keys=["reference_trajectory"],
)
This method will be run on each row of your test cases dataset:
def call_crew_with_topic(input):
crew = create_research_crew(topic=input.get("topic"))
result = crew.kickoff()
return result
Define as many evaluators as you'd need to evaluate your agent. In this case, you'll use Google Gen AI's eval library to evaluate the crew's trajectory.
import pandas as pd
from vertexai.preview.evaluation import EvalTask
def eval_trajectory_with_google_gen_ai(
output, expected, metric_name="trajectory_exact_match"
) -> float:
eval_dataset = pd.DataFrame(
{
"predicted_trajectory": [create_trajectory_from_response(output)],
"reference_trajectory": [expected.get("reference_trajectory")],
}
)
eval_task = EvalTask(
dataset=eval_dataset,
metrics=[metric_name],
)
eval_result = eval_task.evaluate()
metric_value = eval_result.summary_metrics.get(f"{metric_name}/mean")
if metric_value is None:
return 0.0
return metric_value
def trajectory_exact_match(output, expected):
return eval_trajectory_with_google_gen_ai(
output, expected, metric_name="trajectory_exact_match"
)
def trajectory_precision(output, expected):
return eval_trajectory_with_google_gen_ai(
output, expected, metric_name="trajectory_precision"
)
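The remaining evaluators passed to run_experiment below follow the same pattern. As a sketch, trajectory_in_order_match and trajectory_any_order_match can be defined as thin wrappers over the corresponding Vertex AI metric names (assumed here), while agent_names_match is a custom check defined in the full notebook:
def trajectory_in_order_match(output, expected):
    return eval_trajectory_with_google_gen_ai(
        output, expected, metric_name="trajectory_in_order_match"
    )
def trajectory_any_order_match(output, expected):
    return eval_trajectory_with_google_gen_ai(
        output, expected, metric_name="trajectory_any_order_match"
    )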
import nest_asyncio
from phoenix.experiments import run_experiment
nest_asyncio.apply()
experiment = run_experiment(
dataset,
call_crew_with_topic,
experiment_name="agent-experiment",
evaluators=[
trajectory_exact_match,
trajectory_precision,
trajectory_in_order_match,
trajectory_any_order_match,
agent_names_match,
],
)
Sign up for Phoenix:
Sign up for an Arize Phoenix account at https://app.phoenix.arize.com/login
Click Create Space
, then follow the prompts to create and launch your space.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint and API Key:
From your new Phoenix Space
Create your API key from the Settings page
Copy your Hostname
from the Settings page
In your code, set your endpoint and API key:
import os
os.environ["PHOENIX_API_KEY"] = "ADD YOUR PHOENIX API KEY"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "ADD YOUR PHOENIX HOSTNAME"
# If you created your Phoenix Cloud instance before June 24th, 2025,
# you also need to set the API key as a header:
# os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"
Launch your local Phoenix instance:
pip install arize-phoenix
phoenix serve
For details on customizing a local terminal deployment, see Terminal Setup.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
See Terminal for more details.
Pull latest Phoenix image from Docker Hub:
docker pull arizephoenix/phoenix:latest
Run your containerized instance:
docker run -p 6006:6006 arizephoenix/phoenix:latest
This will expose Phoenix on localhost:6006
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
For more info on using Phoenix with Docker, see Docker.
Install packages:
pip install arize-phoenix
Launch Phoenix:
import phoenix as px
px.launch_app()
Sign up for Phoenix:
Sign up for an Arize Phoenix account at https://app.phoenix.arize.com/login
Click Create Space
, then follow the prompts to create and launch your space.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint and API Key:
From your new Phoenix Space
Create your API key from the Settings page
Copy your Hostname
from the Settings page
In your code, set your endpoint and API key:
import os
os.environ["PHOENIX_API_KEY"] = "ADD YOUR PHOENIX API KEY"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "ADD YOUR PHOENIX HOSTNAME"
# If you created your Phoenix Cloud instance before June 24th, 2025,
# you also need to set the API key as a header:
# os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"
Launch your local Phoenix instance:
pip install arize-phoenix
phoenix serve
For details on customizing a local terminal deployment, see Terminal Setup.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
See Terminal for more details.
Pull latest Phoenix image from Docker Hub:
docker pull arizephoenix/phoenix:latest
Run your containerized instance:
docker run -p 6006:6006 arizephoenix/phoenix:latest
This will expose the Phoenix on localhost:6006
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
For more info on using Phoenix with Docker, see Docker.
Install packages:
pip install arize-phoenix
Launch Phoenix:
import phoenix as px
px.launch_app()
Sign up for Phoenix:
Sign up for an Arize Phoenix account at https://app.phoenix.arize.com/login
Click Create Space
, then follow the prompts to create and launch your space.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint and API Key:
From your new Phoenix Space
Create your API key from the Settings page
Copy your Hostname
from the Settings page
In your code, set your endpoint and API key:
import os
os.environ["PHOENIX_API_KEY"] = "ADD YOUR PHOENIX API KEY"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "ADD YOUR PHOENIX HOSTNAME"
# If you created your Phoenix Cloud instance before June 24th, 2025,
# you also need to set the API key as a header:
# os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"
Launch your local Phoenix instance:
pip install arize-phoenix
phoenix serve
For details on customizing a local terminal deployment, see Terminal Setup.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
See Terminal for more details.
Pull latest Phoenix image from Docker Hub:
docker pull arizephoenix/phoenix:latest
Run your containerized instance:
docker run -p 6006:6006 arizephoenix/phoenix:latest
This will expose the Phoenix on localhost:6006
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
For more info on using Phoenix with Docker, see Docker.
Install packages:
pip install arize-phoenix
Launch Phoenix:
import phoenix as px
px.launch_app()
LlamaIndex is an open-source framework that streamlines connecting, ingesting, indexing, and retrieving structured or unstructured data to power efficient, data-aware language model applications.
Weaviate is an open source, AI-native vector database.
Website: weaviate.io
Phoenix can be used to trace and evaluate applications that use Weaviate as a vector database.
Pinecone is a vector database that can be used to power RAG in various applications.
Website: pinecone.io
Phoenix can be used to trace and evaluate applications that use Pinecone as a vector database.
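If your application calls the vector store directly (outside of an auto-instrumented framework), you can still capture the retrieval step by creating a span manually with the OpenInference semantic conventions. The sketch below is a minimal illustration rather than an official recipe: it assumes the openinference-semantic-conventions and arize-phoenix-otel packages are installed, and query_pinecone is a hypothetical placeholder for your own Pinecone (or Weaviate) query code.
from openinference.semconv.trace import OpenInferenceSpanKindValues, SpanAttributes
from phoenix.otel import register

tracer_provider = register(project_name="my-rag-app")
tracer = tracer_provider.get_tracer(__name__)

def retrieve(query: str):
    # Record the vector store lookup as a RETRIEVER span so it shows up in Phoenix
    with tracer.start_as_current_span("vector-store-retrieval") as span:
        span.set_attribute(
            SpanAttributes.OPENINFERENCE_SPAN_KIND,
            OpenInferenceSpanKindValues.RETRIEVER.value,
        )
        span.set_attribute(SpanAttributes.INPUT_VALUE, query)
        documents = query_pinecone(query)  # hypothetical helper around your vector store client
        span.set_attribute(SpanAttributes.OUTPUT_VALUE, str(documents))
        return documents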
Ingesting Data for Semantic Search
This tutorial will show you how to embed a large volume of data, upload it to a vector database, run top K similarity searches against it, and monitor it in production using VectorFlow, Arize Phoenix, Weaviate and LlamaIndex.
Instrumenting and Evaluating a Weaviate RAG Pipeline
This tutorial serves as a great starting point to see how to manually instrument a RAG chatbot built on Weaviate, and visualize and evaluate the results in Phoenix.
RAG with LangChain and Pinecone
Build a RAG pipeline using LangChain and Pinecone
If you are using Phoenix Cloud, set your API key and collector endpoint:
import os
# Add Phoenix API Key for tracing
PHOENIX_API_KEY = "ADD YOUR API KEY"
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
If you are self-hosting Phoenix, point the collector endpoint at your local instance:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
Instrument and observe your DSPy application via the DSPyInstrumentor
DSPy is a framework for automatically prompting and fine-tuning language models. It provides composable and declarative APIs that allow developers to describe the architecture of their LLM application in the form of a "module" (inspired by PyTorch's nn.Module). It then compiles these modules using "teleprompters" that optimize the module for a particular task. The term "teleprompter" is meant to evoke "prompting at a distance," and could involve selecting few-shot examples, generating prompts, or fine-tuning language models.
Phoenix makes your DSPy applications observable by visualizing the underlying structure of each call to your compiled DSPy module.
Install packages:
pip install openinference-instrumentation-dspy openinference-instrumentation-litellm dspy
Connect to your Phoenix instance using the register function.
from phoenix.otel import register
# configure the Phoenix tracer
tracer_provider = register(
project_name="my-llm-app", # Default is 'default'
auto_instrument=True # Auto-instrument your app based on installed OI dependencies
)
Now invoke your compiled DSPy module. Your traces should appear inside Phoenix.
import dspy
from openinference.instrumentation import using_attributes

class BasicQA(dspy.Signature):
"""Answer questions with short factoid answers."""
question = dspy.InputField()
answer = dspy.OutputField(desc="often between 1 and 5 words")
if __name__ == "__main__":
turbo = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=turbo)
with using_attributes(
session_id="my-test-session",
user_id="my-test-user",
metadata={
"test-int": 1,
"test-str": "string",
"test-list": [1, 2, 3],
"test-dict": {
"key-1": "val-1",
"key-2": "val-2",
},
},
tags=["tag-1", "tag-2"],
prompt_template_version="v1.0",
prompt_template_variables={
"city": "Johannesburg",
"date": "July 11th",
},
):
# Define the predictor.
generate_answer = dspy.Predict(BasicQA)
# Call the predictor on a particular input.
pred = generate_answer(
question="What is the capital of the united states?" # noqa: E501
) # noqa: E501
print(f"Predicted Answer: {pred.answer}")
Now that you have tracing setup, all predictions will be streamed to your running Phoenix for observability and evaluation.
How to use the Python LlamaIndexInstrumentor to trace LlamaIndex
LlamaIndex is a data framework for your LLM application. It lets you build applications that leverage RAG (retrieval-augmented generation) to super-charge an LLM with your own data, harnessing the power of models such as OpenAI's GPT while grounding them in your data and use case.
For LlamaIndex, tracing instrumentation is added via an OpenTelemetry instrumentor aptly named the LlamaIndexInstrumentor. This instrumentor creates spans and sends them to the Phoenix collector.
Phoenix supports LlamaIndex's latest instrumentation paradigm. This paradigm requires LlamaIndex >= 0.10.43. For legacy support, see below.
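If you're unsure which LlamaIndex version you have installed, a quick check with the standard library (a minimal sketch; it assumes the distribution is named llama-index) looks like this:
from importlib.metadata import version

# The modern instrumentation paradigm requires llama-index >= 0.10.43
print(version("llama-index"))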
Install packages:
pip install openinference-instrumentation-llama_index "llama-index>=0.11.0"
Initialize the LlamaIndexInstrumentor before your application code.
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from phoenix.otel import register
tracer_provider = register()
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
You can now use LlamaIndex as normal, and tracing will be automatically captured and sent to your Phoenix instance.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
import os
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI API KEY"
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Some question about the data should go here")
print(response)
You can now view your traces in Phoenix.
How to use Pydantic Evals with Phoenix to evaluate AI applications using structured evaluation frameworks
Pydantic Evals is an evaluation library that provides preset direct evaluations and LLM Judge evaluations. It can be used to run evaluations over datasets of cases defined with Pydantic models. This guide shows you how to use Pydantic Evals alongside Arize Phoenix to run evaluations on traces captured from your running application.
Sign up for Phoenix:
Sign up for an Arize Phoenix account at https://app.phoenix.arize.com/login
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint and API Key:
import os
# Add Phoenix API Key for tracing
PHOENIX_API_KEY = "ADD YOUR API KEY"
os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={PHOENIX_API_KEY}"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
Your Phoenix API key can be found on the Keys section of your dashboard.
Launch your local Phoenix instance:
pip install arize-phoenix
phoenix serve
For details on customizing a local terminal deployment, see Terminal Setup.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
See Terminal for more details
Pull latest Phoenix image from Docker Hub:
docker pull arizephoenix/phoenix:latest
Run your containerized instance:
docker run -p 6006:6006 arizephoenix/phoenix:latest
This will expose Phoenix on localhost:6006.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
For more info on using Phoenix with Docker, see Docker.
Install packages:
pip install arize-phoenix
Launch Phoenix:
import phoenix as px
px.launch_app()
pip install pydantic-evals arize-phoenix openai openinference-instrumentation-openai
Enable Phoenix tracing to capture traces from your application:
from phoenix.otel import register
tracer_provider = register(
project_name="pydantic-evals-tutorial",
auto_instrument=True, # Automatically instrument OpenAI calls
)
First, create some example traces by running your AI application. Here's a simple example:
from openai import OpenAI
import os
client = OpenAI()
inputs = [
"What is the capital of France?",
"Who wrote Romeo and Juliet?",
"What is the largest planet in our solar system?",
]
def generate_trace(input):
client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{
"role": "system",
"content": "You are a helpful assistant. Only respond with the answer to the question as a single word or proper noun.",
},
{"role": "user", "content": input},
],
)
for input in inputs:
generate_trace(input)
Export the traces you want to evaluate:
import phoenix as px
from phoenix.trace.dsl import SpanQuery
query = SpanQuery().select(
input="llm.input_messages",
output="llm.output_messages",
)
# Query spans from Phoenix
spans = px.Client().query_spans(query, project_name="pydantic-evals-tutorial")
spans["input"] = spans["input"].apply(lambda x: x[1].get("message").get("content"))
spans["output"] = spans["output"].apply(lambda x: x[0].get("message").get("content"))
Create a dataset of test cases using Pydantic Evals:
from pydantic_evals import Case, Dataset
cases = [
Case(
name="capital of France",
inputs="What is the capital of France?",
expected_output="Paris"
),
Case(
name="author of Romeo and Juliet",
inputs="Who wrote Romeo and Juliet?",
expected_output="William Shakespeare",
),
Case(
name="largest planet",
inputs="What is the largest planet in our solar system?",
expected_output="Jupiter",
),
]
Define evaluators to assess your model's performance:
from pydantic_evals.evaluators import Evaluator, EvaluatorContext
class MatchesExpectedOutput(Evaluator[str, str]):
    def evaluate(self, ctx: EvaluatorContext[str, str]) -> bool:
is_correct = ctx.expected_output == ctx.output
return is_correct
class FuzzyMatchesOutput(Evaluator[str, str]):
    def evaluate(self, ctx: EvaluatorContext[str, str]) -> bool:
from difflib import SequenceMatcher
def similarity_ratio(a, b):
return SequenceMatcher(None, a, b).ratio()
# Consider it correct if similarity is above 0.8 (80%)
is_correct = similarity_ratio(ctx.expected_output, ctx.output) > 0.8
return is_correct
Create a task that retrieves outputs from your traced data:
import nest_asyncio
nest_asyncio.apply()
async def task(input: str) -> str:
output = spans[spans["input"] == input]["output"].values[0]
return output
# Create dataset with evaluators
dataset = Dataset(
cases=cases,
evaluators=[MatchesExpectedOutput(), FuzzyMatchesOutput()],
)
For more sophisticated evaluation, add an LLM judge:
from pydantic_evals.evaluators import LLMJudge
dataset.add_evaluator(
LLMJudge(
rubric="Output and Expected Output should represent the same answer, even if the text doesn't match exactly",
include_input=True,
model="openai:gpt-4o-mini",
),
)
Execute the evaluation:
report = dataset.evaluate_sync(task)
print(report)
Upload your evaluation results back to Phoenix for visualization:
from phoenix.trace import SpanEvaluations
# Extract results from the report
results = report.model_dump()
# Create dataframes for each evaluator
meo_spans = spans.copy()
fuzzy_label_spans = spans.copy()
llm_label_spans = spans.copy()
for case in results.get("cases"):
# Extract evaluation results
meo_label = case.get("assertions").get("MatchesExpectedOutput").get("value")
fuzzy_label = case.get("assertions").get("FuzzyMatchesOutput").get("value")
llm_label = case.get("assertions").get("LLMJudge").get("value")
input = case.get("inputs")
# Update labels in dataframes
meo_spans.loc[meo_spans["input"] == input, "label"] = str(meo_label)
fuzzy_label_spans.loc[fuzzy_label_spans["input"] == input, "label"] = str(fuzzy_label)
llm_label_spans.loc[llm_label_spans["input"] == input, "label"] = str(llm_label)
# Add scores for Phoenix metrics
meo_spans["score"] = meo_spans["label"].apply(lambda x: 1 if x == "True" else 0)
fuzzy_label_spans["score"] = fuzzy_label_spans["label"].apply(lambda x: 1 if x == "True" else 0)
llm_label_spans["score"] = llm_label_spans["label"].apply(lambda x: 1 if x == "True" else 0)
# Upload to Phoenix
px.Client().log_evaluations(
SpanEvaluations(
dataframe=meo_spans,
eval_name="Direct Match Eval",
),
SpanEvaluations(
dataframe=fuzzy_label_spans,
eval_name="Fuzzy Match Eval",
),
SpanEvaluations(
dataframe=llm_label_spans,
eval_name="LLM Match Eval",
),
)
You can create more complex evaluation workflows by combining multiple evaluators:
from pydantic_evals.evaluators import Evaluator, EvaluatorContext
from typing import Dict, Any
class ComprehensiveEvaluator(Evaluator[str, str]):
def evaluate(self, ctx: EvaluatorContext[str, str]) -> Dict[str, Any]:
# Multiple evaluation criteria
exact_match = ctx.expected_output == ctx.output
# Length similarity
length_ratio = min(len(ctx.output), len(ctx.expected_output)) / max(len(ctx.output), len(ctx.expected_output))
# Semantic similarity (simplified)
from difflib import SequenceMatcher
semantic_score = SequenceMatcher(None, ctx.expected_output.lower(), ctx.output.lower()).ratio()
return {
"exact_match": exact_match,
"length_similarity": length_ratio,
"semantic_similarity": semantic_score,
"overall_score": (exact_match * 0.5) + (semantic_score * 0.3) + (length_ratio * 0.2)
}
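If you want this combined evaluator to feed into the same report, you can register it on the dataset just like the evaluators above and re-run the evaluation. This is a minimal sketch that reuses the dataset and task objects defined earlier in this guide:
# Register the combined evaluator alongside the ones added earlier
dataset.add_evaluator(ComprehensiveEvaluator())

# Re-run the evaluation so the report includes the new criteria
report = dataset.evaluate_sync(task)
print(report)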
Once you have evaluation results uploaded to Phoenix, you can:
View evaluation metrics: See overall performance across different evaluation criteria
Analyze individual cases: Drill down into specific examples that passed or failed
Compare evaluators: Understand how different evaluation methods perform
Track improvements: Monitor evaluation scores over time as you improve your application
Debug failures: Identify patterns in failed evaluations to guide improvements
The Phoenix UI will display your evaluation results with detailed breakdowns, making it easy to understand your AI application's performance and identify areas for improvement.
Sign up for an Arize Phoenix account at https://app.phoenix.arize.com/login
Grab your API key from the Keys option on the left bar.
In your code, configure environment variables for your endpoint and API key:
# .env, or shell environment
# Add Phoenix API Key for tracing
PHOENIX_API_KEY="ADD YOUR PHOENIX API KEY"
# And Collector Endpoint for Phoenix Cloud
PHOENIX_COLLECTOR_ENDPOINT="ADD YOUR PHOENIX HOSTNAME"
Run Phoenix using Docker, local terminal, Kubernetes etc. For more information, see self-hosting.
In your code, configure environment variables for your endpoint and API key:
# .env, or shell environment
# Collector Endpoint for your self hosted Phoenix, like localhost
PHOENIX_COLLECTOR_ENDPOINT="http://localhost:6006"
# (optional) If authentication enabled, add Phoenix API Key for tracing
PHOENIX_API_KEY="ADD YOUR API KEY"
How to use the Python PydanticAIInstrumentor to trace PydanticAI agents
PydanticAI is a Python agent framework designed to make it less painful to build production-grade applications with Generative AI. Built by the team behind Pydantic, it provides a clean, type-safe way to build AI agents with structured outputs.
Install packages:
pip install openinference-instrumentation-pydantic-ai pydantic-ai opentelemetry-sdk opentelemetry-exporter-otlp opentelemetry-api
Set up tracing using OpenTelemetry and the PydanticAI instrumentation:
import os
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from openinference.instrumentation.pydantic_ai import OpenInferenceSpanProcessor
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
# Set up the tracer provider
tracer_provider = TracerProvider()
trace.set_tracer_provider(tracer_provider)
# Add the OpenInference span processor
endpoint = f"{os.environ['PHOENIX_COLLECTOR_ENDPOINT']}/v1/traces"
# If you are using a local instance without auth, ignore these headers
headers = {"Authorization": f"Bearer {os.environ['PHOENIX_API_KEY']}"}
exporter = OTLPSpanExporter(endpoint=endpoint, headers=headers)
tracer_provider.add_span_processor(OpenInferenceSpanProcessor())
tracer_provider.add_span_processor(SimpleSpanProcessor(exporter))
Here's a simple example using PydanticAI with automatic tracing:
import os

from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
import nest_asyncio
nest_asyncio.apply()
# Set your OpenAI API key
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"
# Define your Pydantic model
class LocationModel(BaseModel):
city: str
country: str
# Create and configure the agent
model = OpenAIModel("gpt-4", provider='openai')
agent = Agent(model, output_type=LocationModel, instrument=True)
# Run the agent
result = agent.run_sync("The windy city in the US of A.")
print(result)
You can also trace agents that use tools. Here is an example of a weather agent that defines a system prompt and a tool:
from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
from pydantic_ai.models.openai import OpenAIModel
from typing import List
import httpx
class WeatherInfo(BaseModel):
location: str
temperature: float = Field(description="Temperature in Celsius")
condition: str
humidity: int = Field(description="Humidity percentage")
# Create an agent with system prompts and tools
weather_agent = Agent(
model=OpenAIModel("gpt-4"),
output_type=WeatherInfo,
system_prompt="You are a helpful weather assistant. Always provide accurate weather information.",
instrument=True
)
@weather_agent.tool
async def get_weather_data(ctx: RunContext[None], location: str) -> str:
"""Get current weather data for a location."""
# Mock weather API call - replace with actual weather service
async with httpx.AsyncClient() as client:
# This is a placeholder - use a real weather API
mock_data = {
"temperature": 22.5,
"condition": "partly cloudy",
"humidity": 65
}
return f"Weather in {location}: {mock_data}"
# Run the agent with tool usage
result = weather_agent.run_sync("What's the weather like in Paris?")
print(result)
Now that you have tracing setup, all PydanticAI agent operations will be streamed to your running Phoenix instance for observability and evaluation. You'll be able to see:
Agent interactions: Complete conversations between your application and the AI model
Structured outputs: Pydantic model validation and parsing results
Tool usage: When agents call external tools and their responses
Performance metrics: Response times, token usage, and success rates
Error handling: Validation errors, API failures, and retry attempts
Multi-agent workflows: Complex interactions between multiple agents
The traces will provide detailed insights into your AI agent behaviors, making it easier to debug issues, optimize performance, and ensure reliability in production.
Auto-instrument and observe BeeAI agents
This module provides automatic instrumentation for the beeai-framework package. It integrates seamlessly with OpenTelemetry to collect and export telemetry data.
To instrument your application, import and enable BeeAIInstrumentation. Create the instrumentation.js file shown below, after the install command.
A sample agent built using BeeAI with automatic tracing follows the instrumentation file; Phoenix provides visibility into your BeeAI agent operations by automatically tracing all interactions.
To see OpenTelemetry diagnostic logs in your console while debugging, add the diagnostic logger snippet (shown after the agent example) at the top of your instrumentation.js.
If traces aren't appearing, a common cause is an outdated beeai-framework package. Check the diagnostic logs for version or initialization errors and update your package as needed.
You can specify a custom tracer provider for BeeAI instrumentation in multiple ways; see the final examples below.
Install the required packages:
npm install --save beeai-framework \
@arizeai/openinference-instrumentation-beeai \
@arizeai/openinference-semantic-conventions \
@opentelemetry/sdk-trace-node \
@opentelemetry/resources \
@opentelemetry/exporter-trace-otlp-proto \
@opentelemetry/semantic-conventions \
@opentelemetry/instrumentation
import {
NodeTracerProvider,
SimpleSpanProcessor,
ConsoleSpanExporter,
} from "@opentelemetry/sdk-trace-node";
import { diag, DiagConsoleLogger, DiagLogLevel } from "@opentelemetry/api";
import { resourceFromAttributes } from "@opentelemetry/resources";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";
import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions";
import { SEMRESATTRS_PROJECT_NAME } from "@arizeai/openinference-semantic-conventions";
import * as beeaiFramework from "beeai-framework";
import { registerInstrumentations } from "@opentelemetry/instrumentation";
import { BeeAIInstrumentation } from "@arizeai/openinference-instrumentation-beeai";
const COLLECTOR_ENDPOINT = "your-phoenix-collector-endpoint";
const provider = new NodeTracerProvider({
resource: resourceFromAttributes({
[ATTR_SERVICE_NAME]: "beeai-project",
[SEMRESATTRS_PROJECT_NAME]: "beeai-project",
}),
spanProcessors: [
new SimpleSpanProcessor(new ConsoleSpanExporter()),
new SimpleSpanProcessor(
new OTLPTraceExporter({
url: `${COLLECTOR_ENDPOINT}/v1/traces`,
// (optional) if connecting to Phoenix with Authentication enabled
headers: { Authorization: `Bearer ${process.env.PHOENIX_API_KEY}` },
}),
),
],
});
provider.register();
const beeAIInstrumentation = new BeeAIInstrumentation();
beeAIInstrumentation.manuallyInstrument(beeaiFramework);
registerInstrumentations({
instrumentations: [beeAIInstrumentation],
});
console.log("👀 OpenInference initialized");
import "./instrumentation.js";
import { ToolCallingAgent } from "beeai-framework/agents/toolCalling/agent";
import { TokenMemory } from "beeai-framework/memory/tokenMemory";
import { DuckDuckGoSearchTool } from "beeai-framework/tools/search/duckDuckGoSearch";
import { OpenMeteoTool } from "beeai-framework/tools/weather/openMeteo";
import { OpenAIChatModel } from "beeai-framework/adapters/openai/backend/chat";
const llm = new OpenAIChatModel(
"gpt-4o",
{},
{ apiKey: 'your-openai-api-key' }
);
const agent = new ToolCallingAgent({
llm,
memory: new TokenMemory(),
tools: [
new DuckDuckGoSearchTool(),
new OpenMeteoTool(), // weather tool
],
});
async function main() {
const response = await agent.run({ prompt: "What's the current weather in Berlin?" });
console.log(`Agent 🤖 : `, response.result.text);
}
main();
import { diag, DiagConsoleLogger, DiagLogLevel } from "@opentelemetry/api";
// Enable OpenTelemetry diagnostic logging
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.INFO);
const beeAIInstrumentation = new BeeAIInstrumentation({
tracerProvider: customTracerProvider,
});
beeAIInstrumentation.manuallyInstrument(beeaiFramework);
const beeAIInstrumentation = new BeeAIInstrumentation();
beeAIInstrumentation.setTracerProvider(customTracerProvider);
beeAIInstrumentation.manuallyInstrument(beeaiFramework);
const beeAIInstrumentation = new BeeAIInstrumentation();
beeAIInstrumentation.manuallyInstrument(beeaiFramework);
registerInstrumentations({
instrumentations: [beeAIInstrumentation],
tracerProvider: customTracerProvider,
});
This package provides a set of utilities to ingest Vercel AI SDK (>= 3.3) spans into platforms like Arize and Phoenix.
Note: This package requires you to be using the Vercel AI SDK version 3.3 or higher.
npm i --save @arizeai/openinference-vercel
You will also need to install OpenTelemetry packages into your project.
npm i --save @arizeai/openinference-semantic-conventions @opentelemetry/api @opentelemetry/exporter-trace-otlp-proto @opentelemetry/resources @opentelemetry/sdk-trace-node @opentelemetry/semantic-conventions
@arizeai/openinference-vercel provides a set of utilities to help you ingest Vercel AI SDK spans into OpenTelemetry-compatible platforms and works in conjunction with Vercel's AI SDK OpenTelemetry support. @arizeai/openinference-vercel works with typical Node projects as well as Next.js projects. This page describes usage within a Node project; for detailed usage instructions in Next.js, follow Vercel's guide on instrumenting Next.js.
To process your Vercel AI SDK spans, set up a typical OpenTelemetry instrumentation boilerplate file and add an OpenInferenceSimpleSpanProcessor or OpenInferenceBatchSpanProcessor to your OpenTelemetry configuration.
Note: The OpenInferenceSpanProcessor alone does not handle the exporting of spans, so you will need to pass it an exporter as a parameter.
Here are two example instrumentation configurations:
Manual instrumentation config for a Node v23+ application.
Next.js register function utilizing @vercel/otel.
// instrumentation.ts
// Node environment instrumentation
// Boilerplate imports
import { diag, DiagConsoleLogger, DiagLogLevel } from "@opentelemetry/api";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";
import { resourceFromAttributes } from "@opentelemetry/resources";
import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions";
// OpenInference Vercel imports
import { SEMRESATTRS_PROJECT_NAME } from "@arizeai/openinference-semantic-conventions";
import { OpenInferenceSimpleSpanProcessor } from "@arizeai/openinference-vercel";
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.ERROR);
// e.g. http://localhost:6006
// e.g. https://app.phoenix.arize.com/s/<your-space>
const COLLECTOR_ENDPOINT = process.env.PHOENIX_COLLECTOR_ENDPOINT;
// The project name that may appear in your collector's interface
const SERVICE_NAME = "phoenix-vercel-ai-sdk-app";
export const provider = new NodeTracerProvider({
resource: resourceFromAttributes({
[ATTR_SERVICE_NAME]: SERVICE_NAME,
[SEMRESATTRS_PROJECT_NAME]: SERVICE_NAME,
}),
spanProcessors: [
// In production-like environments it is recommended to use
// OpenInferenceBatchSpanProcessor instead
new OpenInferenceSimpleSpanProcessor({
exporter: new OTLPTraceExporter({
url: `${COLLECTOR_ENDPOINT}/v1/traces`,
// (optional) if connecting to a collector with Authentication enabled
headers: { Authorization: `Bearer ${process.env.PHOENIX_API_KEY}` },
}),
}),
],
});
provider.register();
console.log("Provider registered");
// Run this file before the rest of program execution
// e.g node --import ./instrumentation.ts index.ts
// or at the top of your application's entrypoint
// e.g. import "instrumentation.ts";
// instrumentation.ts
// Vercel / Next.js environment instrumentation
import { registerOTel, OTLPHttpProtoTraceExporter } from "@vercel/otel";
import {
isOpenInferenceSpan,
OpenInferenceSimpleSpanProcessor,
} from "@arizeai/openinference-vercel";
import { SEMRESATTRS_PROJECT_NAME } from "@arizeai/openinference-semantic-conventions";
import { diag, DiagConsoleLogger, DiagLogLevel } from "@opentelemetry/api";
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.ERROR);
// e.g. http://localhost:6006
// e.g. https://app.phoenix.arize.com/s/<your-space>
const COLLECTOR_ENDPOINT = process.env.PHOENIX_COLLECTOR_ENDPOINT;
// The project name that may appear in your collector's interface
const SERVICE_NAME = "phoenix-vercel-ai-sdk-app";
/**
* Register function used by Next.js to instantiate instrumentation
* correctly in all environments that Next.js can be deployed to
*/
export function register() {
registerOTel({
serviceName: SERVICE_NAME,
attributes: {
[SEMRESATTRS_PROJECT_NAME]: SERVICE_NAME,
},
spanProcessors: [
new OpenInferenceSimpleSpanProcessor({
exporter: new OTLPHttpProtoTraceExporter({
url: `${COLLECTOR_ENDPOINT}/v1/traces`,
}),
spanFilter: isOpenInferenceSpan,
}),
],
});
console.log("Provider registered");
}
See Vercel's instrumentation guide for more details on configuring your instrumentation file and @vercel/otel within a Next.js project.
Now enable telemetry in your AI SDK calls by setting the experimental_telemetry parameter to { isEnabled: true }.
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
const result = await generateText({
model: openai("gpt-4o"),
prompt: "Write a short story about a cat.",
experimental_telemetry: { isEnabled: true },
});
Ensure your installed version of @opentelemetry/api matches the version installed by ai, otherwise the AI SDK will not emit traces to the TracerProvider that you configure. If you install ai before the other packages, dependency resolution in your package manager should install the correct version.
For details on Vercel AI SDK telemetry see the Vercel AI SDK Telemetry documentation.
To see an example go to the Next.js OpenAI Telemetry Example in the OpenInference repo.
For more information on Vercel OpenTelemetry support see the Vercel AI SDK Telemetry documentation.