Available in Phoenix 10.12+
New Features:
Added an optional sessionId argument to the Project.sessions GraphQL field, enabling filtering by session_id.
Integrated support across the backend resolver and frontend UI to seamlessly filter and display sessions matching a specific session_id.
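As a hedged sketch, the new argument might be exercised from Python like this (the sessionId argument comes from this release; the endpoint path, query shape, and selected fields are assumptions):

import requests

# Hypothetical node query filtering a project's sessions by session_id
query = """
query ($projectId: ID!, $sessionId: String) {
  node(id: $projectId) {
    ... on Project {
      sessions(first: 10, sessionId: $sessionId) {
        edges { node { sessionId } }
      }
    }
  }
}
"""
resp = requests.post(
    "http://localhost:6006/graphql",  # default local Phoenix endpoint
    json={"query": query, "variables": {"projectId": "...", "sessionId": "abc-123"}},
)
print(resp.json())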
Available in Phoenix 8.26+
Phoenix now supports programmatic API key creation through a new endpoint, making it easier to automate project setup and trace logging. To enable this, set the PHOENIX_ADMIN_SECRET
environment variable in your deployment.
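For instance, a deployment script might mint a key at startup. This is a hypothetical sketch: PHOENIX_ADMIN_SECRET comes from this release, but the route name and payload shown here are assumptions; check the Phoenix API reference for the real endpoint.

import os

import requests

resp = requests.post(
    "http://phoenix:6006/v1/system-api-keys",  # assumed route
    headers={"Authorization": f"Bearer {os.environ['PHOENIX_ADMIN_SECRET']}"},
    json={"name": "ci-ingest-key"},  # assumed payload
)
resp.raise_for_status()
print(resp.json())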
Tracing: Add load more and loading state to the infinite scroll
UI: Hide menu for changing role for self in UsersTable
Security: Prevent admins from changing their own roles
Infrastructure: Remove WebSocket dependency and migrate to Multipart Subscriptions
Available in Phoenix 8.15+
In the New Project tab, we've added quick setup to instrument your application for BeeAI, SmolAgents, and the OpenAI Agents SDK.
Easily configure all integrations with streamlined instructions. Check out all Phoenix tracing integrations here.
Available in Phoenix 8.0+
Phoenix has made it even simpler to get started with tracing by introducing one-line auto-instrumentation. By using register(auto_instrument=True), you can enable automatic instrumentation in your application, which will set up instrumentors based on your installed packages.
from phoenix.otel import register

# Sets up instrumentors based on the packages installed in your environment
register(auto_instrument=True)
For more details, you can check the docs and explore further options.
Available in Phoenix 11.12+
In the latest release, Arize Phoenix now includes dedicated project dashboards featuring:
Trace latency and error metrics
Latency quantiles
Annotation scores over time
Cost trends by token type
Top models ranked by cost and token usage
LLM invocation and error tracking
Tool calls and error statistics
You can set the project dashboard as the default view for your project in the configuration page.
Learn more in the docs.
Available in Phoenix 11.9+
New Features:
Added a transferTracesToProject GraphQL mutation to move traces between projects, preserving annotations and cost calculations for seamless reorganization.
Added a createProject GraphQL mutation to create new projects programmatically via the API.
OpenInference Java is now available, providing a comprehensive solution for tracing AI applications using OpenTelemetry. Fully compatible with any OpenTelemetry-compatible collector or backend like Arize.
Included in this release:
openinference-semantic-conventions: Java constants for capturing model calls, embeddings, and tool usage.
openinference-instrumentation: Core utilities for manual OpenInference instrumentation.
openinference-instrumentation-langchain4j: Auto-instrumentation for LangChain4j applications.
All libraries are published and ready to add to your build to initialize tracing and capture rich AI traces.
Available in Phoenix 11.7+
New Features in Phoenix 11.7+:
Added a new experiments property to both Client and AsyncClient for invoking experiment workflows.
Introduced Experiments and AsyncExperiments classes with run_experiment methods supporting tasks, evaluators, dry-run mode, and metadata.
Implemented SyncExecutor and AsyncExecutor classes for concurrent execution with built-in progress bars.
Added RateLimiter and AdaptiveTokenBucket for intelligent handling and throttling of rate-limit errors.
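Putting those pieces together, a minimal sketch of the new workflow (the experiments property, run_experiment, and dry-run mode are from this release; the dataset accessor and the task/evaluator signatures are assumptions):

from phoenix.client import Client

client = Client()
dataset = client.datasets.get_dataset(dataset="my-dataset")  # assumed accessor

def task(input):
    # A real task would call your model or agent with the example input.
    return str(input)

def exact_match(output, expected):
    # Toy evaluator returning a 0/1 score.
    return float(str(output) == str(expected))

experiment = client.experiments.run_experiment(
    dataset=dataset,
    task=task,
    evaluators=[exact_match],
    dry_run=True,  # validate the pipeline without persisting results
)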
Bug Fixes:
Fixed a typo in the datasets.get_dataset_versions docstring.
Enhancements:
Introduced a PhoenixException base class and refactored exception imports for consistency.
Simplified rate limiter output by replacing printif with direct print statements.
Available in Phoenix 11.4+
You can now set a baseline run when comparing multiple experiments. This is especially useful when one run represents a known-good output (e.g. a previous model version or a CI-approved run), and you want to evaluate changes relative to it.
For example, in an evaluation like accuracy, you can easily see where the value flipped from correct → incorrect or incorrect → correct between your baseline and the current comparison - helping you quickly spot regressions or improvements.
This feature makes it easier to isolate the impact of changes like a new prompt, model, or dataset.
Available in Phoenix 11.5+
New Features:
Added a disk usage monitor daemon that periodically checks storage consumption.
Sends warning emails to administrators when usage crosses a configured threshold.
Blocks insert/update operations when usage exceeds a higher critical threshold.
Introduced configurable environment variables for warning and blocking thresholds with validation.
Integrated disk usage checks into both the FastAPI app and gRPC server to enforce write blocking.
Enhancements:
Extended the email sender with a method and HTML template specifically for disk usage alert notifications.
Available in Phoenix 11.4+
You can now see total and segmented costs directly in your Phoenix trace headers for faster debugging and spend visibility.
Extended the TraceDetails GraphQL query to include costSummary fields (prompt, completion, total).
Passes costSummary data into TraceHeader and displays the formatted total cost.
Adds a tooltip in TraceHeader showing the prompt vs. completion cost breakdown.
Available in Phoenix 11.0+
Phoenix now allows you to track token-based costs for LLM runs automatically, calculating costs from token counts and model pricing data and rolling them up to trace and project levels for comprehensive analysis.
New Features:
Automatic calculation of token-based costs using Phoenix’s built-in model pricing table.
Support for custom pricing configurations in Settings > Models when needed.
Token counts and model information are captured automatically when using OpenInference auto-instrumentation with OpenAI, Anthropic, and other supported SDKs.
For manual instrumentation, token count attributes can be included in spans to enable cost tracking.
OpenTelemetry users can leverage OpenInference semantic conventions to include token counts in LLM spans.
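For example, a manually created LLM span can carry the OpenInference token-count attributes that cost calculation reads. A minimal sketch, assuming a Phoenix-configured tracer (attribute names follow the OpenInference semantic conventions; values are illustrative):

from phoenix.otel import register

tracer = register(project_name="my-app").get_tracer(__name__)

with tracer.start_as_current_span("llm_call") as span:
    # OpenInference span kind plus model name, used for pricing lookup
    span.set_attribute("openinference.span.kind", "LLM")
    span.set_attribute("llm.model_name", "gpt-4o")
    # Token counts that Phoenix rolls up into trace- and project-level costs
    span.set_attribute("llm.token_count.prompt", 512)
    span.set_attribute("llm.token_count.completion", 128)
    span.set_attribute("llm.token_count.total", 640)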
We’ve added a comprehensive management and provisioning layer to Phoenix, enabling enhanced team collaboration and access control.
New Features:
Ability to create and manage multiple customized Phoenix spaces tailored to different teams and use cases.
Granular user access management for each individual space.
Support for multiple users collaborating within the same Phoenix projects.
Available in Phoenix 10.15+
Phoenix’s Playground now supports Amazon Bedrock, allowing users to run prompts directly against Bedrock-hosted models within the platform.
New Features:
Run prompts on Amazon Bedrock models seamlessly from Phoenix’s Playground.
Compare outputs side-by-side with other model providers for better evaluation.
Instantly track usage metrics, latency, and cost associated with Bedrock models.
Fine-tune prompt strategies within Phoenix without needing to switch tools.
Available in Phoenix 10.12+
New Features:
Added a POST /projects/{project_identifier}/spans route for span ingestion.
Added a log_spans client method to submit a sequence of spans, rejecting the entire batch if any span is invalid or not unique.
Added log_spans_dataframe for submitting spans as a dataframe.
Introduced uniquify_spans and uniquify_spans_dataframe helpers to regenerate span and trace IDs while preserving relationships.
Improved validation and error handling to prevent partial ingestion and ensure safe, conflict-free span creation.
Available in Phoenix 10.11+
This release enables filtering of datasets by name across both the API and user interface, integrating a live search input along with support for pagination and sorting to improve data navigation and usability.
Added a DatasetFilter input and enum to the GraphQL schema, allowing users to filter datasets by name using case-insensitive matching.
Created a debounced DatasetsSearch component on the Datasets page that lets users filter results live as they type.
Available in Phoenix 10.9+
New visualizations in Phoenix provide deeper insights into experiment performance over time.
With Experiment Progress Charts, you can now:
Visualize how evaluation scores evolve across experiment runs
Monitor evaluator performance and detect regressions
Analyze latency trends to identify bottlenecks and inefficiencies
These collapsible visual tools eliminate the need for manual inspection and make it significantly easier to track the impact of changes in your LLM or agent workflows.
Available in Phoenix 10.7+
We’ve added support for Ollama in the Playground, enabling you to experiment with and customize model parameters directly within the platform for more flexible and tailored prompt versioning.
Available in Phoenix 8.28+
When stopping the Phoenix server via Ctrl+C, the shutdown process now exits cleanly without displaying a traceback or returning a non-zero exit code. Previously, a KeyboardInterrupt and CancelledError traceback could appear, ending the process with status code 130. The server now swallows the interrupt for a smoother shutdown experience, exiting with code 0 by default to reflect intentional termination.
Use Float for token count summaries
Improve browser compatibility for table sizing
Simplify homeLoaderQuery to prevent idle timeout errors
Available in Phoenix 8.26+
We’re excited to announce a powerful capability in the OSS library openinference-instrumentation-mcp — seamless OTEL context propagation for MCP clients and servers.
This release introduces automatic distributed tracing for Anthropic’s Model Context Protocol (MCP). Using OpenTelemetry, you can now:
Propagate context across MCP client-server boundaries
Generate end-to-end traces of your AI system across services and languages
Gain full visibility into how models access and use external context
The openinference-instrumentation-mcp package handles this for you by:
Creating spans for MCP client operations
Injecting trace context into MCP requests
Extracting and continuing the trace context on the server
Associating the context with OTEL spans on the server side
Instrument both MCP client and server with OpenTelemetry.
Add the openinference-instrumentation-mcp package.
Spans will propagate across services, appearing as a single connected trace in Phoenix.
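As a sketch, the setup on each process might look like the following (this assumes the package exposes an MCPInstrumentor in the usual OpenInference style; confirm the class name in the package README):

from openinference.instrumentation.mcp import MCPInstrumentor  # assumed class name
from phoenix.otel import register

# Run this in both the MCP client and the MCP server process so trace
# context is injected on one side and extracted on the other.
tracer_provider = register(project_name="mcp-demo")
MCPInstrumentor().instrument(tracer_provider=tracer_provider)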
Full example usage is available in the OpenInference repository.
Big thanks to Adrian Cole and Anuraag Agrawal for their contributions to this feature.
Available in Phoenix 8.25+
Tool call and result IDs are now shown in the span details view. Each ID is placed within a collapsible header and can be easily copied. This update also supports spans with multiple tool calls. Get started with tracing your tool calls here.
Performance: Do not refetch tables when trace and span details closed
UI: Redirect /v1/traces to root path
Playground: Update GPT-4.1 models in Playground
Available in Phoenix 8.22+
We’ve added support for Prompt Tagging in the Phoenix client. This new feature gives you more control and visibility over your prompts throughout the development lifecycle.
Tag prompts directly in your code and see those tags reflected in the Phoenix UI.
Label prompt versions as development, staging, or production — or define your own custom tags.
Add tag descriptions to provide additional context or list out all tags.
Check out the documentation on prompt tags.
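A hedged sketch of tagging from code (the feature is from this release; the exact client call shown is an assumption — see the prompt tags documentation for the real API):

from phoenix.client import Client

client = Client()
# Hypothetical call shape: label a prompt version as "production"
client.prompts.tags.create(
    prompt_version_id="my-prompt-version-id",
    name="production",
    description="Version used by the production pipeline",
)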
Add aiohttp to container for azure-identity
Available in Phoenix 8.19+
Within each project, there is now a Config tab to enhance customization. The default tab can now be set per project, ensuring the preferred view is displayed.
Learn more in the docs.
Use correlated subquery for orphan spans
Add toggle to treat orphan spans as root
Upgrade react-router, vite, vitest
Experiments: Included delete experiment option to action menu
Feature: Added support for specifying admin users via an environment variable at startup
Annotation: Now displays metadata
Settings Page: Now split across tabs for improved navigation and easier access
Feedback: Added full metadata
Projects: Improved performance
UI: Added date format descriptions to explanations
Phoenix is now available for deployment as a fully hosted service.
In addition to our existing notebook, CLI, and self-hosted deployment options, we’re excited to announce that Phoenix is now available as a fully hosted service.
With hosted instances, your data is stored between sessions, and you can easily share your work with team members.
We are partnering with LlamaIndex to power a new observability platform in LlamaCloud: LlamaTrace. LlamaTrace will automatically capture traces emitted from your LlamaIndex applications, and store them in a persistent, cloud-accessible Phoenix instance.
Hosted Phoenix is 100% free-to-use, check it out today!
Available in Phoenix 7.9+
In addition to using our automatic instrumentors and tracing directly using OTEL, we've now added our own layer to let you have the granularity of manual instrumentation without as much boilerplate code.
You can now access a tracer object with streamlined options to trace functions and code blocks. The main two options are:
Using the decorator @tracer.chain traces the entire function automatically as a Span in Phoenix. The input, output, and status attributes are set based on the function's parameters and return value.
Using the tracer in a with clause allows you to trace specific code blocks within a function. You manually define the Span name, input, output, and status.
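A short sketch of both options, assuming a tracer obtained from phoenix.otel's register (names and values are illustrative):

from phoenix.otel import register

tracer_provider = register(project_name="my-app")
tracer = tracer_provider.get_tracer(__name__)

# Option 1: decorator — the whole function becomes a span, with input,
# output, and status captured from the call automatically.
@tracer.chain
def summarize(text: str) -> str:
    return text[:100]

# Option 2: context manager — trace one block and set fields manually.
with tracer.start_as_current_span(
    "custom-block", openinference_span_kind="chain"
) as span:
    span.set_input("raw input")
    span.set_output("processed output")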
Check out the docs for more on how to use tracer objects.
Available in Phoenix 4.6+
We are introducing a new built-in function call evaluator that scores the function/tool-calling capabilities of your LLMs. This off-the-shelf evaluator will help you ensure that your models are not just generating text but also effectively interacting with tools and functions as intended.
This evaluator checks for issues arising from function routing, parameter extraction, and function generation.
Check out a full walkthrough of the evaluator.
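The snippet below accompanies the span ingestion release (Phoenix 10.12+), showing how to regenerate span IDs with uniquify_spans before submitting a batch via log_spans: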
from phoenix.client import Client
from phoenix.client.helpers.spans import uniquify_spans

client = Client()

spans = [
    {
        "name": "llm_call",
        "context": {"trace_id": "trace_123", "span_id": "span_456"},
        "start_time": "2024-01-15T10:00:00Z",
        "end_time": "2024-01-15T10:00:05Z",
        "span_kind": "LLM",
    }
]

# Regenerate span/trace IDs to avoid collisions while preserving relationships
unique_spans = uniquify_spans(spans)

result = client.spans.log_spans(
    project_identifier="my-project",
    spans=unique_spans,
)
We’ve added support for the GoogleGenAIModel in phoenix-evals, enabling direct access to Google's Gemini models through the official Google GenAI SDK. As of late 2024, this is the recommended approach for working with Gemini, offering a unified interface across both the Developer API and VertexAI.
🚀 Key Features
Multimodal Support: Run evaluations on text, image, and audio inputs using Gemini’s multimodal capabilities.
Async-Ready: Optimized for high-throughput evals with full async compatibility.
Flexible Authentication: Supports both API key and VertexAI-based authentication methods.
Dynamic Rate Limiting: Built-in rate limiter with automatic adjustment based on API feedback and usage patterns.
This integration makes it easier to run robust, scalable evaluations using Gemini models directly within your phoenix-evals workflows.
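A hedged sketch of plugging the model into an eval (GoogleGenAIModel is from this release; the constructor argument and the llm_classify wiring are assumptions based on the usual phoenix-evals pattern):

import pandas as pd
from phoenix.evals import GoogleGenAIModel, llm_classify

model = GoogleGenAIModel(model="gemini-2.0-flash")  # assumes GOOGLE_API_KEY is set

df = pd.DataFrame({"question": ["What is 2 + 2?"], "answer": ["4"]})
results = llm_classify(
    dataframe=df,
    model=model,
    template=(
        "Is the answer correct?\n"
        "Question: {question}\nAnswer: {answer}\n"
        "Respond with exactly one word: correct or incorrect."
    ),
    rails=["correct", "incorrect"],
)
print(results)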
Huge shoutout to Siddharth Sahu for this contribution!
More information is available in our docs.
Available in Phoenix 11.12+
The experiment comparison table now displays average experiment run data in the table headers, making it easier to spot high-level differences across runs at a glance.
Available in Phoenix 11.3+
You can now click “Add to Cursor” directly in the Phoenix README to get a continuously updating MCP server configuration integrated into your IDE. This makes it seamless to keep your Phoenix + MCP setup in sync while developing with Cursor.
phoenix-support Tool for Agents: The phoenix-support tool from @arizeai/phoenix-mcp@2.2.0 allows Agents like Cursor, Claude, and Windsurf to:
Look up Phoenix and OpenInference documentation and best practices.
Use this information to make code changes automatically in your workspace.
For Example: Watch Cursor 1-shot instrument a LlamaIndex app using Phoenix without manual intervention.
We’ve added a Python auto-instrumentation library for the Google GenAI SDK. This enables seamless tracing of GenAI workflows with full OpenTelemetry compatibility. Traces can be exported to any OpenTelemetry collector.
pip install openinference-instrumentation-google-genai
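Once installed, a minimal hedged sketch of wiring it to Phoenix (the instrumentor class name follows the usual OpenInference pattern and is an assumption; register(auto_instrument=True) also picks it up automatically):

from openinference.instrumentation.google_genai import GoogleGenAIInstrumentor  # assumed name
from phoenix.otel import register

tracer_provider = register(project_name="google-genai-demo")
GoogleGenAIInstrumentor().instrument(tracer_provider=tracer_provider)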
For more details on how to set up the tracing integration, see the docs.
Additionally, the Google GenAI instrumentor is now supported and works seamlessly with Span Replay in Phoenix, enabling deep trace inspection and replay for more effective debugging and observability.
Big thanks to Harrison Chu for his contributions.
Available in Phoenix 8.30+
The Phoenix client now includes the SpanQuery DSL, enabling more advanced and flexible span querying for distributed tracing and telemetry data. This allows users to perform complex queries on span data, improving trace analysis and debugging.
In addition, the get_spans_dataframe method has been migrated, offering an easy-to-use way to extract span-related information as a Pandas DataFrame. This simplifies data processing and visualization, making it easier to analyze trace data within Python-based environments.
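A hedged sketch of both additions (the DSL and method names are from this release; the import path and the filter expression are assumptions):

from phoenix.client import Client
from phoenix.client.types.spans import SpanQuery  # assumed import path

client = Client()
query = SpanQuery().where("span_kind == 'LLM'").select(
    input="input.value",
    output="output.value",
)
df = client.spans.get_spans_dataframe(query=query, project_identifier="my-project")
print(df.head())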
Projects: Add "Copy Name" button to project menu
TLS: Add independent flags for whether TLS is enabled for HTTP and gRPC servers
Playground: Log playground subscription errors
API: New RBAC primitives have been introduced for FastAPI and REST APIs
Available in Phoenix 8.24+
This update enhances the Project Management API with more flexible project identification:
Enhanced project identification: Added support for identifying projects by both ID and hex-encoded name and introduced a new _get_project_by_identifier helper function.
Also includes streamlined operations, better validation & error handling, and expanded test coverage.
Performance: Restore streaming
Playground: update Gemini models
Enhancement: Route user to forgot-password page in welcome email url
Available in Phoenix 8.23+
This release introduces a REST API for managing projects, complete with full CRUD functionality and access control. Key features include:
CRUD Operations: Create, read, update, and delete projects via the new API endpoints.
Role-Based Access Control:
Admins can create, read, update, and delete projects
Members can create and read projects, but cannot modify or delete them.
Additional Safeguards: Immutable Project Names, Default Project Protection, Comprehensive Integration Tests
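A hedged sketch of the CRUD surface with plain HTTP (routes and response shape are assumptions inferred from the release note; see the API documentation for exact paths and payloads):

import os

import requests

base = "http://localhost:6006/v1/projects"  # assumed route
headers = {"Authorization": f"Bearer {os.environ['PHOENIX_API_KEY']}"}

# Create (admins and members)
created = requests.post(base, json={"name": "demo-project"}, headers=headers).json()

# Read (admins and members)
projects = requests.get(base, headers=headers).json()

# Delete (admins only, per the access rules above); response shape assumed
requests.delete(f"{base}/{created['data']['id']}", headers=headers)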
Check out our new documentation to test these features.
Phoenix Server: add PHOENIX_ALLOWED_ORIGINS env
Tracing: Delete annotations in the feedback table; make feedback table scrollable
Experiments: Allow scrolling the entire experiment compare table
Projects: Make time range selector more accessible
Playground: Don't close model settings dialog when picking Azure version
Session: improve PostgreSQL error message in launch_app
Available in Phoenix 8.21+
The new span aside moves the Span Annotation editor into a dedicated panel, providing a clearer view for adding annotations and enhancing customization of your setup. Read this documentation to learn how annotations can be used.
Enhancement: Allow the option to have no configured working directory when using Postgres
Performance: Cache project table results when toggling the details slide-over for improved performance
UI: Add chat and message components for note-taking
Available in Phoenix 8.20+
Newly added to the OpenAI Agent SDK is support for MCP Span Info, allowing for the tracing and extraction of useful information about MCP tool listings. Use the Phoenix OpenAI Agents SDK for powerful agent tracing.
Available in Phoenix 8.20+
You can now toggle the option to treat orphan spans as root when viewing your spans. Additionally, we've enhanced the UI with an icon view in span details for better visibility in smaller displays. Learn more in our docs.
Performance: Disable streaming when a dialog is open
Playground: Removed unpredictable playground transformations
Available in Phoenix 8.17+
You can now specify one or more admin users at startup using an environment variable. This is especially useful for managed deployments, allowing you to define admin access in a manifest or configuration file. The specified users will be automatically seeded into the database, enabling immediate login without manual setup.
Performance: Smaller page sizes
Projects: Improved performance on projects page
Experiments: Allow hover anywhere on experiment cell
Annotations: Show metadata
Feedback: Show full metadata
Available in Phoenix 8.19+
You can now delete experiments directly from the action menu, making it quicker to manage and clean up your workspace. This update streamlines experiment management by reducing the steps needed to remove outdated or unnecessary runs. Get started with experiments in the docs.
UI: Show the date format in the explanation
Available in Phoenix 8.13+
We've introduced the OpenAI Agents SDK for Python which provides enhanced visibility into agent behavior and performance.
Installation
pip install openinference-instrumentation-openai-agents openai-agents
Includes an OpenTelemetry Instrumentor that traces agents, LLM calls, tool usage, and handoffs.
With minimal setup, use the register function to connect your app to Phoenix and view real-time traces of agent workflows.
For more details on a quick setup, check out our integration documentation.
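A minimal sketch using register as described above (the project name is illustrative; auto_instrument enables the agents instrumentor installed in the previous step):

from phoenix.otel import register

# Connects to Phoenix and turns on every installed OpenInference
# instrumentor, including openinference-instrumentation-openai-agents.
tracer_provider = register(project_name="agents-demo", auto_instrument=True)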
Prompt Playground: Azure API key made optional, included specialized UI for thinking budget parameter
Performance: Make the spans table the default tab
Components: Added react-aria Tabs components
Enhancement: Download experiment runs and annotations as CSV
Available in Phoenix 4.11+
Our integration with Guardrails AI allows you to capture traces on guard usage and create datasets based on these traces. This integration is designed to enhance the safety and reliability of your LLM applications, ensuring they adhere to predefined rules and guidelines.
Check out the Cookbook here.
Available in Phoenix 5.0+
We've added Authentication and Rules-based Access Controls to Phoenix. This was a long-requested feature set, and we're excited for the new uses of Phoenix this will unlock!
The auth feature set includes:
Secure Access: All of Phoenix’s UI & APIs (REST, GraphQL, gRPC) now require access tokens or API keys. Keep your data safe!
RBAC (Role-Based Access Control): Admins can manage users; members can update their profiles—simple & secure.
API Keys: Now available for seamless, secure data ingestion & querying.
OAuth2 Support: Easily integrate with Google, AWS Cognito, or Auth0.
✉ Password Resets via SMTP to make security a breeze.
For all the details on authentication, view our docs.
Numerous stability improvements to our hosted Phoenix instances accessed on app.phoenix.arize.com
Added a new command to easily launch a Phoenix server from the CLI: phoenix serve
Implemented simple email sender to simplify dependencies
Improved error handling for imported spans
Replaced hdbscan with fast-hdbscan
Added PHOENIX_CSRF_TRUSTED_ORIGINS environment variable to set trusted origins
Added support for Mistral 1.0
Fixed an issue that caused px.Client().get_spans_dataframe() requests to time out
Available in Phoenix 9.0.0+
The Phoenix v9.0.0 release brings major updates to annotation support, and a whole host of other improvements.
Up until now, Phoenix has only supported one annotation of a given type on each trace. We've now unlocked that limit, allowing you to capture multiple values of an annotation label on each span.
In addition, we've added:
API support for annotations - create, query, and update annotations through the REST API
Additional support for code evaluations as annotations
Support for arbitrary metadata on annotations
Annotation configurations to structure your annotations within and across projects
Now you can create custom global and per-project data retention policies to remove traces after a certain window of time, or based on the number of traces. Additionally, you can now view your disk usage in the Settings page of Phoenix.
We've added hotkeys to Phoenix!
You can now use j
and k
to quickly page through your traces, and e
and n
to add annotations and notes - you never have to lift your hands off the keyboard again!
Available in Phoenix 8.29+
Phoenix now supports Transport Layer Security (TLS) for both HTTP and gRPC connections, enabling encrypted communication and optional mutual TLS (mTLS) authentication. This enhancement provides a more secure foundation for production deployments.
Secure HTTP & gRPC Connections: Phoenix can now serve over HTTPS and secure gRPC.
Flexible TLS Configuration: TLS settings are managed via environment variables.
Optional Client Verification: Support for mTLS with configurable client certificate validation.
Improved Testing: TLS-aware infrastructure added to integration tests.
Better Visibility: Server startup logs now display TLS status.
Set the TLS environment variables (listed in the table later in these notes) to enable and customize TLS.
Available in Phoenix 8.9+
New update overview:
Prompt Playground: Now supports new GPT models & Anthropic Sonnet 3.7 and Thinking Budgets
Instrumentation: New instrumentation to trace smolagents by Hugging Face
Evals: o3 support, Audio & Multi-Modal Evaluations
Integrations: Phoenix now supports LiteLLM Proxy & Cleanlabs evals
Show percent used of DB
Add environment variable for allocated DB storage capacity
Delete selected traces
Make trace tree more readable on smaller sizes
Ensure type is correct on run_experiment
Allow experiment run JSON downloads
Add anthropic thinking config param
Add ToggleButton
Available in Phoenix 8.11+
Save and Load from Prompts: You can now save and load configurations directly from prompts.
Save and Load from Default Model Config: Default model configurations can be saved and loaded.
Budget Token Management: Added the ability to adjust the budget token value.
Thinking Configuration Toggle: You can now enable or disable the “thinking” feature.
Important Note: The default model config does not automatically apply to saved prompts. To include default thinking settings, ensure they are saved within the specific prompt.
Added annotations to experiment JSON downloads
Add none as option for tool choice for anthropic 0.49.0
Port slider component to react-aria
Available in Phoenix 8.5+
We’ve introduced several enhancements to Projects, providing greater flexibility and control over how you interact with data. These updates include:
Persistent Column Selection: Your selected columns will now remain consistent across sessions, ensuring a more seamless workflow.
Metadata Filtering: Easily filter data directly from the table view using metadata attributes.
Custom Time Ranges: You can now specify custom time ranges to filter traces and spans.
Root Span Filter for Spans: Improved filtering options allow you to filter by root spans, helping to isolate and debug issues more effectively.
Metadata Quick Filters: Quickly apply common metadata filters for faster navigation.
Major speed improvements in project tracing views & visibility into database usage in settings
Query to get number of spans for each trace
Show + n more spans in trace table
Add Token component
Remove double fetching of spans
Don't fetch new traces when the traces slideover is visible
Fix scrolling on trace tree
Available in Phoenix 6.0+
Added support for FastAPI and GraphQL extensions
Fixed a bug where Anthropic LLM as a Judge responses would be labeled as unparseable
Fixed a bug causing 500 errors on client.get_traces_dataset() and client.get_spans_dataframe()
Added the ability for authentication to work from behind a proxy
Added an environment variable to set default admin passwords in auth
Available in Phoenix 4.6+
Datasets: Datasets are a new core feature in Phoenix that live alongside your projects. They can be imported, exported, created, curated, manipulated, and viewed within the platform, and should make a few flows much easier:
Fine-tuning: You can now create a dataset based on conditions in the UI, or by manually choosing examples, then export these into CSV or JSONL formats ready-made for fine-tuning APIs.
Experimentation: External datasets can be uploaded into Phoenix to serve as the test cases for experiments run in the platform.
For more details on using datasets, see our documentation or example notebook.
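For example, a small dataset can be created from a dataframe with the Python client (the dataset name and columns are illustrative):

import pandas as pd
import phoenix as px

df = pd.DataFrame(
    {
        "question": ["What is Phoenix?"],
        "answer": ["An open-source LLM tracing and evaluation platform."],
    }
)
dataset = px.Client().upload_dataset(
    dataset_name="qa-examples",
    dataframe=df,
    input_keys=["question"],
    output_keys=["answer"],
)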
Experiments: Our new Datasets and Experiments feature enables you to create and manage datasets for rigorous testing and evaluation of your models. You can now run comprehensive experiments to measure and analyze the performance of your LLMs in various scenarios.
For more details, check out our full walkthrough.
Available in Phoenix 7.0+
Sessions allow you to group multiple responses into a single thread. Each response is still captured as a single trace, but each trace is linked together and presented in a combined view.
Sessions make it easier to visualize multi-turn exchanges with your chatbot or agent. Sessions launches with Python and TS/JS support. For more on sessions, check out a walkthrough video and the docs.
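A hedged sketch of grouping traces into a session with the OpenInference helper (the session ID value is illustrative):

from openinference.instrumentation import using_session
from phoenix.otel import register

tracer = register(project_name="chat-app").get_tracer(__name__)

# Spans created inside this block share the same session.id attribute,
# so Phoenix presents them as turns of one session thread.
with using_session(session_id="chat-thread-42"):
    with tracer.start_as_current_span("turn-1") as span:
        span.set_attribute("input.value", "Hello!")
        span.set_attribute("output.value", "Hi! How can I help?")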
Prompt Playground: Added support for arbitrary string model names; added support for Gemini 2.0 Flash; improved template editor ergonomics
Evals: Added multimodal message template support
Tracing: Added JSON pretty printing for structured data outputs (thank you sraibagiwith100x!); added a breakdown of token types in project summary
Bug Fixes: Changed trace latency to be computed every time rather than relying on root span latency; added additional type checking to handle non-string values when manually instrumenting (thank you Manuel del Verme!)
TLS configuration (Phoenix 8.29+):
PHOENIX_TLS_ENABLED (boolean): Enable or disable TLS (true/false)
PHOENIX_TLS_CERT_FILE (string): Path to TLS certificate file
PHOENIX_TLS_KEY_FILE (string): Path to private key file
PHOENIX_TLS_KEY_FILE_PASSWORD (string): Password for encrypted private key file
PHOENIX_TLS_CA_FILE (string): Path to CA certificate (for client verification)
PHOENIX_TLS_VERIFY_CLIENT (boolean): Enable client cert verification
Available in Phoenix 8.27+
Improved trace navigation by automatically scrolling the selected span into view when a user navigates to a specific trace. This enhancement eliminates the need for manual searching or scrolling, allowing users to immediately focus on the span of interest. It's especially useful when navigating from links or alerts that point to a specific span, improving debugging efficiency. This change contributes to a smoother and more intuitive trace exploration experience.
Enhancement: Add /readyz endpoint to confirm database connectivity
Enhancement: Allow scroll on settings page
Available in Phoenix 8.0+
Phoenix prompt management will now let you create, modify, tag, and version control prompts for your applications. Some key highlights from this release:
Versioning & Iteration: Seamlessly manage prompt versions in both Phoenix and your codebase.
New TypeScript Client: Sync prompts with your JavaScript runtime, now with native support for OpenAI, Anthropic, and the Vercel AI SDK.
New Python Client: Sync templates and apply them to AI SDKs like OpenAI, Anthropic, and more.
Standardized Prompt Handling: Native normalization for OpenAI, Anthropic, Azure OpenAI, and Google AI Studio.
Enhanced Metadata Propagation: Track prompt metadata on Playground spans and experiment metadata in dataset runs.
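A hedged sketch with the new Python client (the accessor and format call follow the release description; method and variable names are assumptions):

from phoenix.client import Client

client = Client()
prompt = client.prompts.get(prompt_identifier="my-prompt")  # assumed accessor

# Render the stored template into provider-native kwargs (OpenAI shown)
openai_kwargs = prompt.format(variables={"topic": "sports"})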
Check out the docs and this walkthrough for more on prompts!📝
Available in Phoenix 8.14+
We've added the ability to resize Span, Trace, and Session tables. Resizing preferences are now persisted in the tracing store, ensuring settings are maintained per-project and per-table.
Available in Phoenix 10.6+
We’re excited to announce that Phoenix can now be deployed via a Helm chart for Kubernetes.
This allows you to:
Quickly spin up Phoenix with a single helm install and a single YAML file.
Launch using the infrastructure and deployment patterns recommended by the Phoenix team, ensuring consistency and ease of maintenance.
Easily upgrade to the latest Phoenix features and improvements over time.
Whether you are self-hosting in a cloud Kubernetes cluster or on-premises, the new Helm chart makes deploying Phoenix simpler and more reliable than ever.
Available in Phoenix 10.5+
Phoenix v10.5.0 now supports Deepseek and xAI models in Playground natively. Previous versions of Phoenix supported these as custom model endpoints, but that process has now been streamlined to offer these model providers from the main Playground dropdown.
The latest from the Phoenix team.
New in phoenix-evals: Added support for Google's Gemini models via the Google GenAI SDK — multimodal, async, and ready to scale. Huge shoutout to Siddharth Sahu for this contribution!
Available in Phoenix 11.12+
Phoenix now has comprehensive project dashboards for detailed performance, cost, and error insights.
Available in Phoenix 11.12+
View average run metrics directly in the headers of the experiment comparison table for quick insights.
Available in Phoenix 11.9+
Create new projects and transfer traces between them via GraphQL, with full preservation of annotations and cost data.
OpenInference Java now offers full OpenTelemetry-compatible tracing for AI apps, including auto-instrumentation for LangChain4j and semantic conventions.
Available in Phoenix 11.7+
New experiments feature set in phoenix-client, enabling sync and async execution with task runs, evaluations, rate limiting, and progress reporting.
Available in Phoenix 11.6+
Compare experiments relative to a baseline run to easily spot regressions and improvements across metrics.
Available in Phoenix 11.5+
Monitor database disk usage, notify admins when nearing capacity, and automatically block writes when critical thresholds are reached.
Available in Phoenix 11.4+
Added cost summaries to trace headers, showing total and segmented (prompt & completion) costs at a glance while debugging.
Available in Phoenix 11.3+
Phoenix README now has an “Add to Cursor” button for seamless IDE integration with Cursor. @arizeai/phoenix-mcp@2.2.0 also includes a new tool called phoenix-support, letting agents like Cursor auto-instrument your apps using Phoenix and OpenInference best practices.
Available in Phoenix 11.0+
Phoenix now automatically tracks token-based LLM costs using model pricing and token counts, rolling them up to trace and project levels for clear, actionable cost insights.
Phoenix now supports multiple customizable spaces with individual user access and collaboration, enabling teams to work together seamlessly.
Available in Phoenix 10.15+
Phoenix’s Playground now supports Amazon Bedrock, letting you run, compare, and track Bedrock models alongside others—all in one place.
Available in Phoenix 10.12+
Now you can filter sessions by their unique session_id across the API and UI, making it easier to pinpoint and inspect specific sessions.
Available in Phoenix 10.12+
Now you can create spans directly via a new POST API and client methods, with helpers to safely regenerate IDs and prevent conflicts on insertion.
Available in Phoenix 10.11+
Dataset name filtering with live search support across the API and UI.
Available in Phoenix 10.9+
Phoenix now has experiment graphs to track how your evaluation scores and latency evolve over time.
Available in Phoenix 10.7+
Ollama is now supported in the Playground, letting you experiment with its models and customize parameters for tailored prompting.
Available in Phoenix 10.6+
Added Helm chart support for Phoenix, making Kubernetes deployment fast, consistent, and easy to upgrade.
Available in Phoenix 10.5+
Deepseek and xAI models are now available in Prompt Playground!
We've added a host of new methods to the JS client:
getExperiment - allows you to retrieve an Experiment to view its results, and run evaluations on it
evaluateExperiment - allows you to evaluate previously run Experiments using LLM as a Judge or Code-based evaluators
createDataset - allows you to create Datasets in Phoenix using the client
appendDatasetExamples - allows you to append additional examples to a Dataset
You can now run Experiments using the Phoenix JS client! Use Experiments to test different iterations of your applications over a set of test cases, then evaluate the results. This release includes:
Native tracing of tasks and evaluators
Async concurrency queues
Support for any evaluator (including bring your own evals)
Available in Phoenix 9.0+
Major Release: Phoenix v9.0.0
Phoenix's v9.0.0 release brings with it:
A host of improvements to Annotations, including one-to-many support, API access, annotation configs, and custom metadata
Customizable data retention policies
Hotkeys! 🔥
We’ve added a Python auto-instrumentation library for the Google GenAI SDK. This enables seamless tracing of GenAI workflows with full OpenTelemetry compatibility. Additionally, the Google GenAI instrumentor is now supported and works seamlessly with Span Replay in Phoenix.
Available in Phoenix 8.30+
The Phoenix client now includes the SpanQuery DSL for more advanced span querying. Additionally, a get_spans_dataframe method has been added to facilitate easier data extraction for span-related information.
Available in Phoenix 8.29+
Phoenix now supports Transport Layer Security (TLS) for both HTTP and gRPC connections, enabling encrypted communication and optional mutual TLS (mTLS) authentication. This enhancement provides a more secure foundation for production deployments.
Available in Phoenix 8.28+
When stopping the Phoenix server via Ctrl+C, the shutdown process now exits cleanly with code 0 to reflect intentional termination. Previously, this would trigger a traceback with KeyboardInterrupt, misleadingly indicating a failure.
Available in Phoenix 8.27+
Improved trace navigation by automatically scrolling the selected span into view when a user navigates to a specific trace. This enhances usability by making it easier to locate and focus on the relevant span without manual scrolling.
Available in Phoenix 8.26+
We’ve released openinference-instrumentation-mcp, a new package in the OpenInference OSS library that enables seamless OpenTelemetry context propagation across MCP clients and servers. It automatically creates spans, injects and extracts context, and connects the full trace across services to give you complete visibility into your MCP-based AI systems.
Big thanks to Adrian Cole and Anuraag Agrawal for their contributions to this feature.
Available in Phoenix 8.26+
Phoenix now supports programmatic API key creation through a new endpoint, making it easier to automate project setup and trace logging. To enable this, set the PHOENIX_ADMIN_SECRET environment variable in your deployment.
Available in Phoenix 8.25+
Tool call and result IDs are now shown in the span details view. Each ID is placed within a collapsible header and can be easily copied. This update also supports spans with multiple tool calls. Get started with tracing your tool calls here.
Available in Phoenix 8.24+
This update enhances the Project Management API with more flexible project identification. We've added support for identifying projects by both ID and hex-encoded name and introduced a new _get_project_by_identifier helper function.
Available in Phoenix 8.23+
This release introduces a REST API for managing projects, complete with full CRUD functionality and access control. Key features include CRUD Operations and Role-Based Access Control. Check out our new documentation to test these features.
Available in Phoenix 8.22+
We’ve added support for Prompt Tagging in the Phoenix client. This new feature gives you more control and visibility over your prompts throughout the development lifecycle. Tag prompts directly in code, label prompt versions, and add tag descriptions. Check out documentation on prompt tags.
Available in Phoenix 8.21+
The new span aside moves the Span Annotation editor into a dedicated panel, providing a clearer view for adding annotations and enhancing customization of your setup. Read this documentation to learn how annotations can be used.
Available in Phoenix 8.20+
Newly added to the OpenAI Agent SDK is support for MCP Span Info, allowing for the tracing and extraction of useful information about MCP tool listings. Use the Phoenix OpenAI Agents SDK for powerful agent tracing.
Available in Phoenix 8.20+
You can now toggle the option to treat orphan spans as root when viewing your spans. Additionally, we've enhanced the UI with an icon view in span details for better visibility in smaller displays. Learn more in the docs.
Available in Phoenix 8.19+
Within each project, there is now a Config tab to enhance customization. The default tab can now be set per project, ensuring the preferred view is displayed. Learn more in the docs.
Available in Phoenix 8.17+
You can now preconfigure admin users at startup using an environment variable, making it easier to manage access during deployment. Admins defined this way are automatically seeded into the database and ready to log in.
Available in Phoenix 8.16+
You can now delete experiments directly from the action menu, making it quicker to manage and clean up your workspace.
Available in Phoenix 8.15+
In the New Project tab, we've added quick setup to instrument your application for BeeAI, SmolAgents, and the OpenAI Agents SDK. Easily configure these integrations with streamlined instructions. Check out all Phoenix tracing integrations here.
Available in Phoenix 8.14+
We've added the ability to resize Span, Trace, and Session tables. Resizing preferences are now persisted in the tracing store, ensuring settings are maintained per-project and per-table.
Available in Phoenix 8.13+
We've introduced the OpenAI Agents SDK for Python which provides enhanced visibility into agent behavior and performance. For more details on a quick setup, check out our docs.
pip install openinference-instrumentation-openai-agents openai-agents
Available in Phoenix 8.11+
You can now save and load configurations directly from prompts or default model settings. Additionally, you can adjust the budget token value and enable/disable the "thinking" feature, giving you more control over model behavior and resource allocation.
Available in Phoenix 8.9+
Prompt Playground now supports new GPT and Anthropic models with enhanced configuration options. Instrumentation options have been improved for better traceability, and evaluation capabilities have expanded to cover Audio & Multi-Modal Evaluations. Phoenix also introduces new integration support for LiteLLM Proxy & Cleanlabs evals.
Available in Phoenix 8.8+
We’ve rolled out several enhancements to Projects, offering more flexibility and control over your data. Key updates include persistent column selection, advanced filtering options for metadata and spans, custom time ranges, and improved performance for tracing views. These changes streamline workflows, making data navigation and debugging more efficient.
Check out the docs for more.
Available in Phoenix 8.0+
Phoenix prompt management will now let you create, modify, tag, and version control prompts for your applications. Some key highlights from this release:
Versioning & Iteration: Seamlessly manage prompt versions in both Phoenix and your codebase.
New TypeScript Client: Sync prompts with your JavaScript runtime, now with native support for OpenAI, Anthropic, and the Vercel AI SDK.
New Python Client: Sync templates and apply them to AI SDKs like OpenAI, Anthropic, and more.
Standardized Prompt Handling: Native normalization for OpenAI, Anthropic, Azure OpenAI, and Google AI Studio.
Enhanced Metadata Propagation: Track prompt metadata on Playground spans and experiment metadata in dataset runs.
Check out the docs and this walkthrough for more on prompts!📝
Available in Phoenix 8.0+
Phoenix has made it even simpler to get started with tracing by introducing one-line auto-instrumentation. By using register(auto_instrument=True), you can enable automatic instrumentation in your application, which will set up instrumentors based on your installed packages.
from phoenix.otel import register
register(auto_instrument=True)
Available in Phoenix 7.9+
In addition to using our automatic instrumentors and tracing directly using OTEL, we've now added our own layer to let you have the granularity of manual instrumentation without as much boilerplate code.
You can now access a tracer object with streamlined options to trace functions and code blocks. The main two options are using the decorator @tracer.chain and using the tracer in a with clause.
Check out the docs for more on how to use tracer objects.
Available in Phoenix 7.0+
Sessions allow you to group multiple responses into a single thread. Each response is still captured as a single trace, but each trace is linked together and presented in a combined view.
Sessions make it easier to visualize multi-turn exchanges with your chatbot or agent. Sessions launches with Python and TS/JS support. For more on sessions, check out a walkthrough video and the docs.
Available in Phoenix 6.0+
Prompt Playground is now available in the Phoenix platform! This new release allows you to test the effects of different prompts, tools, and structured output formats to see which performs best.
Replay individual spans with modified prompts, or run full Datasets through your variations.
Easily test different models, prompts, tools, and output formats side-by-side, directly in the platform.
Automatically capture traces as Experiment runs for later debugging. See here for more information on Prompt Playground, or jump into the platform to try it out for yourself.
Available in Phoenix 5.0+
We've added Authentication and Rules-based Access Controls to Phoenix. This was a long-requested feature set, and we're excited for the new uses of Phoenix this will unlock!
The auth feature set includes secure access, RBAC, API keys, and OAuth2 Support. For all the details on authentication, view our docs.
Available in Phoenix 4.11.0+
Our integration with Guardrails AI allows you to capture traces on guard usage and create datasets based on these traces. This integration is designed to enhance the safety and reliability of your LLM applications, ensuring they adhere to predefined rules and guidelines.
Check out the Cookbook here.
Phoenix is now available for deployment as a fully hosted service.
In addition to our existing notebook, CLI, and self-hosted deployment options, we’re excited to announce that Phoenix is now available as a fully hosted service. With hosted instances, your data is stored between sessions, and you can easily share your work with team members.
We are partnering with LlamaIndex to power a new observability platform in LlamaCloud: LlamaTrace. LlamaTrace will automatically capture traces emitted from your LlamaIndex application.
Hosted Phoenix is 100% free-to-use, check it out today!
Available in Phoenix 4.6+
Datasets: Datasets are a new core feature in Phoenix that live alongside your projects. They can be imported, exported, created, curated, manipulated, and viewed within the platform, and make fine-tuning and experimentation easier.
For more details on using datasets see our documentation or example notebook.
Experiments: Our new Datasets and Experiments feature enables you to create and manage datasets for rigorous testing and evaluation of your models. Check out our full walkthrough.
Available in Phoenix 4.6+
We are introducing a new built-in function call evaluator that scores the function/tool-calling capabilities of your LLMs. This off-the-shelf evaluator will help you ensure that your models are not just generating text but also effectively interacting with tools and functions as intended. Check out a full walkthrough of the evaluator.
We've added a host of new methods to the JS client:
getExperiment - allows you to retrieve an Experiment to view its results, and run evaluations on it
evaluateExperiment - allows you to evaluate previously run Experiments using LLM as a Judge or Code-based evaluators
createDataset - allows you to create Datasets in Phoenix using the client
appendDatasetExamples - allows you to append additional examples to a Dataset
You can now run Experiments using the Phoenix JS client! Use Experiments to test different iterations of your applications over a set of test cases, then evaluate the results.
This release includes:
Native tracing of tasks and evaluators
Async concurrency queues
Support for any evaluator (including bring your own evals)
import { createClient } from "@arizeai/phoenix-client";
import {
  asEvaluator,
  runExperiment,
} from "@arizeai/phoenix-client/experiments";
import type { Example } from "@arizeai/phoenix-client/types/datasets";
import { Factuality } from "autoevals";
import OpenAI from "openai";

const phoenix = createClient();
const openai = new OpenAI();

/** Your AI Task */
const task = async (example: Example) => {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: JSON.stringify(example.input, null, 2) },
    ],
  });
  return response.choices[0]?.message?.content ?? "No response";
};

await runExperiment({
  dataset: "dataset_id",
  experimentName: "experiment_name",
  client: phoenix,
  task,
  evaluators: [
    asEvaluator({
      name: "Factuality",
      kind: "LLM",
      evaluate: async (params) => {
        const result = await Factuality({
          output: JSON.stringify(params.output, null, 2),
          input: JSON.stringify(params.input, null, 2),
          expected: JSON.stringify(params.expected, null, 2),
        });
        return {
          score: result.score,
          label: result.name,
          explanation: (result.metadata?.rationale as string) ?? "",
          metadata: result.metadata ?? {},
        };
      },
    }),
  ],
});