
What is LlamaTrace vs Phoenix Cloud?

LlamaTrace and Phoenix Cloud are the same tool. They are the hosted version of Phoenix provided on app.phoenix.arize.com.

What is my Phoenix Endpoint?

There are two endpoints that matter in Phoenix:

  1. Application Endpoint: The endpoint your Phoenix instance is running on

  2. OTEL Tracing Endpoint: The endpoint through which your Phoenix instance receives OpenTelemetry traces

Application Endpoint

If you're accessing a Phoenix Cloud instance through our website, then your endpoint is available under the Hostname field of your Settings page.

If you're self-hosting Phoenix, then you choose the endpoint when you set up the app. The default value is http://localhost:6006

To set this endpoint, use the PHOENIX_COLLECTOR_ENDPOINT environment variable. This is used by the Phoenix client package to query traces, log annotations, and retrieve prompts.
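For example, a minimal sketch of setting this variable in Python before using the client (the hostname shown is a placeholder; substitute your own instance's address, and note that the Client import requires the arize-phoenix-client package):

```python
import os

# Use your instance's address: the Hostname from your Settings page for Phoenix
# Cloud, or http://localhost:6006 by default when self-hosting.
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"  # placeholder

# With the arize-phoenix-client package installed, the client reads this variable:
#   from phoenix.client import Client
#   client = Client()  # queries traces, logs annotations, retrieves prompts

print(os.environ["PHOENIX_COLLECTOR_ENDPOINT"])
```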

OTEL Tracing Endpoint

If you're accessing a Phoenix Cloud instance through our website, then your endpoint is available under the Hostname field of your Settings page.

If you're self-hosting Phoenix, then you choose the endpoint when you set up the app. The default values are:

  • Using the gRPC protocol: http://localhost:4317

  • Using the HTTP protocol: http://localhost:6006/v1/traces

As of May 2025, Phoenix Cloud only supports trace collection via HTTP.

To set this endpoint, use the register(endpoint=YOUR ENDPOINT) function. This endpoint can also be set using environment variables. For more on the register function and other configuration options, see here.
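A hedged sketch of both configuration styles for a self-hosted instance (the project name and endpoint values are illustrative; the register function requires the arize-phoenix-otel package):

```python
import os

# Option 1: configure via environment variable (base address, placeholder shown).
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"

# Option 2: pass the endpoint to register() directly (with arize-phoenix-otel installed):
#   from phoenix.otel import register
#   tracer_provider = register(
#       project_name="my-app",  # illustrative project name
#       endpoint="http://localhost:6006/v1/traces",  # HTTP; use port 4317 for gRPC
#   )

print(os.environ["PHOENIX_COLLECTOR_ENDPOINT"])
```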

What is the difference between Phoenix and Arize?

Arize is the company that makes Phoenix. Phoenix is an open source LLM observability tool offered by Arize. It can be accessed in its Cloud form online, or self-hosted and run on your own machine or server.

"Arize" can also refer to Arize's enterprise platform, often called Arize AX, available on arize.com. Arize AX is the enterprise SaaS version of Phoenix that comes with additional features like Copilot, ML and CV support, HIPAA compliance, Security Reviews, a customer success team, and more. See here for a breakdown of the two tools.

Will Phoenix Cloud be on the latest version of Phoenix?

We update the Phoenix version used by Phoenix Cloud on a weekly basis.

Can I use Azure OpenAI?

Yes, in fact this is probably the preferred way to interact with OpenAI if your enterprise requires data privacy. Getting the parameters right for Azure can be a bit tricky so check out the models section for details.

Phoenix Cloud Migration Guide: Legacy to New Version

Learn about options to migrate your legacy Phoenix Cloud instance to the latest version

To move to the new Phoenix Cloud, simply create a new Phoenix account with a different email address. From there, you can start using a new Phoenix instance immediately. Your existing projects in your old (legacy) account will remain intact and independent, ensuring a clean transition.

Since most users don’t use Phoenix Cloud for data storage, this straightforward approach works seamlessly for migrating to the latest version.

If you need to migrate data from the legacy version to the latest version, contact the Phoenix team.

How to know which version of Phoenix Cloud you are on?

The easiest way to determine which version of Phoenix Cloud you’re using is by checking the URL in your browser:

  1. The new Phoenix Cloud version will have a hostname structure like: app.phoenix.arize.com/s/[your-space-name]

  2. If your Phoenix Cloud URL does not include /s/ followed by your space name, you are on the legacy version.


Can I add other users to my Phoenix Instance?

Self-Hosted Phoenix

Self-hosted Phoenix supports multiple users with authentication, roles, and more.

Phoenix Cloud

Phoenix Cloud is no longer limited to single-developer use—teams can manage access and share traces easily across their organization.

The new Phoenix Cloud supports team management and collaboration. You can spin up multiple, customized Phoenix Spaces for different teams and use cases, manage individual user access and permissions for each space, and seamlessly collaborate with additional team members on your projects.

Frequently Asked Questions

What is the difference between Phoenix and Arize?

What is my Phoenix Endpoint?

What is LlamaTrace vs Phoenix Cloud?

Phoenix Cloud Migration Guide: Legacy to New Version

Langfuse alternative? Arize Phoenix vs Langfuse

LangSmith alternative? Arize Phoenix vs LangSmith

Will Phoenix Cloud be on the latest version of Phoenix?

Can I add other users to my Phoenix Instance?

Can I use Azure OpenAI?

Can I use Phoenix locally from a remote Jupyter instance?

How can I configure the backend to send the data to the phoenix UI in another container?

Can I run Phoenix on Sagemaker?

Can I persist data in a notebook?

What is the difference between GRPC and HTTP?

Can I use gRPC for trace collection?

How do I resolve Phoenix Evals showing NOT_PARSABLE?

Can I run Phoenix on Sagemaker?

With SageMaker notebooks, Phoenix leverages jupyter-server-proxy to host the server under proxy/6006. Note that Phoenix will automatically try to detect that you are running in SageMaker, but you can declare the notebook runtime via a parameter to launch_app or an environment variable:

import os

os.environ["PHOENIX_NOTEBOOK_ENV"] = "sagemaker"

How do I resolve Phoenix Evals showing NOT_PARSABLE?

NOT_PARSABLE errors often occur when LLM responses exceed the max_tokens limit or produce incomplete JSON. Here's how to fix it:

  1. Increase max_tokens: Update the model configuration as follows:

  2. Update Phoenix: Use version ≥0.17.4, which removes token limits for OpenAI and increases defaults for other APIs.

  3. Check Logs: Look for finish_reason="length" to confirm token limits caused the issue.

  4. If the above doesn't work, it's possible the llm-as-a-judge output might not fit into the defined rails for that particular custom Phoenix eval. Double check the prompt output matches the rail expectations.

from getpass import getpass

from phoenix.evals import OpenAIModel

llm_judge_model = OpenAIModel(
    api_key=getpass("Enter your OpenAI API key..."),
    model="gpt-4o-2024-08-06",
    temperature=0.2,
    max_tokens=1000,  # Increase token limit
)

Can I persist data in a notebook?

You can persist data in a notebook by setting the use_temp_dir flag to false in px.launch_app, which stores your data in SQLite on disk at the PHOENIX_WORKING_DIR. Alternatively, you can deploy a Phoenix instance and point to it via PHOENIX_COLLECTOR_ENDPOINT.
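An illustrative sketch of the first approach (assumes the arize-phoenix package; the directory shown is the documented default working location):

```python
import os

# Optional: choose where the SQLite database lives (~/.phoenix is the default).
os.environ["PHOENIX_WORKING_DIR"] = os.path.expanduser("~/.phoenix")

# With arize-phoenix installed, launch without a temp dir so data survives restarts:
#   import phoenix as px
#   session = px.launch_app(use_temp_dir=False)

print(os.environ["PHOENIX_WORKING_DIR"])
```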

Can I use gRPC for trace collection?

Phoenix natively supports gRPC for trace collection as of the 4.0 release. See Configuration for details.

What is the difference between GRPC and HTTP?

gRPC and HTTP are communication protocols used to transfer data between client and server applications.

  • HTTP (Hypertext Transfer Protocol) is a stateless protocol primarily used for website and web application requests over the internet.

  • gRPC (gRPC Remote Procedure Calls) is a modern, open-source communication protocol from Google that uses HTTP/2 for transport, protocol buffers as the interface description language, and provides features like bi-directional streaming, multiplexing, and flow control.

gRPC is more efficient in a tracing context than HTTP, but HTTP is more widely supported.

Phoenix can receive traces over either HTTP or gRPC.

How can I configure the backend to send the data to the phoenix UI in another container?

Suppose you are working on an API whose endpoints perform RAG, but would like the Phoenix server not to be launched as another thread inside your application process.

You can do this by setting the environment variable PHOENIX_COLLECTOR_ENDPOINT to point to the Phoenix server running in a different process or container.
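For example, a minimal docker-compose sketch (the service names, image tag, and build context are illustrative assumptions; the backend reaches Phoenix via the compose service name rather than localhost):

```yaml
services:
  phoenix:
    image: arizephoenix/phoenix:latest   # Phoenix server + UI
    ports:
      - "6006:6006"   # UI and HTTP (OTLP) collector
      - "4317:4317"   # gRPC (OTLP) collector
  backend:
    build: .          # your RAG API
    environment:
      # Reach Phoenix via its compose service name, not localhost
      - PHOENIX_COLLECTOR_ENDPOINT=http://phoenix:6006
    depends_on:
      - phoenix
```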

Langfuse alternative? Arize Phoenix vs Langfuse: key differences

What is the difference between Arize Phoenix and Langfuse?

Langfuse has an initially similar feature set to Arize Phoenix. Both tools support tracing, evaluation, experimentation, and prompt management, both in development and production. But on closer inspection there are a few notable differences:

  1. While it is open-source, Langfuse locks certain key features like Prompt Playground and LLM-as-a-Judge evals behind a paywall. These same features are free in Phoenix.

  2. Phoenix is significantly easier to self-host than Langfuse. Langfuse requires you to separately setup and link Clickhouse, Redis, and S3. Phoenix can be hosted out-of-the-box as a single docker container.

  3. Langfuse relies on outside instrumentation libraries to generate traces. Arize maintains its own layer that operates in concert with OpenTelemetry for instrumentation.

  4. Phoenix is backed by Arize AI. Phoenix users always have the option to graduate into Arize AX, with additional features, a customer success org, infosec team, and dedicated support. Meanwhile, Phoenix is able to focus entirely on providing the best fully open-source solution in the ecosystem.


Feature Access

Langfuse is open-source, but several critical features are gated behind its paid offering when self-hosting. For example:

  • Prompt Playground

  • LLM-as-a-Judge evaluations

  • Prompt experiments

  • Annotation queues

These features can be crucial for building and refining LLM systems, especially in early prototyping stages. In contrast, Arize Phoenix offers these capabilities fully open-source.


Ease of Self-Hosting

Self-hosting Langfuse requires setting up and maintaining:

  • A ClickHouse database for analytics

  • Redis for caching and background jobs

  • S3-compatible storage for logs and artifacts

Arize Phoenix, on the other hand, can be launched with a single Docker container. No need to stitch together external services—Phoenix is designed to be drop-in simple for both experimentation and production monitoring. This “batteries-included” philosophy makes it faster to adopt and easier to maintain.


Instrumentation Approach

Langfuse does not provide its own instrumentation layer—instead, it relies on developers to integrate third-party libraries to generate and send trace data.

Phoenix takes a different approach: it includes and maintains its own OpenTelemetry-compatible instrumentation layer, OpenInference.

In fact, Langfuse supports OpenInference tracing as one of its options. This means that using Langfuse requires at least one additional dependency on an instrumentation provider.


Backed by Arize AI

Phoenix is backed by Arize AI, the leading and best-funded AI Observability provider in the ecosystem.

Arize Phoenix is intended to be a complete LLM observability solution, however for users who do not want to self-host, or who need additional features like Custom Dashboards, Copilot, Dedicated Support, or HIPAA compliance, there is a seamless upgrade path to Arize AX.

The success of Arize means that Phoenix does not need to be heavily commercialized. It can focus entirely on providing the best open-source solution for LLM Observability & Evaluation.


Feature Comparison

| Feature | Arize Phoenix | Arize AX | Langfuse |
| --- | --- | --- | --- |
| Open Source | ✅ | | ✅ |
| Tracing | ✅ | ✅ | ✅ |
| Auto-Instrumentation | ✅ | ✅ | |
| Offline Evals | ✅ | ✅ | ✅ |
| Online Evals | | ✅ | ✅ |
| Experimentation | ✅ | ✅ | ✅ |
| Prompt Management | ✅ | ✅ | ✅ |
| Prompt Playground | ✅ | ✅ | ✅ |
| Run Prompts on Datasets | ✅ | ✅ | |
| Built-in Evaluators | ✅ | ✅ | ✅ |
| Agent Evaluations | ✅ | ✅ | |
| Human Annotations | ✅ | ✅ | |
| Custom Dashboards | | ✅ | |
| Workspaces | | ✅ | |
| Semantic Querying | | ✅ | |
| Copilot Assistant | | ✅ | |


Final Thoughts

If you're choosing between Langfuse and Arize Phoenix, the right tool will depend on your needs. Langfuse has a polished UI and solid community momentum, but imposes friction around hosting and feature access. Arize Phoenix offers a more open, developer-friendly experience—especially for those who want a single-container solution with built-in instrumentation and evaluation tools.

Open Source LangSmith Alternative: Arize Phoenix vs. LangSmith

A feature comparison guide for AI engineers looking for developer-friendly LangSmith alternatives.

What is the difference between Arize Phoenix and LangSmith?

LangSmith is another LLM Observability and Evaluation platform that serves as an alternative to Arize Phoenix. Both platforms support the baseline tracing, evaluation, prompt management, and experimentation features, but there are a few key differences to be aware of:

  1. LangSmith is closed source, while Phoenix is open source

  2. LangSmith is part of the broader LangChain ecosystem, though it does support applications that don’t use LangChain. Phoenix is fully framework-agnostic.

  3. Self-hosting is a paid feature within LangSmith, vs free for Phoenix.

  4. Phoenix is backed by Arize AI. Phoenix users always have the option to graduate into Arize AX, with additional features, a customer success org, infosec team, and dedicated support. Meanwhile, Phoenix is able to focus entirely on providing the best fully open-source solution in the ecosystem.


Open vs. Closed Source

The first and most fundamental difference: LangSmith is closed source, while Phoenix is fully open source.

This means Phoenix users have complete control over how the platform is used, modified, and integrated. Whether you're running in a corporate environment with custom compliance requirements or you're building novel agent workflows, open-source tooling allows for a degree of flexibility and transparency that closed platforms simply can’t match.

LangSmith users, on the other hand, are dependent on a vendor roadmap and pricing model, with limited ability to inspect or modify the underlying system.


Ecosystem Lock-In vs. Ecosystem-Agnostic

LangSmith is tightly integrated with the LangChain ecosystem, and while it technically supports non-LangChain applications, the experience is optimized for LangChain-native workflows.

Phoenix is designed from the ground up to be framework-agnostic. It supports popular orchestration tools like LangChain, LlamaIndex, CrewAI, SmolAgents, and custom agents, thanks to its OpenInference instrumentation layer. This makes Phoenix a better choice for teams exploring multiple agent/orchestration frameworks—or who simply want to avoid vendor lock-in.


Self-Hosting: Free vs. Paid

If self-hosting is a requirement—for reasons ranging from data privacy to performance—Phoenix offers it out-of-the-box, for free. You can launch the entire platform with a single Docker container, no license keys or paywalls required.

LangSmith, by contrast, requires a paid plan to access self-hosting options. This can be a barrier for teams evaluating tools or early in their journey, especially those that want to maintain control over their data from day one.


Backed by Arize AI

Phoenix is backed by Arize AI, the leading and best-funded AI Observability provider in the ecosystem.

Arize Phoenix is intended to be a complete LLM observability solution, however for users who do not want to self-host, or who need additional features like Custom Dashboards, Copilot, Dedicated Support, or HIPAA compliance, there is a seamless upgrade path to Arize AX.

The success of Arize means that Phoenix does not need to be heavily commercialized. It can focus entirely on providing the best open-source solution for LLM Observability & Evaluation.


Feature Comparison

| Feature | Arize Phoenix | Arize AX | LangSmith |
| --- | --- | --- | --- |
| Open Source | ✅ | | |
| Tracing | ✅ | ✅ | ✅ |
| Auto-Instrumentation | ✅ | ✅ | |
| Offline Evals | ✅ | ✅ | ✅ |
| Online Evals | | ✅ | ✅ |
| Experimentation | ✅ | ✅ | ✅ |
| Prompt Management | ✅ | ✅ | ✅ |
| Prompt Playground | ✅ | ✅ | ✅ |
| Run Prompts on Datasets | ✅ | ✅ | ✅ |
| Built-in Evaluators | ✅ | ✅ | ✅ |
| Agent Evaluations | ✅ | ✅ | ✅ |
| Human Annotations | ✅ | ✅ | ✅ |
| Custom Dashboards | | ✅ | |
| Workspaces | | ✅ | |
| Semantic Querying | | ✅ | |
| Copilot Assistant | | ✅ | |


Final Thoughts

LangSmith is a strong option for teams all-in on the LangChain ecosystem and comfortable with a closed-source platform. But for those who value openness, framework flexibility, and low-friction adoption, Arize Phoenix stands out as the more accessible and extensible observability solution.

Can I use Phoenix locally from a remote Jupyter instance?

Yes, you can use either of the two methods below.

1. Via ngrok (Preferred)

  • Install pyngrok on the remote machine using the command pip install pyngrok.

  • Create a free account on ngrok and verify your email. Find 'Your Authtoken' on the dashboard.

  • In the Jupyter notebook, after launching Phoenix, set its port number as the port parameter in the code below. Preferably use a fixed default port for Phoenix so that you won't have to set up an ngrok tunnel for every new port; simply restarting Phoenix will reuse the same ngrok URL.

    import getpass
    from pyngrok import ngrok, conf

    print("Enter your authtoken, which can be copied from https://dashboard.ngrok.com/auth")
    conf.get_default().auth_token = getpass.getpass()
    port = 37689
    # Open an ngrok tunnel to the HTTP server
    public_url = ngrok.connect(port).public_url
    print(" * ngrok tunnel \"{}\" -> \"http://127.0.0.1:{}\"".format(public_url, port))
  • "Visit Site" using the newly printed public_url and ignore warnings, if any.

NOTE:

An ngrok free account does not allow more than 3 tunnels over a single ngrok agent session. If you hit this limit, check the active tunnels using ngrok.get_tunnels() and close the tunnel you no longer need using ngrok.disconnect(public_url).

2. Via SSH

This assumes you have already set up ssh on both the local machine and the remote server.

If you are accessing a remote jupyter notebook from a local machine, you can also access the phoenix app by forwarding a local port to the remote server via ssh. In this particular case of using phoenix on a remote server, it is recommended that you use a default port for launching phoenix, say DEFAULT_PHOENIX_PORT.

  • Launch the phoenix app from jupyter notebook.

  • In a new terminal or command prompt, forward a local port of your choice in the range 49152-65535 (say 52362) using the command below. The remote user on the remote host must have sufficient port-forwarding/admin privileges.

    ssh -L 52362:localhost:<DEFAULT_PHOENIX_PORT> <REMOTE_USER>@<REMOTE_HOST>
  • If successful, visit localhost:52362 to access phoenix locally.

If you are abruptly unable to access phoenix, check whether the ssh connection is still alive by inspecting the terminal. You can also try increasing the ssh timeout settings.

Closing ssh tunnel:

Simply run exit in the terminal/command prompt where you ran the port forwarding command.

Braintrust Open Source Alternative? LLM Evaluation Platform Comparison

Dive into the difference between Braintrust and Phoenix open source LLM evaluation and tracing

Braintrust is an evaluation platform that serves as an alternative to Arize Phoenix. Both platforms support core AI application needs such as evaluating AI applications, prompt management, tracing executions, and experimentation. However, there are a few major differences.

Why is Arize Phoenix a popular open source alternative to Braintrust?

Braintrust is a proprietary LLM-observability platform that often hits roadblocks when AI engineers need open code, friction-free self-hosting, or capabilities like agent tracing or online evaluation. Arize Phoenix is a fully open-source alternative that fills those gaps while remaining free to run anywhere.

Top Differences (TL;DR)

| | Arize Phoenix | Braintrust |
| --- | --- | --- |
| Open source | OSS | Closed source |
| Self-hosting | Single Docker | Enterprise-only hybrid |
| LLM Evaluation Library | OSS Pipeline Library and UI | UI Centric Workflows |

Arize Phoenix versus Arize AX versus Braintrust: Feature Comparison

| Feature | Arize Phoenix | Arize AX | Braintrust |
| --- | --- | --- | --- |
| Open source | ✅ | – | ❌ |
| 1-command self-host | ✅ | ✅ | ❌ |
| Free | ✅ | Free Tier | Free Tier |
| Tracing & graphs | ✅ | ✅ | ✅ |
| Multi-agent graphs | ✅ | ✅ | ❌ |
| Session support | ✅ | ✅ | ✅ |
| Token / cost tracking | ✅ | ✅ | ❌ |
| Auto-instrumentation | ✅ | ✅ | ❌ |
| Multi-modal support | ✅ | ✅ | ✅ |
| Custom metrics builder | ✅ | ✅ | ❌ |
| Custom dashboards | 🔸 built-in | ✅ advanced | ❌ |
| Monitoring & alerting | ❌ | ✅ full | ❌ |
| Offline evals | ✅ | ✅ | ✅ (debuggable) |
| Online evals | ❌ | ✅ | ⚠️ limited |
| Online Playground Evals | Coming Soon | ✅ | ✅ |
| Annotation queues | ❌ | ✅ | ❌ |
| AI-powered search & analytics | ❌ | ✅ | ❌ |
| AI Copilot | ❌ | ✅ | ❌ |
| Enterprise SSO & RBAC | ✅ | ✅ | ⚠️ SOC-2 only |
| HIPAA / on-prem | – | ✅ | ❌ |

Key Differences

Complete Ownership vs. Vendor Lock-In

Phoenix:

  • 100% open source

  • Free self-hosting forever - no feature gates, no restrictions

  • Deploy with a single Docker container - truly "batteries included"

  • Your data stays on your infrastructure from day one

Braintrust:

  • Proprietary closed-source platform

  • Self-hosting locked behind paid Enterprise tier (custom pricing)

  • Free tier severely limited: 14-day retention, 5 users max, 1GB storage

  • $249/month minimum for meaningful usage ($1.50 per 1,000 scores beyond limit)

Developer-First Experience

Phoenix:

  • Framework agnostic - works with LangChain, LlamaIndex, DSPy, custom agents, anything

  • Built on OpenTelemetry/OpenInference standard - no proprietary lock-in

  • Auto-instrumentation that just works across ecosystems

  • Deploy anywhere: Docker, Kubernetes, AWS, your laptop - your choice

Braintrust:

  • Platform-dependent approach

  • Requires learning their specific APIs and workflows

  • Limited deployment flexibility on free/Pro tiers

  • Forces you into their ecosystem and pricing model

Evaluation & Observability

Phoenix:

  • Unlimited evaluations - run as many as you need

  • Pre-built evaluators: hallucination detection, toxicity, relevance, Q&A correctness

  • Custom evaluators with code or natural language

  • Human annotation capabilities built-in

  • Real-time tracing with full visibility into LLM applications

Braintrust:

  • 10,000 scores on free tier ($1.50 per 1,000 additional)

  • 50,000 scores on Pro ($249/month) - can get expensive fast

  • Good evaluation features, but pay-per-use model creates cost anxiety

  • Enterprise features locked behind custom pricing

Self-Hosting — Ease & Cost

Phoenix deploys with one Docker command and is free and unlimited to run on-prem or in the cloud. Braintrust's self-hosting is reserved for paid enterprise plans and uses a hybrid model: the control plane (UI, metadata DB) stays in Braintrust's cloud while you run the API and storage services (Brainstore) yourself, plus extra infrastructure wiring. Note that you still pay seat, eval, and retention fees, with the free tier capped at 1M spans, 10K scores, and 14 days of retention.

Instrumentation & Agent Tracing

Phoenix ships OpenInference, an OTel-compatible auto-instrumentation layer that captures every prompt, tool call, and agent step with sub-second latency. Braintrust supports 5 instrumentation options, versus 50+ for Arize AX and Phoenix.

Arize AX and Phoenix are the leaders in agent tracing solutions. Braintrust does not trace agents today: it accepts OTel spans but has no auto-instrumentors or semantic conventions, so most teams embed an SDK or proxy into their code, adding dev effort and potential latency.

Evaluation (Offline & Online)

Phoenix offers built-in and custom evaluators, “golden” datasets, and high-scale evaluation scoring (millions/day) with sampling, logs and failure debugging. Braintrust’s UI is great for prompt trials but lacks benchmarking on labeled data and has weaker online-eval debugging.

The Phoenix Evaluation library is tested against public datasets and is community supported. It is an open-source, tried-and-tested library with millions of downloads, and has run in production for over two years at tens of thousands of top enterprise organizations.

Human-in-the-Loop

Phoenix and Arize AX include annotation queues that let reviewers label any trace or dataset and auto-recompute metrics. Braintrust lacks queues; its "Review" mode is manual and disconnected from evals.

Agent Evaluation

Phoenix and AX have released extensive agent evaluation capabilities, including path evaluations, convergence evaluations, and session-level evaluations. The investment in research, material, and technology spans over a year of work from the Arize team. Arize is the leading company thinking and working on agent evaluation.

Open Source vs. Proprietary

One of the most fundamental differences is Phoenix’s open-source nature versus Braintrust’s proprietary approach. Phoenix is fully open source, meaning teams can inspect the code, customize the platform, and self-host it on their own infrastructure without licensing fees. This openness provides transparency and control that many organizations value. In contrast, Braintrust is a closed-source platform, which limits users’ ability to customize or extend it.

Moreover, Phoenix is built on open standards like OpenTelemetry and OpenInference for trace instrumentation. From day one, Phoenix and Arize AX have embraced open standards, ensuring compatibility with a wide range of tools and preventing vendor lock-in. Braintrust relies on its own SDK/proxy approach for logging and does not offer the same degree of open extensibility. Its proprietary design means that while it can be integrated into apps, it ties you into Braintrust's way of operating (and can introduce an LLM proxy layer for logging that some teams see as a potential point of latency or risk).

Teams that prioritize transparency, community-driven development, and long-term flexibility often prefer an open solution like Phoenix.

How to Choose

  • Prototype & iterate fast? → Phoenix (open, free, unlimited instrumentation & evals).

  • Scale, governance, compliance? → Arize AX (also free to start, petabyte storage, 99.9% SLA, HIPAA, RBAC, AI-powered analytics).
