Arize Phoenix:
Open-source LLM tracing & evaluation

Evaluate, experiment, and optimize AI applications in real time.

Get started with Phoenix Self host

2.5M+

Downloads

6.4k+

Github Stars

1.5M+

OTEL Instrumentation Monthly Downloads

6k+

Community

AI Developers Use Phoenix

Flexible, Transparent, & Free from Lock-In

Phoenix uses OpenTelemetry (OTEL) for seamless setup, full transparency, and no vendor lock-in—start, scale, or move on without restriction.

Application Tracing for Total Visibility

Easily collect LLM app data with automatic instrumentation — or go manual for  utmost control.

Read about App Tracing

Interactive Prompt Playground

A fast, flexible sandbox for prompt and model iteration. Compare prompts, visualize outputs, and debug failures without leaving your workflow.

Read about Prompt Playgrounds

Streamlined Evaluations and Annotations

Leverage an eval library with unparalleled speed and ergonomics. Start with pre-built templates, easily customized to any task, or incorporate Human feedback.

Read about Evals & Annotations

Dataset Clustering & Visualization

Uncover semantically similar questions, document chunks, and responses using embeddings to isolate poor performance.

Proudly Open Source

Built on top of OpenTelemetry, Phoenix is agnostic of vendor, framework, and language – granting you the flexibility you need in today’s generative landscape.

Arize-ai/phoenix ...

Phoenix works with all of your LLM tools

Run model tests, leverage pre-built templates, and seamlessly incorporate human feedback. Customize to fit any project while fine-tuning faster than ever.

Don’t just take our word for it

"We are in an exciting time for AI technology including LLMs. We will need better tools to understand and monitor an LLM’s decision making. With Phoenix, Arize is offering an open source way to do exactly just that in a nifty library."

Erick Siavichay

Project Mentor, Inspirit AI

"Just came across Arize-phoenix, a new library for LLMs and RNs that provides visual clustering and model interpretability. Super useful."

Lior Sinclair

AI Researcher

"I always get asked what my favorite Open Source eval tool is. The most impressive one thus far to me is Phoenix."

Hamel Husain

Founder, Parlance Labs

"As LLM-powered applications increase in sophistication and new use cases emerge, deeper capabilities around LLM observability are needed to help debug and troubleshoot. We’re pleased to see this open-source solution from Arize, along with a one-click integration to LlamaIndex, and recommend any AI engineers or developers building with LlamaIndex check it out."

Jerry Liu

CEO and Co-Founder, LlamaIndex

"This is something that I was wanting to build at some point in the future, so I’m really happy to not have to build it. This is amazing."

Tom Matthews

Machine Learning Engineer at Unitary.ai

"Arize's Phoenix tool uses one LLM to evaluate another for relevance, toxicity, and quality of responses. The tool uses 'Traces' to record the paths taken by LLM requests (made by an application or end user) as they propagate through multiple steps. An accompanying OpenInference specification uses telemetry data to understand the execution of LLMs and the surrounding application context. In short, it's possible to figure out where an LLM workflow broke or troubleshoot problems related to retrieval and tool execution."

Lucas Mearian

Senior Reporter, ComputerWorld

"Large language models...remain susceptible to hallucination — in other words, producing false or misleading results. Phoenix, announced today at Arize AI’s Observe 2023 summit, targets this exact problem by visualizing complex LLM decision-making and flagging when and where models fail, go wrong, give poor responses or incorrectly generalize."

Shubham Sharma

VentureBeat

"Phoenix is a much-appreciated advancement in model observability and production. The integration of observability utilities directly into the development process not only saves time but encourages model development and production teams to actively think about model use and ongoing improvements before releasing to production. This is a big win for management of the model lifecycle."

Christopher Brown

CEO and Co-Founder of Decision Patterns and a former UC Berkeley Computer Science lecturer

"Phoenix integrated into our team’s existing data science workflows and enabled the exploration of unstructured text data to identify root causes of unexpected user inputs, problematic LLM responses, and gaps in our knowledge base."

Yuki Waka

Application Developer, Klick

Arize AX

Learn

Insights

Company

Arize AX

Learn

Insights

Company

Arize Phoenix:
Open-source LLM tracing & evaluation

Evaluate, experiment, and optimize AI applications in real time.

2.5M+

6.4k+

1.5M+

6k+

AI Developers Use Phoenix

Flexible, Transparent, & Free from Lock-In

Application Tracing for Total Visibility

Interactive Prompt Playground

Streamlined Evaluations and Annotations

Dataset Clustering & Visualization

Proudly Open Source

Phoenix works with all of your LLM tools

Don’t just take our word for it

Erick Siavichay

Lior Sinclair

Hamel Husain

Jerry Liu

Tom Matthews

Lucas Mearian

Shubham Sharma

Christopher Brown

Yuki Waka

Arize AX

Learn

Insights

Company

Arize Phoenix: Open-source LLM tracing & evaluation

Evaluate, experiment, and optimize AI applications in real time.

2.5M+

6.4k+

1.5M+

6k+

AI Developers Use Phoenix

Flexible, Transparent, & Free from Lock-In

Application Tracing for Total Visibility

Interactive Prompt Playground

Streamlined Evaluations and Annotations

Dataset Clustering & Visualization

Proudly Open Source

Phoenix works with all of your LLM tools

Don’t just take our word for it

Erick Siavichay

Lior Sinclair

Hamel Husain

Jerry Liu

Tom Matthews

Lucas Mearian

Shubham Sharma

Christopher Brown

Yuki Waka

Subscribe to The Evaluator

Arize Phoenix:
Open-source LLM tracing & evaluation