Arize Phoenix:
Open-source LLM tracing & evaluation

Evaluate, experiment, and optimize AI applications in real time.

2.5M+

Downloads

6.4k+

GitHub Stars

1.5M+

OTEL Instrumentation Monthly Downloads

6k+

Community

AI Developers Use Phoenix

Flexible, Transparent, & Free from Lock-In

Phoenix uses OpenTelemetry (OTEL) for seamless setup, full transparency, and no vendor lock-in—start, scale, or move on without restriction.

Application Tracing for Total Visibility

Easily collect LLM app data with automatic instrumentation, or go manual for complete control.

Interactive Prompt Playground

A fast, flexible sandbox for prompt and model iteration. Compare prompts, visualize outputs, and debug failures without leaving your workflow.

Streamlined Evaluations and Annotations

Leverage an eval library with unparalleled speed and ergonomics. Start with pre-built templates, customize them to any task, or incorporate human feedback.

Dataset Clustering & Visualization

Uncover semantically similar questions, document chunks, and responses using embeddings to isolate poor performance.

Proudly Open Source

Built on top of OpenTelemetry, Phoenix is agnostic of vendor, framework, and language – granting you the flexibility you need in today’s generative landscape.

Phoenix works with all of your LLM tools

Run model tests, leverage pre-built templates, and seamlessly incorporate human feedback. Customize to fit any project while fine-tuning faster than ever.

Don’t just take our word for it

"We are in an exciting time for AI technology including LLMs. We will need better tools to understand and monitor an LLM’s decision making. With Phoenix, Arize is offering an open source way to do exactly just that in a nifty library."
Erick Siavichay

Project Mentor, Inspirit AI

"Just came across Arize-phoenix, a new library for LLMs and NNs that provides visual clustering and model interpretability. Super useful."
Lior Sinclair

AI Researcher

"I always get asked what my favorite Open Source eval tool is. The most impressive one thus far to me is Phoenix."
Hamel Husain

Founder, Parlance Labs

"As LLM-powered applications increase in sophistication and new use cases emerge, deeper capabilities around LLM observability are needed to help debug and troubleshoot. We’re pleased to see this open-source solution from Arize, along with a one-click integration to LlamaIndex, and recommend any AI engineers or developers building with LlamaIndex check it out."
Jerry Liu

CEO and Co-Founder, LlamaIndex

"This is something that I was wanting to build at some point in the future, so I’m really happy to not have to build it. This is amazing."
Tom Matthews

Machine Learning Engineer at Unitary.ai

"Arize's Phoenix tool uses one LLM to evaluate another for relevance, toxicity, and quality of responses. The tool uses 'Traces' to record the paths taken by LLM requests (made by an application or end user) as they propagate through multiple steps. An accompanying OpenInference specification uses telemetry data to understand the execution of LLMs and the surrounding application context. In short, it's possible to figure out where an LLM workflow broke or troubleshoot problems related to retrieval and tool execution."
Lucas Mearian

Senior Reporter, ComputerWorld

"Large language models...remain susceptible to hallucination — in other words, producing false or misleading results. Phoenix, announced today at Arize AI’s Observe 2023 summit, targets this exact problem by visualizing complex LLM decision-making and flagging when and where models fail, go wrong, give poor responses or incorrectly generalize."
Shubham Sharma

VentureBeat

"Phoenix is a much-appreciated advancement in model observability and production. The integration of observability utilities directly into the development process not only saves time but encourages model development and production teams to actively think about model use and ongoing improvements before releasing to production. This is a big win for management of the model lifecycle."
Christopher Brown

CEO and Co-Founder, Decision Patterns; former UC Berkeley Computer Science lecturer

"Phoenix integrated into our team’s existing data science workflows and enabled the exploration of unstructured text data to identify root causes of unexpected user inputs, problematic LLM responses, and gaps in our knowledge base."
Yuki Waka

Application Developer, Klick