Milestone · June 2026

Phoenix at 10K

Phoenix is open-source observability and evaluation built by AI engineers for AI engineers. Runs anywhere. Keeps your data yours. Ship on evidence, not vibes.

10K+

GitHub stars

2.4M+

Monthly Phoenix installs

6M+

Monthly OpenInference downloads

32M+

All-time downloads

Read the story of Phoenix

Learn more about Phoenix

The rise of Phoenix

Mikyo, Xander, and Roger spent three years building the foundations of AI observability. In this interview with RL Nabors, they recount pivoting on GPT-3, betting on OpenTelemetry, and keeping all of it open.

Read the interview on the blog

The state of AI observability

Five frames from the field, as of June 2026. Each one is a single insight. Read it like an instrument panel, not an infographic.

5.6M

The OpenInference semantic conventions and instrumentation packages get pulled 5.6M times a month: the spec definition itself, before any framework integration. The standard travels further than the product that ships it.

OpenInference base downloads · monthly

Framework agnostic means more frameworks.

Phoenix ships a dedicated integration for 26+ Python frameworks and 11+ TypeScript ones. The LangChain integration leads at 1.91M monthly installs, but the agno, google-adk, anthropic, llama-index, haystack, pydantic-ai, instructor, crewai and dspy integrations are active and rising.

// Switzerland by construction: no vendor on this chart depends on Arize.

Phoenix integration installs · monthly, by framework

All languages welcome.

AI engineering isn't one language anymore. Phoenix's codebase is 47.1% Python and 37.7% TypeScript. Both have their own SDK, evals package, CLI, and MCP integration. Phoenix is built for how you work.

Phoenix codebase composition

Python 47.1% · TypeScript 37.7% · Other 15.2%

SDK · evals · CLI · MCP: mirrored across both ecosystems.

Harnesses and runtimes, not wrappers.

Filtered to integrations with ≥5K monthly downloads, the percentages show MCP, Bedrock Agent Runtime, and Pydantic-AI adoption doubling, even tripling.

// 2026 is the year of harnesses and runtimes.

Month-over-month growth · top risers
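For readers who want to reproduce this kind of chart on their own download data, here is a minimal sketch of the arithmetic: drop packages below the 5K threshold, then rank the rest by month-over-month percentage growth. The package names and counts are hypothetical placeholders, not Phoenix's figures, and applying the threshold to the current month is an assumption about how the filter works.

```python
from typing import Dict, List, Tuple

MIN_MONTHLY_DOWNLOADS = 5_000  # threshold named in the text


def top_risers(
    previous: Dict[str, int],
    current: Dict[str, int],
    min_downloads: int = MIN_MONTHLY_DOWNLOADS,
) -> List[Tuple[str, float]]:
    """Rank packages by month-over-month growth, in percent."""
    risers = []
    for name, now in current.items():
        before = previous.get(name, 0)
        # Assumption: the filter applies to the current month's count.
        # Packages with no prior month are skipped to avoid dividing by zero.
        if now < min_downloads or before <= 0:
            continue
        growth_pct = (now - before) / before * 100.0
        risers.append((name, growth_pct))
    return sorted(risers, key=lambda item: item[1], reverse=True)


# Hypothetical counts, for illustration only.
previous = {"pkg-a": 4_000, "pkg-b": 10_000, "pkg-c": 300}
current = {"pkg-a": 12_000, "pkg-b": 20_000, "pkg-c": 900}

risers = top_risers(previous, current)
for name, pct in risers:
    print(f"{name}: +{pct:.0f}%")
```

Here pkg-c triples but stays off the list because 900 downloads is below the threshold, which is the point of the filter: tiny packages produce dramatic percentages that say little about adoption.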

Ready steady growth.

Phoenix's star history from its launch in 2023 to today shows steady growth with acceleration at each agentic coding breakthrough. This is the shape of a project that's compounding rather than spiking.

GitHub stars · 2023 → 2026

Taking flight

How a skunkworks project changed the DevOps community.

Late 2022

The decision to go open

Arize had been a closed-source company since 2020. A small team was tasked with changing that.

Early 2023

A Jupyter notebook extension

ML engineers lived in notebooks, so Phoenix met them there. The first version was a notebook extension for visualizing the structure of embeddings, classifications, and ranking models.

March 2023

The GPT-3 pivot

Two weeks before launch at Observe, GPT-3 changed everything. Phoenix switched from visualizing classifier embeddings to visualizing the embeddings of every question going into an LLM. Roger got UMAP and HDBSCAN running, and the first Phoenix point cloud showed the inner workings of a chatbot.

June 2023

The Woodstock of AI

Xander demoed the embedding visualizer in front of 5,000 engineers at an AI gathering organized by Hugging Face on a dock in San Francisco. The demo connected the team to LlamaIndex, who needed observability and didn't have time to build it.

Fall 2023

The first tracer

Roger built Phoenix's tracing solution as a design partner for LlamaIndex. It wasn't OpenTelemetry yet. It looked like OTel, but the team was unsure AI data was a fit.

January 2024

The reluctant container

Users were running Phoenix as long-lived Python processes with millions of in-memory traces, then hacking their own persistence to MongoDB or Elasticsearch. The team bought a Docker Hub name and shipped Phoenix in a container.

Early 2024

The OpenTelemetry bet

A GitHub issue from a LangChain-for-Go maintainer. A Ruby developer at a hackathon who couldn't use Phoenix without rewriting his stack. The lesson landed: to help the most engineers, use the plumbing they already have. Roger's PR to switch to OTel got revived, and OpenInference emerged as a spec anyone could implement against any backend.

2024

Building backwards

Phoenix was built database-first: SQLite for local, then Postgres for distributed. Users who needed Keycloak, Cognito, and basic auth contributed OIDC. The platform took shape from the outside in, driven by what the community needed.

2024–2025

The framework wave

LangChain. LlamaIndex. Bedrock. Anthropic. Google ADK. CrewAI. Pydantic-AI. Instructor. DSPy. Vercel AI SDK. Google GenAI. Mastra. Claude Agent SDK. The list grew to 26+ Python integrations and 11+ TypeScript. IBM, Hugging Face, and Agno trusted OpenInference enough to contribute to it.

June 2026 · 10,000 stars

We are here.

Ten thousand stars. Six engineers. A standard the whole field can build on, and a lot of road still ahead.

What six engineers built since 2023

Local-first—yeah, we meant to do that...

Runs anywhere, on anything. Your laptop. A Docker container. Air-gapped in a corporate environment with no internet. 400 engineers can each run the same observability stack at no extra cost. True story.

Completely open, NSA.

Every feature is in the open-source version. No "open core." No enterprise tier holding back the good stuff. The version on your laptop is the version that runs at scale.

Three eval scopes, not one.

Phoenix evaluates at the span, the trace, and the session. Evaluate whether a retrieval surfaced the right document, whether an agent achieved its goal, and whether the user's session actually solved their problem.
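To make the three scopes concrete, here is a minimal sketch in plain Python. It does not use the Phoenix evals API: the Span record, the evaluate_scopes helper, and the toy evaluators are all hypothetical, chosen only to show that each scope asks a different question over a different grouping of the same spans.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Span:
    """Hypothetical span record; not the Phoenix data model."""
    span_id: str
    trace_id: str
    session_id: str
    kind: str
    output: str


def group_by(spans: List[Span], key: Callable[[Span], str]) -> Dict[str, List[Span]]:
    groups: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        groups[key(span)].append(span)
    return dict(groups)


def evaluate_scopes(spans, span_eval, trace_eval, session_eval):
    """Run one evaluator per scope: each span, each trace, each session."""
    span_results = {s.span_id: span_eval(s) for s in spans}
    trace_results = {
        tid: trace_eval(group)
        for tid, group in group_by(spans, lambda s: s.trace_id).items()
    }
    session_results = {
        sid: session_eval(group)
        for sid, group in group_by(spans, lambda s: s.session_id).items()
    }
    return span_results, trace_results, session_results


spans = [
    Span("s1", "t1", "sess1", "retrieve", "doc-42"),
    Span("s2", "t1", "sess1", "answer", "Paris"),
    Span("s3", "t2", "sess1", "answer", ""),
]

# Toy evaluators standing in for real ones (for example, LLM judges).
span_results, trace_results, session_results = evaluate_scopes(
    spans,
    span_eval=lambda s: bool(s.output),             # did this step produce anything?
    trace_eval=lambda group: bool(group[-1].output),  # did the run end with an answer?
    session_eval=lambda group: any(                 # did the user ever get an answer?
        s.kind == "answer" and s.output for s in group
    ),
)
```

In this toy data the second trace fails while the session as a whole still passes, which is the kind of distinction a single-scope evaluation would miss.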

OpenInference: the standard you trust.

IBM, Hugging Face, Agno, and the gateway players all build on OpenInference. The spec is bigger than the product that ships it, and it stays portable across any backend you choose.

The Switzerland of evals.

Phoenix works with everything you work with, across 26+ Python frameworks and 11+ TypeScript ones: LangChain, LlamaIndex, Bedrock, ADK, Anthropic, CrewAI, Agno, Pydantic-AI, Claude Agent SDK, Vercel, Google GenAI, Mastra... and counting.

The future of Phoenix

Built for what happens next.

The next stretch of AI engineering looks different from the last one. Now that "code is free", the bottleneck has moved to review and QA.

Agents with their own observability.

Every coding agent should have its own sandbox to gut-check changes against: its own traces, evals, and SQLite. Arize already runs this internally with git worktrees and parallel Phoenix instances.
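One way to picture that isolation is a small allocator that gives every agent its own working directory, SQLite file, and port. This is a sketch under stated assumptions, not how Arize does it internally: AgentSandbox, make_sandboxes, the traces.sqlite filename, and the starting port are all hypothetical, and creating the git worktrees and launching the Phoenix instances themselves is left out.

```python
import sqlite3
import tempfile
from contextlib import closing
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class AgentSandbox:
    """Hypothetical description of one agent's isolated resources."""
    agent: str
    workdir: Path
    db_path: Path
    port: int


def make_sandboxes(root: Path, agents: List[str], base_port: int = 6006) -> List[AgentSandbox]:
    """Allocate a separate directory, SQLite file, and port per agent."""
    sandboxes = []
    for offset, agent in enumerate(agents):
        workdir = root / agent
        workdir.mkdir(parents=True, exist_ok=True)
        db_path = workdir / "traces.sqlite"  # hypothetical filename
        # Opening a connection and committing a table materializes the file.
        with closing(sqlite3.connect(db_path)) as conn:
            conn.execute("CREATE TABLE IF NOT EXISTS sandbox_info (agent TEXT)")
            conn.commit()
        sandboxes.append(AgentSandbox(agent, workdir, db_path, base_port + offset))
    return sandboxes


root = Path(tempfile.mkdtemp())
boxes = make_sandboxes(root, ["agent-a", "agent-b"])
for box in boxes:
    print(box.agent, box.db_path, box.port)
```

Because no two agents share a database file or a port, traces and eval results from one agent's experiment cannot leak into another's.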

Human and agent collaboration.

Sometimes humans ask agents to make changes and approve what comes back. Sometimes expert agents surface insights and humans decide. Phoenix is the surface where those exchanges happen, with permissions, audit trails, and the right hooks built in.

Evidence-based development.

Vibes scaled fine when one engineer iterated with one model. They don't scale when agents ship changes faster than humans can read them. The teams who keep shipping responsibly will treat evals as an accelerant, not a tax.

From the team:

Ship fast. Ship responsibly.

New to observability?

If you're figuring out what evals are, why anyone bothers with traces, or what "observability" means for a non-deterministic system, start here.

Start learning

Already shipping?

Your participation shapes Phoenix. Keep up with what's shipping, what we're learning, and where the project is going next.

Tell us what you think

Evaluating Phoenix?

For engineering leaders, platform teams, and anyone deciding whether to bet on Phoenix as part of their AI stack.

Check out the production guide