Phoenix at0
Phoenix is open-source observability and evaluation built by AI engineers for AI engineers. Runs anywhere. Keeps your data yours. Ship on evidence, not vibes.
The state of AI observability
Five frames from the field, as of June 2026. Each one is a single insight. Read it like an instrument panel, not an infographic.
The OpenInference semantic conventions and instrumentation packages get pulled 5.6M times a month, counting only the spec itself, before any framework integration. The standard travels further than the product that ships it.
Framework agnostic means more frameworks.
Phoenix ships a dedicated integration for 26+ Python frameworks and 11+ TypeScript ones. The LangChain integration leads at 1.91M monthly installs, but the agno, google-adk, anthropic, llama-index, haystack, pydantic-ai, instructor, crewai and dspy integrations are active and rising.
All languages welcome.
AI engineering isn't one language anymore. Phoenix's codebase is 47.1% Python and 37.7% TypeScript. Both have their own SDK, evals package, CLI, and MCP integration. Phoenix is built for how you work.
Harnesses and runtimes, not wrappers.
Filtered to integrations with ≥5K monthly downloads so the percentages are meaningful, Phoenix's fastest-growing integrations show MCP, Bedrock Agent Runtime, and Pydantic-AI adoption doubling, even tripling.
Ready steady growth.
Phoenix's star history from its launch in 2023 to today shows steady growth with acceleration at each agentic coding breakthrough. This is the shape of a project that's compounding rather than spiking.
How a skunkworks project changed the DevOps community.
The decision to go open
Arize had been a closed-source company since 2020. A small team was tasked with changing that.
A Jupyter notebook extension
ML engineers lived in notebooks, so Phoenix met them there. The first version was a notebook extension for visualizing the structure of embeddings, classifications, and ranking models.
The GPT-3 pivot
Two weeks before launch at Observe, GPT-3 changed everything. Phoenix switched from visualizing classifier embeddings to visualizing the embeddings of every question going into an LLM. Roger got UMAP and HDBSCAN running, and the first Phoenix point cloud showed the inner workings of a chatbot.
The Woodstock of AI
Xander demoed the embedding visualizer in front of 5,000 engineers at an AI gathering organized by Hugging Face on a dock in San Francisco. The demo connected the team to LlamaIndex, who needed observability and didn't have time to build it.
The first tracer
Roger built Phoenix's tracing solution as a design partner for LlamaIndex. It wasn't OpenTelemetry yet. It looked like OTel, but the team was unsure whether AI data was a good fit for it.
The reluctant container
Users were running Phoenix as long-lived Python processes with millions of in-memory traces, then hacking their own persistence to MongoDB or Elasticsearch. The team bought a Docker Hub name and shipped Phoenix in a container.
The OpenTelemetry bet
A GitHub issue from a LangChain-for-Go maintainer. A Ruby developer at a hackathon who couldn't use Phoenix without rewriting his stack. The lesson landed: to help the most engineers, use the plumbing they already have. Roger's PR to switch to OTel got revived, and OpenInference emerged as a spec anyone could implement against any backend.
Building backwards
Phoenix was built database-first: SQLite for local, then Postgres for distributed. Users who needed Keycloak, Cognito, and basic auth contributed OIDC support. The platform took shape from the outside in, driven by what the community needed.
The framework wave
LangChain. LlamaIndex. Bedrock. Anthropic. Google ADK. CrewAI. Pydantic-AI. Instructor. DSPy. Vercel AI SDK. Google GenAI. Mastra. Claude Agent SDK. The list grew to 26+ Python integrations and 11+ TypeScript. IBM, Hugging Face, and Agno trusted OpenInference enough to contribute to it.
We are here.
Ten thousand stars. Six engineers. A standard the whole field can build on, and a lot of road still ahead.
What six engineers built since 2023
Local-first—yeah, we meant to do that...
Runs anywhere, on anything. Your laptop. A Docker container. Air-gapped in a corporate environment with no internet. 400 engineers can each run the same observability stack at no extra cost. True story.
Completely open, NSA.
Every feature is in the open-source version. No "open core." No enterprise tier holding back the good stuff. The version on your laptop is the version that runs at scale.
Three eval scopes, not one.
Phoenix evaluates at the span, the trace, and the session. Evaluate whether a retrieval surfaced the right document, whether an agent achieved its goal, and whether the user's session actually solved their problem.
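To make the three scopes concrete, here is a minimal, stdlib-only Python sketch. It is not Phoenix's evals API: the `Span` record, its `doc_ids` and `goal_met` attributes, and the evaluator functions are hypothetical stand-ins. It only shows how the same spans roll up into traces and sessions (OpenInference carries the session grouping as a `session.id` span attribute), and how each scope asks a different question of them.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical, pared-down span record for illustration only.
    trace_id: str
    session_id: str
    kind: str                      # e.g. "RETRIEVER" or "AGENT"
    start: int                     # start time, any monotonic unit
    attributes: dict = field(default_factory=dict)


def span_eval(span: Span, relevant_doc: str) -> bool:
    # Span scope: did this one retrieval surface the relevant document?
    return relevant_doc in span.attributes.get("doc_ids", [])


def trace_eval(trace: list[Span]) -> bool:
    # Trace scope: did the agent in this request report reaching its goal?
    return any(s.kind == "AGENT" and s.attributes.get("goal_met", False) for s in trace)


def session_eval(traces: list[list[Span]]) -> bool:
    # Session scope: was the user's problem solved by the end of the
    # conversation? Here: the latest trace (by earliest span start) met its goal.
    latest = max(traces, key=lambda t: min(s.start for s in t))
    return trace_eval(latest)


def group(spans: list[Span]) -> dict[str, list[list[Span]]]:
    # Group spans into traces, then traces into sessions.
    by_trace: dict[str, list[Span]] = defaultdict(list)
    for s in spans:
        by_trace[s.trace_id].append(s)
    sessions: dict[str, list[list[Span]]] = defaultdict(list)
    for trace in by_trace.values():
        sessions[trace[0].session_id].append(trace)
    return dict(sessions)


# Toy session: the first turn retrieves the wrong docs, the second succeeds.
spans = [
    Span("t1", "s1", "RETRIEVER", 0, {"doc_ids": ["faq-3", "faq-7"]}),
    Span("t1", "s1", "AGENT", 0, {"goal_met": False}),
    Span("t2", "s1", "RETRIEVER", 10, {"doc_ids": ["refund-policy"]}),
    Span("t2", "s1", "AGENT", 10, {"goal_met": True}),
]
sessions = group(spans)
print(span_eval(spans[0], "refund-policy"))  # False: first retrieval missed
print(trace_eval(sessions["s1"][0]))         # False: first turn failed
print(session_eval(sessions["s1"]))          # True: the session ended solved
```

A real evaluator would usually be an LLM judge or a code check rather than a stored flag, but the shape stays the same: each scope looks at a wider slice of the same spans.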
OpenInference: the standard you trust.
IBM, Hugging Face, Agno, and the gateway players all build on OpenInference. The spec is bigger than the product that ships it, and it stays portable across any backend you choose.
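Portability comes from the fact that an OpenInference span is ordinary OpenTelemetry data: a flat set of key/value attributes with agreed-upon names. The sketch below builds those attributes by hand for a retrieval step, using attribute names as we understand them from the OpenInference semantic conventions (`openinference.span.kind`, `input.value`, `retrieval.documents.N.document.*`). The helper functions and the sample document are hypothetical; in practice an OpenInference instrumentor sets these attributes for you through the OpenTelemetry SDK.

```python
import json


def retriever_span_attributes(query: str, docs: list[tuple[str, str, float]]) -> dict:
    """Flatten a retrieval step into OpenInference-style span attributes.

    `docs` is a list of (document_id, content, score) tuples.
    """
    attrs = {
        "openinference.span.kind": "RETRIEVER",
        "input.value": query,
    }
    # Lists of documents are flattened with an index in the key.
    for i, (doc_id, content, score) in enumerate(docs):
        prefix = f"retrieval.documents.{i}.document."
        attrs[prefix + "id"] = doc_id
        attrs[prefix + "content"] = content
        attrs[prefix + "score"] = score
    return attrs


def is_otel_compatible(attrs: dict) -> bool:
    # OpenTelemetry attribute values must be primitives (or homogeneous
    # arrays of primitives); this sketch only checks the scalar case.
    primitives = (str, bool, int, float)
    return all(isinstance(v, primitives) for v in attrs.values())


attrs = retriever_span_attributes(
    "What is the refund window?",
    [("refund-policy", "Refunds are accepted within 30 days.", 0.92)],
)
print(attrs["retrieval.documents.0.document.id"])  # refund-policy
print(is_otel_compatible(attrs))                   # True
print(json.dumps(attrs, indent=2))
```

Because nothing here is Phoenix-specific, any backend that ingests OpenTelemetry data can store these spans; a backend that understands the conventions can also render them as retrievals rather than generic spans.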
The Switzerland of evals.
Phoenix works with everything you work with, across 26+ Python frameworks and 11+ TypeScript ones: LangChain, LlamaIndex, Bedrock, ADK, Anthropic, CrewAI, Agno, Pydantic-AI, Claude Agent SDK, Vercel, Google GenAI, Mastra... and counting.
Built for what happens next.
The next stretch of AI engineering looks different from the last one. Now that "code is free", the bottleneck has moved to review and QA.
Agents with their own observability.
Every coding agent should have its own sandbox to gut-check changes against: its own traces, evals, and SQLite database. Arize already runs this internally with git worktrees and parallel Phoenix instances.
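A sketch of that kind of setup, under stated assumptions: one git worktree per agent, and one Phoenix server per agent started with `phoenix serve` and kept apart through the `PHOENIX_PORT`, `PHOENIX_GRPC_PORT`, and `PHOENIX_WORKING_DIR` environment variables (Phoenix keeps its SQLite database under the working directory by default), with each agent's tracer pointed at its own server through `PHOENIX_COLLECTOR_ENDPOINT`. This is not Arize's internal tooling: the function, branch names, directory layout, and port scheme are hypothetical, and the code only plans the commands rather than running them.

```python
import shlex
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Sandbox:
    agent: str
    git_cmd: list[str]          # creates the agent's worktree on its own branch
    phoenix_cmd: list[str]      # starts the agent's private Phoenix server
    server_env: dict[str, str]  # isolates ports and the SQLite working dir
    agent_env: dict[str, str]   # tells the agent's tracer where to send spans


def plan_sandboxes(agents: list[str], root: Path,
                   base_port: int = 6006, base_grpc_port: int = 4317) -> list[Sandbox]:
    """Plan one worktree and one isolated Phoenix instance per agent.

    Distinct ports let the servers run side by side; distinct working
    directories give each server its own SQLite database, kept outside the
    worktree so it never ends up in a commit.
    """
    plans = []
    for i, agent in enumerate(agents):
        port = base_port + i
        worktree = root / f"wt-{agent}"
        plans.append(Sandbox(
            agent=agent,
            git_cmd=["git", "worktree", "add", "-b", f"agent/{agent}", str(worktree)],
            phoenix_cmd=["phoenix", "serve"],
            server_env={
                "PHOENIX_PORT": str(port),
                "PHOENIX_GRPC_PORT": str(base_grpc_port + i),
                "PHOENIX_WORKING_DIR": str(root / f"phoenix-{agent}"),
            },
            agent_env={"PHOENIX_COLLECTOR_ENDPOINT": f"http://localhost:{port}"},
        ))
    return plans


if __name__ == "__main__":
    # Print the commands for two agents instead of executing them.
    for sb in plan_sandboxes(["alpha", "beta"], Path("/tmp/agents")):
        print(shlex.join(sb.git_cmd))
        env = " ".join(f"{k}={v}" for k, v in sb.server_env.items())
        print(env, shlex.join(sb.phoenix_cmd))
```

Executing a plan would mean running `git_cmd` from the main repository checkout, starting `phoenix_cmd` with `server_env` merged into the environment (for example `subprocess.Popen(sb.phoenix_cmd, env={**os.environ, **sb.server_env})`), and launching the agent inside its worktree with `agent_env` set.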
Human and agent collaboration.
Sometimes humans ask agents to make changes and approve what comes back. Sometimes expert agents surface insights and humans decide. Phoenix is the surface where those exchanges happen, with permissions, audit trails, and the right hooks built in.
Evidence-based development.
Vibes scaled fine when one engineer iterated with one model. They don't scale when agents ship changes faster than humans can read them. The teams who keep shipping responsibly will treat evals as an accelerant, not a tax.
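One way to read "evals as an accelerant" is an automated gate: rerun the same eval set on every agent-authored change and let it through only if quality holds. A minimal sketch, assuming each eval produces a pass/fail label per example; the `EvalResult` type, the `gate` function, and the 2-point tolerance are hypothetical choices, not a Phoenix feature.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    example_id: str
    passed: bool  # e.g. an LLM judge or code check labeled the output correct


def pass_rate(results: list[EvalResult]) -> float:
    # Fraction of examples that passed; an empty run is treated as an error.
    if not results:
        raise ValueError("no eval results to score")
    return sum(r.passed for r in results) / len(results)


def gate(baseline: list[EvalResult], candidate: list[EvalResult],
         max_drop: float = 0.02) -> bool:
    # Accept the change only if its pass rate is no more than `max_drop`
    # below the baseline run on the same examples.
    return pass_rate(candidate) >= pass_rate(baseline) - max_drop


baseline = [EvalResult(f"ex{i}", i != 0) for i in range(10)]            # 9/10 pass
regressed = [EvalResult(f"ex{i}", i not in (0, 1)) for i in range(10)]  # 8/10 pass
print(gate(baseline, regressed))  # False: 0.8 is below 0.9 - 0.02
print(gate(baseline, baseline))   # True
```

Comparing against a baseline on the same examples, rather than a fixed threshold, means the gate flags regressions even as the eval set is made harder over time.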
Ship fast. Ship responsibly.
New to observability?
If you're figuring out what evals are, why anyone bothers with traces, or what "observability" means for a non-deterministic system, start here.
- The Phoenix quickstarts
- Read the Evaluator
- Try out the Phoenix demo
- Learn about evaluating LLM outputs
Already shipping?
Your participation shapes Phoenix. Keep up with what's shipping, what we're learning, and where the project is going next.
- Join the Slack community
- Join the GitHub Discussions
- Become an ambassador
- Check the Release notes
- Star Phoenix on GitHub
Evaluating Phoenix?
For engineering leaders, platform teams, and anyone deciding whether to bet on Phoenix as part of their AI stack.
- How open-source Phoenix stacks up to LangSmith and alternatives
- How to migrate from Arize Phoenix to Arize AX
- OpenInference: the spec, and why it matters
- Using Phoenix in a production app to ship a small model that matches LLM performance