Phoenix just crossed 10,000 GitHub stars.
For an open-source project, that milestone means thousands of developers have decided a repository is worth watching, testing, contributing to, or betting on. Some filed issues. Some opened pull requests. Others showed up in Slack, challenged assumptions, and helped shape the roadmap.
Phoenix has always been what Nadia Eghbal defined as a “stadium project” in her book, Working in Public: The Making and Maintenance of Open Source Software. A stadium project is one in which a handful of engineers build something thousands of developers depend on.
Phoenix started as a Jupyter notebook extension in 2023 and has grown into one of the most widely adopted open-source projects in AI observability. Along the way, it helped shape OpenInference, adopted OpenTelemetry before it became the default choice for much of the AI ecosystem, and expanded from a notebook tool into a platform used across frameworks, languages, and deployment environments.
The story of Phoenix is also a story about the emergence of AI engineering as a discipline. As frameworks evolved, observability standards emerged, and agent workflows became commonplace, the project evolved alongside them.
Many of the decisions that shaped Phoenix came back to the same question:
“How do we reach the most users?” said Mikyo King, Head of Open-source at Arize.
Hear the story behind Phoenix from the people who built it
This article is based on a conversation with Phoenix maintainers Mikyo King, Roger Yang, and Xander Song.
Watch the full interview below or read on to learn the story behind Phoenix’s move to OpenTelemetry, the creation of OpenInference, and why the team deliberately built the project backward. You can also explore the latest data from Phoenix as we celebrate hitting 10,000 stars.
Building AI observability inside Jupyter Notebooks
Arize started in 2020 as a closed-source company. By late 2022, the team wanted to reach developers in a different way.
“We wanted to reach a lot more users, not just thousands of our customers, but millions of people,” Mikyo said. Developer communities distrust vendors, often turning to open source projects like Keras and PyTorch first and Big Tech offerings second. And data privacy was a growing concern as AI services began passing private information through networks that were increasingly difficult to trace.
Mikyo was asked to spearhead open source, without a precise definition of what that meant. The one thing the team knew was where its users lived. “A lot of our community was living and breathing in [Jupyter] notebooks,” Mikyo said. “We really wanted to meet our developers where they were.” Thus, the first version of Phoenix emerged as a humble Jupyter notebook extension.
The earliest versions focused on visualizing embeddings and unstructured data. Then GPT-3 changed the direction of the project. The team began visualizing the questions flowing through LLM applications. Arize Software Engineer Roger Yang built visualizations using UMAP and HDBSCAN to help engineers identify clusters of related prompts and responses. For the first time, Phoenix could reveal the structure of an AI application in a way developers could inspect directly.
That work eventually connected the team with LlamaIndex and the broader ecosystem of developers building retrieval-augmented generation (RAG) systems.
For Roger, who had come from Go and backend work, the project became as much a learning experience as an engineering challenge.
“One major advantage of open source is your ability to read other people’s code and learn new things,” he said. “Python was new to us as well.”
Phoenix was growing alongside the ecosystem it served.
How Phoenix evolved from a notebook tool to an AI observability platform
Most software projects begin with infrastructure, but Phoenix began with utility.
The team focused on helping developers understand what was happening inside AI applications. Infrastructure followed later.
“We built features, then we built a container, then we built a database layer, and then we built authentication,” Mikyo said. “Quite literally backwards of what you would imagine a software project going.”
As developers adopted Phoenix, new requirements emerged.
Users ran long-lived Phoenix instances that accumulated millions of traces. Some built their own persistence layers using Elasticsearch or MongoDB, while others asked for ways to move beyond notebook environments.
“We knew people needed to escape the notebook,” Mikyo said. The team responded by containerizing Phoenix before building a database layer.
“We containerized it first, which sounds crazy.”
SQLite followed, and Postgres support came after that. Authentication arrived later, driven largely by community requests. Some developers needed Keycloak, others needed Cognito, and still others wanted OIDC support.
“Each developer kind of builds a muscle of being a great developer, but also advocating for their own software,” said Mikyo.
The relationship between maintainers and users remains unusually direct, in large part because there has never been a separate support organization behind Phoenix.
“We don’t really have a support team,” said Xander Song. “We are the support team.”
“When you are the support team for your own product, there’s a certain level of trying to deliver a very high bar of quality, and feeling accountable when people come into GitHub and tell you this is not working.”
Why Phoenix adopted OpenTelemetry and created OpenInference
The team debated the move to OpenTelemetry internally before committing to it. At first, the team was not convinced it would work.
By late 2023, Phoenix had tracing, but it only worked with Phoenix. The team had built something that looked like OpenTelemetry without committing to it, partly out of doubt that AI data even fit the model. Phoenix addressed conversations, embeddings, retrieval results, and model outputs, whereas traditional observability systems were designed around infrastructure signals and application events.
“It wasn’t obvious that OTel was the right vehicle,” Roger said.
Roger even submitted a pull request to switch Phoenix to OpenTelemetry before the team had reached consensus.
Mikyo pushed back. “I think we’re kind of pushing a square peg through a circular hole,” he remembered thinking.
What changed his mind was a mix of community pressure and evidence. GitHub issues kept asking for the switch. The team knew distributed tracing was coming, since agents would soon call LLMs across services rather than inside a single notebook. And the users they met in person made the case directly. “Someone would say, ‘I’m the maintainer of LangChain for Go.’ Or, ‘I have an existing Ruby application, I really want to use Phoenix, but I’m not going to switch to Python anytime soon,’” Mikyo said.
The question that had shaped Phoenix from the beginning resurfaced.
“How do we reach the most users? Why not use the right plumbing, the plumbing that already existed in DevOps?” Mikyo asked.
In the end, the team came around and made the switch. A byproduct of that switch was OpenInference: a set of semantic conventions for AI applications that anyone can implement against any backend. The team kept the spec and the instrumentation in one monorepo so they could move fast, “developed by practitioners,” as Mikyo put it, on the two-week cadence at which AI startups tend to move. The first pull request from Hugging Face was a milestone, and Xander put in a lot of work maintaining OpenInference over the years.
In hindsight, Mikyo thought the doubt was misplaced. “DevOps problems are also AI ops problems.”
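To make "semantic conventions" a little more concrete, here is a minimal sketch of how one LLM call might be described as span attributes. The attribute keys follow OpenInference naming as we understand it, so check the published spec before relying on them. The `SketchSpan` class, the `record_llm_call` helper, the span name, and the sample values are all hypothetical stand-ins for illustration. They are not part of Phoenix, OpenInference, or OpenTelemetry.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchSpan:
    """Hypothetical stand-in for a tracing span; not a Phoenix or OpenTelemetry class."""
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)


def record_llm_call(model_name: str, prompt: str, completion: str,
                    prompt_tokens: int, completion_tokens: int) -> SketchSpan:
    """Describe one LLM call using OpenInference-style attribute keys (assumed)."""
    return SketchSpan(
        name="llm_call",  # illustrative span name, not mandated by any spec
        attributes={
            "openinference.span.kind": "LLM",
            "llm.model_name": model_name,
            "input.value": prompt,
            "output.value": completion,
            "llm.token_count.prompt": prompt_tokens,
            "llm.token_count.completion": completion_tokens,
            "llm.token_count.total": prompt_tokens + completion_tokens,
        },
    )


# Sample values are made up for the sketch.
span = record_llm_call(
    "example-model", "What is Phoenix?", "An open-source observability tool.", 5, 7
)
for key, value in sorted(span.attributes.items()):
    print(f"{key} = {value!r}")
```

Because the conventions are just agreed-upon keys on ordinary spans, the same shape could in principle be emitted from Go, Ruby, or TypeScript and read by any backend that understands those keys, which is the portability argument the team describes above.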
Why Phoenix stayed local first
One of Phoenix’s defining characteristics emerged almost by accident. Phoenix runs completely locally.
“It’s a happy accident,” Mikyo said. “Because it started as a Jupyter extension, we just were never building a SaaS platform to begin with. We always had to assume everything had to run locally.”
The benefits became increasingly obvious:
- Developers could debug issues locally.
- Teams working with sensitive data could keep telemetry inside their own environments.
- Organizations operating in air-gapped environments could still run observability tooling.
Moreover, a number of support questions came from Windows users in corporate roles who were tracking sensitive data they could not send anywhere. Mikyo remembered a user named Rusty whose enthusiasm for local development surprised the team, and some users pushed local SQLite instances to 200 gigabytes. Mikyo also pointed to companies where 400 engineers could each run the same observability stack on their own machines without incurring additional cost.
The local-first approach also aligned with the way many developers preferred to work.
When llama.cpp made it possible to run a 70-billion-parameter Qwen model on a laptop, local-first stopped looking like a constraint and started to look like a feature. “I can code while I’m on a plane,” Mikyo said.
What started as a design constraint became one of Phoenix’s core strengths.
Open source AI evaluation without vendor lock-in
Phoenix works across Python and TypeScript. It supports dozens of frameworks and integrates with observability backends throughout the ecosystem (the team calls this being the Switzerland of evals).
That openness came from humility as much as conviction.
“We didn’t really know what good evals look like,” Mikyo said. “We didn’t want to say, ‘This is what good evals look like,’ because there was a lot we didn’t know. We just wanted people to be experimenting.”
Rather than prescribe a single approach, the team focused on helping developers observe, evaluate, and improve AI systems regardless of their programming language.
Keeping everything open was the way to learn from the community rather than dictate to it. Mikyo pointed to John Carmack shipping Quake with its own flavor of C so people could hack on it.
“We want people to be able to build agents that work,” Mikyo said. “That’s what we have a vested interest in.”
What’s next for Phoenix
Ask the team about the future and the conversation quickly turns to agents.
The way developers work has changed fast.
“I don’t hand-write code anymore, which is kind of nuts,” Mikyo said. “I tell Claude what to do at this point.”
The team believes the next generation of observability and evaluation tooling will need to directly support agent workflows.
That includes observability for coding agents, evaluation systems for agent-generated changes, and workflows that help humans review increasingly autonomous software systems. Today, Phoenix has to meet developers where they’re working with agents, the same way it once met them in notebooks.
One direction is giving each coding agent its own sandbox observability environment to gut-check its changes. Some of the team already run git worktrees with multiple Phoenix instances doing exactly that. Another is more human-in-the-loop flows, where humans ask agents to make changes and then approve them, or expert agents surface insights for humans to act on, with the right permissioning built in and the systems kept auditable.
At the same time, the philosophy behind Phoenix remains unchanged.
“Ship fast but responsibly is kind of our motto,” Mikyo said. “We’re definitely trying to build a system that helps you move faster but also responsibly.”
But the team is wary of the easy path. “If you take the easiest path, you might be producing more slop,” Mikyo said. His view is that evidence-based development, review, and automation matter more now that agents are writing more of the code.
Thank you to the contributors
Phoenix got here because people outside the core team kept showing up. OIDC authentication came from users who needed Keycloak and Cognito.
Framework maintainers pushed the team toward OpenTelemetry, while contributors expanded integrations and helped shape OpenInference.
Bug reports, feature requests, and design discussions influenced what got built and when.
The feedback loop worked because the maintainers stayed close to the community.
“The reason this team lives and breathes on Slack is because we miss the old days of the IRC channels where you could talk to us,” Mikyo said. “We love nerding out about cool stuff.”
To everyone who filed an issue, opened a pull request, joined a discussion, shared feedback, or helped another developer in the community: thank you.
You helped shape Phoenix. We’re working hard to earn the next 10,000 stars.
