To iteratively improve your LLM application, it's vital to collect feedback, annotate data during human review, and establish an evaluation pipeline so that you can monitor your application. In Phoenix we capture this type of feedback in the form of annotations.
Phoenix gives you the ability to annotate traces with feedback from the UI, your application, or wherever you would like to perform evaluation. Phoenix's annotation model is simple yet powerful - given an entity such as a span that is collected, you can assign a label and/or a score to that entity.
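As a minimal sketch, logging an annotation from Python with the Phoenix client might look like the following. The annotation name, span ID, label, and score are placeholders, and the exact client method may vary by Phoenix version:

```python
from phoenix.client import Client

client = Client()  # connects to your Phoenix instance

# Attach a label and a score to a collected span.
# "correctness" and the span_id value are illustrative placeholders.
client.annotations.add_span_annotation(
    annotation_name="correctness",
    span_id="67f6740bbe1ddc3f",
    label="correct",
    score=1.0,
    annotator_kind="HUMAN",
)
```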
Learn more about the concepts:
Configure Annotation Configs to guide human annotations.
Learn how to run evals on traces
Learn how to log annotations via the client from your app or in a notebook
Use projects to organize your LLM traces
Projects provide organizational structure for your AI applications, allowing you to logically separate your observability data. This separation is essential for maintaining clarity and focus.
With Projects, you can:
Segregate traces by environment (development, staging, production)
Isolate different applications or use cases
Track separate experiments without cross-contamination
Maintain dedicated evaluation spaces for specific initiatives
Create team-specific workspaces for collaborative analysis
Projects act as containers that keep related traces and conversations together while preventing them from interfering with unrelated work. This organization becomes increasingly valuable as you scale - allowing you to easily switch between contexts without losing your place or mixing data.
The Project structure also enables comparative analysis across different implementations, models, or time periods. You can run parallel versions of your application in separate projects, then analyze the differences to identify improvements or regressions.
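As a sketch, you can route traces to a dedicated project through Phoenix's OpenTelemetry integration; the project name below is a placeholder:

```python
from phoenix.otel import register

# Route all traces emitted by this process to a dedicated project.
# "my-app-staging" is illustrative; use one name per environment or experiment.
tracer_provider = register(project_name="my-app-staging")
```

The project name can also typically be set via the PHOENIX_PROJECT_NAME environment variable, which is convenient when the same code runs in development, staging, and production.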
Tracing is a critical part of AI observability and should be used in both development and production.
Phoenix's tracing and span analysis capabilities are invaluable during the prototyping and debugging stages. By instrumenting application code with Phoenix, teams gain detailed insights into the execution flow, making it easier to identify and resolve issues. Developers can drill down into specific spans, analyze performance metrics, and access relevant logs and metadata to streamline debugging efforts.
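As a sketch, instrumenting an OpenAI-based application might look like this, assuming the openinference-instrumentation-openai package is installed; the span name and attribute below are illustrative:

```python
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Export spans to Phoenix and auto-instrument OpenAI calls,
# so every LLM request shows up as a span you can drill into.
tracer_provider = register()
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# Manual spans can wrap other steps of the execution flow.
tracer = tracer_provider.get_tracer(__name__)
with tracer.start_as_current_span("retrieve-documents") as span:
    span.set_attribute("retrieval.query", "What is Phoenix?")  # illustrative attribute
    # ... run your retrieval step here ...
```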
This section contains details on Tracing features:
Track and analyze multi-turn conversations
Sessions enable tracking and organizing related traces across multi-turn conversations with your AI application. When building conversational AI, maintaining context between interactions is critical - Sessions make this possible from an observability perspective.
With Sessions in Phoenix, you can:
Track the entire history of a conversation in a single thread
View conversations in a chatbot-like UI showing inputs and outputs of each turn
Search through sessions to find specific interactions
Track token usage and latency per conversation
This feature is particularly valuable for applications where context builds over time, like chatbots, virtual assistants, or any other multi-turn interaction. By tagging spans with a consistent session ID, you create a connected view that reveals how your application performs across an entire user journey.
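A minimal sketch of tagging spans with a consistent session ID, assuming the openinference-instrumentation package's context helper; the session ID and application function are hypothetical:

```python
from openinference.instrumentation import using_session

def answer_user_message(message: str) -> str:
    # Hypothetical application logic - replace with your chat or RAG pipeline.
    return f"Echo: {message}"

# Spans created inside this block carry the same session.id attribute,
# so Phoenix can group every turn of one conversation into a single thread.
with using_session(session_id="user-42-conversation-1"):
    answer_user_message("What can Phoenix do?")
    answer_user_message("How do sessions work?")
```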
Check out how to Setup Sessions