A/B Testing for LLM Applications

Phoenix uses Projects to group LLM traces. A Project is a container for the traces that belong to a single application or service, and you can keep multiple Projects, each with its own traces. Typical uses include separating testing from production and comparing two different applications. In this example, we use Projects in Phoenix to A/B test a RAG chatbot that answers questions against a pre-built index of Arize’s documentation. The same questions are asked in each project. What differs is the model (in this case, GPT-3.5 and GPT-4) and, as a result, the hallucination and QA correctness rates. Learn more about Projects and A/B testing with Phoenix and how to log to a specific project.
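To make the comparison concrete, here is a minimal sketch of how the two rates could be computed for each project once evaluation labels are available. It uses only the Python standard library and does not call Phoenix. The project names, the label strings ("hallucinated"/"factual", "correct"/"incorrect"), and the helper functions `label_rate` and `compare_projects` are assumptions made for illustration, and the label lists are made-up placeholder data, not results from the experiment.

```python
from collections import Counter


def label_rate(labels, target):
    """Return the fraction of labels equal to `target` (0.0 for an empty list)."""
    if not labels:
        return 0.0
    return Counter(labels)[target] / len(labels)


def compare_projects(evals_by_project):
    """Summarize hallucination and QA correctness rates for each project."""
    summary = {}
    for project, evals in evals_by_project.items():
        summary[project] = {
            "hallucination_rate": label_rate(evals["hallucination"], "hallucinated"),
            "qa_correctness_rate": label_rate(evals["qa"], "correct"),
        }
    return summary


# Placeholder labels (hypothetical): one label per question, same questions in each project.
evals_by_project = {
    "variant-a": {
        "hallucination": ["factual", "hallucinated", "factual", "factual"],
        "qa": ["correct", "incorrect", "correct", "correct"],
    },
    "variant-b": {
        "hallucination": ["factual", "factual", "factual", "factual"],
        "qa": ["correct", "correct", "correct", "correct"],
    },
}

summary = compare_projects(evals_by_project)
for project, rates in summary.items():
    print(
        f"{project}: hallucination={rates['hallucination_rate']:.0%}, "
        f"qa_correctness={rates['qa_correctness_rate']:.0%}"
    )
```

Because both projects receive the same questions, the per-project rates are directly comparable. With real data, the label lists would come from whichever evaluations were run over each project's traces.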
