
LLM as a Judge is a general evaluation concept that applies to both evaluation approaches in Phoenix: you can use it via the SDK (client-side) or configure LLM evaluators directly in the Phoenix UI (server-side). An LLM judge can flag common output problems, such as responses that are:
- not grounded in context
- repetitive, repetitive, repetitive
- grammatically incorrect
- excessively lengthy and characterized by an overabundance of words
- incoherent
How It Works
Here’s the step-by-step process for using an LLM as a judge:

Identify Evaluation Criteria
First, determine what you want to evaluate — faithfulness, toxicity, accuracy, or another characteristic. See our pre-built evaluators for examples of what can be assessed.
Craft Your Evaluation Prompt
Write a prompt template that will guide the evaluation. This template should clearly define what variables are needed from both the initial prompt and the LLM’s response to effectively assess the output.
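As a minimal sketch (the template text and variable names here are illustrative, not one of Phoenix's built-in templates), an evaluation prompt is a template with placeholders for the pieces the judge needs from the original exchange:

```python
# Illustrative evaluation prompt template (not a Phoenix built-in).
# {input} and {output} are filled in per example being evaluated.
EVAL_TEMPLATE = """You are evaluating whether a response is grounded in the
question it was given. Answer with a single word: "grounded" or "ungrounded".

[Question]: {input}
[Response]: {output}

Label:"""

def render_prompt(input: str, output: str) -> str:
    """Fill the template with one example's input and output."""
    return EVAL_TEMPLATE.format(input=input, output=output)

prompt = render_prompt("What is Phoenix?", "Phoenix is an observability tool.")
```

Keeping the template as a plain string with named variables makes it easy to reuse the same rubric across many examples.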
Select an Evaluation LLM
Choose the most suitable LLM from our available options for conducting your specific evaluations.
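The three steps above can be sketched end to end. Note that `fake_judge` below is a stand-in for a call to your chosen evaluation LLM (a toy substring heuristic, not a real model), and the template is illustrative rather than a specific Phoenix API:

```python
# End-to-end sketch: criteria (groundedness) + template + judge.
TEMPLATE = (
    'Is the response grounded in the context? Answer "yes" or "no".\n'
    "Context: {context}\nResponse: {response}\nAnswer:"
)

def fake_judge(prompt: str) -> str:
    # Stand-in for an evaluation LLM call: flags responses containing
    # words absent from the context (toy heuristic, not a real model).
    context = prompt.split("Context: ")[1].split("\n")[0]
    response = prompt.split("Response: ")[1].split("\n")[0]
    grounded = all(w in context.lower() for w in response.lower().split())
    return "yes" if grounded else "no"

def evaluate(context: str, response: str, judge=fake_judge) -> str:
    """Render the template for one example and ask the judge for a label."""
    return judge(TEMPLATE.format(context=context, response=response))
```

In practice you would swap `fake_judge` for a function that calls the evaluation LLM you selected in step 3.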
Using LLM as a Judge in Phoenix
SDK Evaluations
Write custom LLM evaluators in Python or TypeScript. See also: Configuring the LLM for model selection and prompt setup.
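A custom SDK evaluator is essentially a function that renders the prompt, calls the judge model, and parses the reply into one of a fixed set of labels. A minimal Python sketch — the function names and the injected `call_model` callable are assumptions for illustration, not Phoenix's actual SDK signatures:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    label: str        # e.g. "toxic" / "non-toxic"
    explanation: str  # raw judge output, useful for debugging

def make_evaluator(template: str, rails: list[str], call_model: Callable[[str], str]):
    """Build an evaluator that snaps the judge's free-text reply onto `rails`."""
    def evaluator(**variables: str) -> EvalResult:
        raw = call_model(template.format(**variables))
        # Rails are checked in order; list "non-toxic" before its
        # substring "toxic" so the longer label wins.
        label = next((r for r in rails if r in raw.lower()), "unparseable")
        return EvalResult(label=label, explanation=raw)
    return evaluator

# Usage with a stubbed model call (stand-in for a real LLM):
judge = make_evaluator(
    template="Is this toxic? {text}\nAnswer toxic or non-toxic:",
    rails=["non-toxic", "toxic"],
    call_model=lambda prompt: "non-toxic",
)
result = judge(text="have a nice day")
```

Constraining output to rails keeps downstream aggregation simple even when the judge model adds extra wording around its answer.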
Server-Side Evaluators
Configure LLM evaluators in the Phoenix UI — no local code or API key setup required.

