Now that you have Phoenix up and running and have sent traces to your first project, the next step is to run evaluations of your Python application. Evaluations let you measure and monitor the quality of your application by scoring traces against metrics like accuracy, relevance, or custom checks.
1
Before running evals, make sure Phoenix is running and you have sent traces to your project. For more step-by-step instructions, check out the Get Started guide and the Get Started with Tracing guide.
Log in, create a space, navigate to the settings page in your space, and create your API keys. Then set your environment variables:
export PHOENIX_API_KEY="ADD YOUR PHOENIX API KEY"
export PHOENIX_COLLECTOR_ENDPOINT="ADD YOUR PHOENIX COLLECTOR ENDPOINT"
To find your collector endpoint, launch your space, navigate to settings, and copy the hostname.

Your collector endpoint is https://app.phoenix.arize.com/s/ followed by your space name.
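
For example, if your space were named my-space (a hypothetical name used only for illustration), you could also set both variables from inside Python rather than the shell:

import os

# Hypothetical example values; substitute your real API key and space name
os.environ["PHOENIX_API_KEY"] = "your-phoenix-api-key"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com/s/my-space"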
2
You'll need to install the evals library that's part of Phoenix, along with the Phoenix client.
pip install -q "arize-phoenix-evals>=2"
pip install -q "arize-phoenix-client"
3
Since we are running our evaluations on the trace data from our first project, we'll need to pull that data into our code.
from phoenix.client import Client

# Connect to Phoenix (uses PHOENIX_API_KEY and PHOENIX_COLLECTOR_ENDPOINT)
px_client = Client()

# Pull the spans from your tracing project into a dataframe
primary_df = px_client.spans.get_spans_dataframe(project_identifier="my-llm-app")
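
A quick optional sanity check confirms that spans came back and shows which columns are available for your evaluator to use:

# Optional: confirm spans were pulled and inspect the available columns
print(f"Pulled {len(primary_df)} spans")
print(primary_df.columns.tolist())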
4
In this example, we will define, create, and run our own evaluator. There are a number of different evaluators you can run, but this quickstart will walk through an LLM-as-a-Judge evaluator.

1) Define your LLM Judge Model

We'll use OpenAI as our evaluation model for this example, but Phoenix also supports a number of other models. If you haven't yet defined your OpenAI API key from the previous step, first add it to your environment.
import os
from getpass import getpass

if not (openai_api_key := os.getenv("OPENAI_API_KEY")):
    openai_api_key = getpass("🔑 Enter your OpenAI API key: ")

os.environ["OPENAI_API_KEY"] = openai_api_key

from phoenix.evals.llm import LLM
llm = LLM(model="gpt-4o", provider="openai")
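
2) Create your evaluator

The run step below uses a correctness_evaluator, so we need to define one. Here is a minimal sketch that assumes the create_classifier helper from phoenix.evals and that your spans dataframe exposes the LLM input and output under the OpenInference columns attributes.input.value and attributes.output.value; adjust the column names and the prompt to match your traces.

from phoenix.evals import create_classifier

# Assumption: the spans dataframe has "attributes.input.value" and
# "attributes.output.value" columns; rename them to simple template variables.
primary_df = primary_df.rename(
    columns={
        "attributes.input.value": "input",
        "attributes.output.value": "output",
    }
)

# A simple LLM-as-a-Judge correctness classifier (a sketch; tune the prompt for your app)
correctness_evaluator = create_classifier(
    name="correctness",
    llm=llm,
    prompt_template=(
        "You are evaluating the correctness of an LLM response.\n\n"
        "Question: {input}\n"
        "Response: {output}\n\n"
        "Is the response a correct answer to the question? "
        "Respond with exactly one word: 'correct' or 'incorrect'."
    ),
    choices={"correct": 1.0, "incorrect": 0.0},
)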
5
Now that we have defined our evaluator, we’re ready to evaluate our traces.
from phoenix.evals import evaluate_dataframe

results_df = evaluate_dataframe(
    dataframe=primary_df,
    evaluators=[correctness_evaluator]
)
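
The returned dataframe contains the evaluation results alongside the span identifiers; here is a quick look at what came back (the exact column names depend on the evaluator's name):

# Inspect the evaluation results
print(results_df.columns.tolist())
print(results_df.head())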
6
You can now log your evaluations to Phoenix and view them in your project.
# Attach the evaluation results to the corresponding spans as annotations
px_client.log_span_annotations(
    dataframe=results_df,
    annotation_name="QA Correctness",
    annotator_kind="LLM"
)

Learn More: