How do I resolve Phoenix Evals showing NOT PARSABLE?

Here’s how to fix it:

Increase max_tokens: Update the model configuration as follows:

from getpass import getpass

from phoenix.evals import LLM
from phoenix.evals import ClassificationEvaluator

llm = LLM(
    provider="openai",
    model="gpt-4o-2024-08-06",
    api_key=getpass("Enter your OpenAI API key..."),
)
# Pass max_tokens and temperature when creating the evaluator
evaluator = ClassificationEvaluator(
    ...,
    llm=llm,
    temperature=0.2,
    max_tokens=1000,  # Increase token limit
)

Update Phoenix: Use version ≥0.17.4, which removes token limits for OpenAI and increases defaults for other APIs.

Check Logs: Look for finish_reason="length" to confirm token limits caused the issue.
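To illustrate the log check, here is a minimal sketch that scans a response payload for truncated completions. It assumes the logged response is a dictionary shaped like an OpenAI chat completion (a "choices" list whose entries carry a "finish_reason" field). The helper name truncated_choices and the sample payload are hypothetical and not part of Phoenix.

```python
from typing import List


def truncated_choices(response: dict) -> List[int]:
    """Return the indices of choices that stopped because of the token limit."""
    return [
        i
        for i, choice in enumerate(response.get("choices", []))
        if choice.get("finish_reason") == "length"
    ]


# Hypothetical payload, shaped like an OpenAI chat completion response
sample = {
    "choices": [
        {
            "index": 0,
            "finish_reason": "length",  # output was cut off at max_tokens
            "message": {"content": "The response is rel"},
        }
    ]
}

print(truncated_choices(sample))
```

A non-empty result suggests the output was cut off before the label was emitted, which is consistent with the token limit being the cause.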

If none of the above works, the LLM-as-a-judge output may not match any of the choices defined for that custom Phoenix eval. Double-check that the prompt's output matches the expected choices.
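One rough way to check for a choice mismatch is to compare the raw judge output against your labels by hand. The sketch below is only an illustration of that comparison, not Phoenix's actual parsing logic. The helper match_choice and the example labels are hypothetical.

```python
from typing import List, Optional


def match_choice(raw_output: str, choices: List[str]) -> Optional[str]:
    """Return the choice that the raw output equals, ignoring case,
    surrounding whitespace, and trailing punctuation or quotes; else None."""
    cleaned = raw_output.strip().strip(".\"'").lower()
    for choice in choices:
        if cleaned == choice.lower():
            return choice
    return None


choices = ["relevant", "irrelevant"]  # hypothetical labels for a custom eval

print(match_choice(" Relevant. ", choices))
print(match_choice("The document is relevant", choices))
```

If the second kind of output is what the judge returns (a full sentence instead of a bare label), tightening the prompt so the model answers with exactly one of the labels is usually the simplest fix.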
