MLflow includes built-in support for Phoenix evaluators through its third-party scorer interface. This allows Phoenix users to run their existing evaluation metrics within MLflow's mlflow.genai.evaluate() pipeline, alongside MLflow's experiment tracking and model management.

Using Phoenix Evaluators in MLflow

Phoenix evaluators such as Hallucination, QACorrectness, and Toxicity can be used directly as MLflow scorers:
import mlflow
from mlflow.genai.scorers.phoenix import Hallucination, QACorrectness

results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[
        # Each scorer wraps the corresponding Phoenix evaluator and uses
        # the specified judge model to score every record in the dataset.
        Hallucination(model="openai:/gpt-4o"),
        QACorrectness(model="openai:/gpt-4o"),
    ],
)
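For reference, a minimal eval_dataset for the snippet above might look like the following. This is a sketch, not the integration's canonical format: the record keys ("inputs", "outputs") follow the generic record structure accepted by mlflow.genai.evaluate(), and the exact fields each Phoenix scorer reads should be verified against the integration docs.

# Hypothetical evaluation records; the key names here are an assumption
# to check against the Phoenix scorer documentation.
eval_dataset = [
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an open source platform for managing the ML lifecycle.",
    },
    {
        "inputs": {"question": "Does MLflow track experiments?"},
        "outputs": "Yes, MLflow records parameters, metrics, and artifacts for each run.",
    },
]

When the evaluation completes, the aggregate scores are recorded against the MLflow run created for the evaluation, so they appear alongside your other experiment tracking data.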
For details on available scorers and configuration, see the MLflow Phoenix integration docs.