MLflow includes built-in support for Phoenix evaluators through its third-party scorer interface. This allows Phoenix users to run their existing evaluation metrics within MLflow's mlflow.genai.evaluate() pipeline, alongside MLflow's experiment tracking and model management.

Using Phoenix Evaluators in MLflow

Phoenix evaluators such as Hallucination, QACorrectness, and Toxicity can be used directly as MLflow scorers:
import mlflow
from mlflow.genai.scorers.phoenix import Hallucination, QACorrectness

results = mlflow.genai.evaluate(
    data=eval_dataset,
    scorers=[
        # Each scorer wraps the corresponding Phoenix evaluator and uses
        # the specified judge model to score every record in the dataset.
        Hallucination(model="openai:/gpt-4o"),
        QACorrectness(model="openai:/gpt-4o"),
    ],
)
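For reference, a minimal eval_dataset for the snippet above might look like the following. This is a sketch, not the integration's canonical format: the record keys ("inputs", "outputs") follow the generic record structure accepted by mlflow.genai.evaluate(), and the exact fields each Phoenix scorer reads should be verified against the integration docs.

# Hypothetical evaluation records; the key names here are an assumption
# to check against the Phoenix scorer documentation.
eval_dataset = [
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an open source platform for managing the ML lifecycle.",
    },
    {
        "inputs": {"question": "Does MLflow track experiments?"},
        "outputs": "Yes, MLflow records parameters, metrics, and artifacts for each run.",
    },
]

When the evaluation completes, the aggregate scores are recorded against the MLflow run created for the evaluation, so they appear alongside your other experiment tracking data.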
For details on available scorers and configuration, see the MLflow Phoenix integration docs.