Available in Phoenix 11.4+
You can now set a baseline run when comparing multiple experiments. This is especially useful when one run represents a known-good output (e.g. a previous model version or a CI-approved run), and you want to evaluate changes relative to it.
For example, in an evaluation like accuracy, you can easily see where the value flipped from correct → incorrect or incorrect → correct between your baseline and the current comparison, helping you quickly spot regressions or improvements.
This feature makes it easier to isolate the impact of changes like a new prompt, model, or dataset.
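The flip detection described above can be sketched in plain Python. This is an illustrative example, not the Phoenix API: the example IDs and per-example correctness labels below are hypothetical stand-ins for the results of a baseline run and a comparison run.

```python
# Hypothetical per-example accuracy labels from two experiment runs.
baseline = {"ex-1": True, "ex-2": True, "ex-3": False, "ex-4": False}
candidate = {"ex-1": True, "ex-2": False, "ex-3": True, "ex-4": False}

# correct → incorrect: regressions relative to the baseline
regressions = [ex for ex in baseline if baseline[ex] and not candidate[ex]]

# incorrect → correct: improvements relative to the baseline
improvements = [ex for ex in baseline if not baseline[ex] and candidate[ex]]

print("correct → incorrect:", regressions)   # ['ex-2']
print("incorrect → correct:", improvements)  # ['ex-3']
```

Examples absent from either direction of the flip (unchanged results) are ignored, which is exactly what makes a baseline comparison useful: only the deltas need review.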