07.09.2025: Baseline for Experiment Comparisons

Available in Phoenix 11.4+

You can now set a baseline run when comparing multiple experiments. This is especially useful when one run represents a known-good output (e.g. a previous model version or a CI-approved run), and you want to evaluate changes relative to it.

For example, in an evaluation like accuracy, you can easily see where the value flipped from correct → incorrect or incorrect → correct between your baseline and the current comparison, helping you quickly spot regressions or improvements.
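The flip logic described above can be sketched in plain Python. This is an illustrative example, not the Phoenix API: `classify_flips` and the dict-of-booleans representation are assumptions made for the sketch.

```python
# Illustrative sketch (not the Phoenix API): classify per-example evaluation
# flips between a baseline run and a comparison run, the way the comparison
# view highlights them for an accuracy-style eval.

def classify_flips(baseline: dict, comparison: dict) -> dict:
    """Map each example id to 'regression', 'improvement', or 'unchanged'.

    `baseline` and `comparison` map example ids to booleans (True = correct).
    Only ids present in both runs are compared.
    """
    flips = {}
    for example_id in baseline.keys() & comparison.keys():
        was_correct = baseline[example_id]
        is_correct = comparison[example_id]
        if was_correct and not is_correct:
            flips[example_id] = "regression"   # correct -> incorrect
        elif not was_correct and is_correct:
            flips[example_id] = "improvement"  # incorrect -> correct
        else:
            flips[example_id] = "unchanged"
    return flips

baseline_run = {"ex1": True, "ex2": True, "ex3": False}
comparison_run = {"ex1": True, "ex2": False, "ex3": True}
print(classify_flips(baseline_run, comparison_run))
```

Here `ex2` would surface as a regression and `ex3` as an improvement relative to the baseline.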

This feature makes it easier to isolate the impact of changes like a new prompt, model, or dataset.
