07.09.2025: Baseline for Experiment Comparisons
Available in Phoenix 11.4+
You can now set a baseline run when comparing multiple experiments. This is especially useful when one run represents a known-good output (e.g. a previous model version or a CI-approved run), and you want to evaluate changes relative to it.
For example, in an evaluation like accuracy, you can easily see where the value flipped from correct → incorrect or incorrect → correct between your baseline and the current comparison, helping you quickly spot regressions or improvements.
This feature makes it easier to isolate the impact of changes like a new prompt, model, or dataset.
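To make the idea of a "flip" concrete, here is a minimal sketch in plain Python of the comparison logic described above. It does not use the Phoenix API; the function name `find_flips`, the example IDs, and the label dictionaries are all hypothetical and only illustrate how results can be classified relative to a baseline run.

```python
from typing import Dict, List


def find_flips(
    baseline: Dict[str, str], comparison: Dict[str, str]
) -> Dict[str, List[str]]:
    """Group example IDs by how their accuracy label changed vs. the baseline.

    Both arguments map a (hypothetical) example ID to "correct" or "incorrect".
    """
    flips: Dict[str, List[str]] = {"regressions": [], "improvements": []}
    for example_id, base_label in baseline.items():
        new_label = comparison.get(example_id)
        # Skip examples missing from the comparison run or unchanged.
        if new_label is None or new_label == base_label:
            continue
        if base_label == "correct" and new_label == "incorrect":
            flips["regressions"].append(example_id)
        elif base_label == "incorrect" and new_label == "correct":
            flips["improvements"].append(example_id)
    return flips


# Illustrative data only: a known-good baseline run and a candidate run.
baseline_run = {"ex-1": "correct", "ex-2": "incorrect", "ex-3": "correct"}
candidate_run = {"ex-1": "incorrect", "ex-2": "correct", "ex-3": "correct"}

print(find_flips(baseline_run, candidate_run))
```

In this sketch, ex-1 counts as a regression and ex-2 as an improvement, while ex-3 is unchanged and therefore omitted.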