Save playground outputs as an experiment

After you iterate on a template in the playground and see improvements across a dataset of examples, a typical next step is to save the results as an experiment for further analysis and comparison. The saved outputs can then be A/B tested against experiments from other templates run on the same dataset. By comparing outputs side by side and aggregating metrics, teams can align on the template and model best suited for a production workflow, grounding decisions in both qualitative examples and quantitative metrics.
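If you prefer to pull saved outputs programmatically for comparison, a minimal sketch using the LangSmith Python SDK might look like the following. The experiment names are hypothetical placeholders; this assumes each experiment's outputs are stored as root runs linked to dataset examples via `reference_example_id`.

```python
# A minimal sketch: compare two saved experiments' outputs side by side.
# Assumes the `langsmith` Python package and hypothetical experiment names.
from langsmith import Client

client = Client()

def outputs_by_example(experiment_name: str) -> dict:
    """Map each dataset example ID to the experiment's output for it."""
    return {
        str(run.reference_example_id): run.outputs
        for run in client.list_runs(project_name=experiment_name, is_root=True)
    }

baseline = outputs_by_example("my-experiment-v1")   # hypothetical name
candidate = outputs_by_example("my-experiment-v2")  # hypothetical name

# Print outputs side by side for every example both experiments covered.
for example_id in baseline.keys() & candidate.keys():
    print(f"Example {example_id}")
    print("  v1:", baseline[example_id])
    print("  v2:", candidate[example_id])
```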

1. Toggle on the option to "Automatically save experiments for dataset runs" before running your template over the dataset.
2. After the experiment is created, select "View Experiment" to navigate to the experiments page and review the saved results in detail.
3. The newly created experiment appears at the top of the experiments list, showing its creation time and other associated details for easy reference and comparison.
4. Click into the experiment to review detailed outputs alongside additional metadata, such as example IDs, experiment IDs, and the information needed to reproduce the LLM call, including invocation parameters and the template used (see the sketch after this list for fetching this metadata programmatically).
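The same reproduction metadata can be inspected through the SDK. Below is a minimal sketch, assuming the LangSmith Python client; the experiment name is a hypothetical placeholder, and the exact keys stored under `run.extra` (such as invocation parameters) depend on how the run was logged.

```python
# A minimal sketch: pull the reproduction metadata for one experiment run.
# "my-experiment-v2" is a hypothetical name; keys under run.extra vary
# with how the run was logged.
from langsmith import Client

client = Client()

# Fetch the first root run from the saved experiment.
run = next(client.list_runs(project_name="my-experiment-v2", is_root=True))

print("Run ID:", run.id)
print("Example ID:", run.reference_example_id)
print("Experiment (session) ID:", run.session_id)
print("Inputs (template variables):", run.inputs)
print("Extra (invocation params, metadata):", run.extra)
```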
