Save playground outputs as an experiment

After you iterate on a template in the playground and see improvements across a dataset of examples, a typical next step is to save the results as an experiment for further analysis and comparison. The saved outputs can then be A/B tested against experiments from other templates run on the same dataset. By comparing outputs side by side and aggregating metrics, teams can align on the template and model best suited for a production workflow, grounding decisions in both qualitative examples and quantitative metrics.
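If you prefer to pull saved outputs programmatically for comparison, a minimal sketch using the LangSmith Python SDK might look like the following. The experiment names are hypothetical placeholders; this assumes each experiment's outputs are stored as root runs linked to dataset examples via `reference_example_id`.

```python
# A minimal sketch: compare two saved experiments' outputs side by side.
# Assumes the `langsmith` Python package and hypothetical experiment names.
from langsmith import Client

client = Client()

def outputs_by_example(experiment_name: str) -> dict:
    """Map each dataset example ID to the experiment's output for it."""
    return {
        str(run.reference_example_id): run.outputs
        for run in client.list_runs(project_name=experiment_name, is_root=True)
    }

baseline = outputs_by_example("my-experiment-v1")   # hypothetical name
candidate = outputs_by_example("my-experiment-v2")  # hypothetical name

# Print outputs side by side for every example both experiments covered.
for example_id in baseline.keys() & candidate.keys():
    print(f"Example {example_id}")
    print("  v1:", baseline[example_id])
    print("  v2:", candidate[example_id])
```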

1. Toggle on the option to "Automatically save experiments for dataset runs" before running your template over the dataset.
2. After the experiment is created, select "View Experiment" to navigate to the experiments page and review the saved results in detail.
3. The newly created experiment appears at the top of the experiments list, showing its creation time and other associated details for easy reference and comparison.
4. Click into the experiment to review detailed outputs alongside additional metadata, such as example IDs, experiment IDs, and the information needed to reproduce the LLM call, including invocation parameters and the template used (see the sketch after this list for fetching this metadata programmatically).
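The same reproduction metadata can be inspected through the SDK. Below is a minimal sketch, assuming the LangSmith Python client; the experiment name is a hypothetical placeholder, and the exact keys stored under `run.extra` (such as invocation parameters) depend on how the run was logged.

```python
# A minimal sketch: pull the reproduction metadata for one experiment run.
# "my-experiment-v2" is a hypothetical name; keys under run.extra vary
# with how the run was logged.
from langsmith import Client

client = Client()

# Fetch the first root run from the saved experiment.
run = next(client.list_runs(project_name="my-experiment-v2", is_root=True))

print("Run ID:", run.id)
print("Example ID:", run.reference_example_id)
print("Experiment (session) ID:", run.session_id)
print("Inputs (template variables):", run.inputs)
print("Extra (invocation params, metadata):", run.extra)
```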
