Run an experiment
Load a prompt and dataset
Open Playground, load your prompt, then select a dataset (or replay a production span).
Attach evaluators
Add evaluators so each output is scored automatically.
Compare experiments
Once you have multiple runs on the same dataset, open Compare Experiments to inspect:- Output differences
- Evaluator deltas
- Summary metrics by run
- Regressions vs baseline
Playground views
A Playground View is a named snapshot of your Playground session. It stores:- Prompt messages and tools
- Model and parameter configuration
- Dataset or span context
- Generated results and scores
