Load a dataset into the playground

Many users curate datasets for evaluating their prompts in the playground. These datasets often cover the following use cases:

  • 'Golden datasets' of core examples where it is important to avoid a regression — for example, critical user queries or high-impact business scenarios.

  • 'Challenge datasets' of hard examples where the goal is to hill climb on performance — for example, a dataset of jailbreak prompts or examples of past hallucinations.

When modifying a prompt in the playground, you can test the new prompt across a dataset of examples to validate that performance improves on challenging examples without regressing on core business use cases.
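The playground handles this for you, but the underlying idea is simple to express in code. The sketch below is a minimal, hypothetical illustration of running a candidate prompt template over every example in a dataset and collecting outputs for comparison; the `call_model` callable and the example structure (`"input"` / `"expected"` keys) are assumptions for illustration, not part of the playground.

```python
from typing import Callable

def run_prompt_over_dataset(
    prompt_template: str,
    dataset: list[dict],                 # each example: {"input": {<template vars>}, "expected": <reference output>}
    call_model: Callable[[str], str],    # any function that sends a prompt to an LLM and returns text
) -> list[dict]:
    """Run a candidate prompt over every dataset example and collect outputs."""
    results = []
    for example in dataset:
        # Fill the template with the example's input variables.
        prompt = prompt_template.format(**example["input"])
        output = call_model(prompt)
        results.append({
            "input": example["input"],
            "expected": example.get("expected"),
            "output": output,
        })
    return results
```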

  • Select 'Load a Dataset' to run the template across multiple examples.

  • View additional metadata associated with each example in the dataset without leaving the playground.

  • Click on a row to zoom in and scroll through a side-by-side comparison of the original dataset output and the new LLM output for that example.
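For an offline analogue of that side-by-side view, you can pair each example's original (reference) output with the new output and flag rows that changed, so regressions stand out. This sketch reuses the `results` structure from the example above; the exact-match check is a stand-in for whatever comparison (exact match, a grader, or human review) fits your use case.

```python
def compare_outputs(results: list[dict]) -> list[dict]:
    """Pair each example's reference output with the new output and flag changes."""
    comparison = []
    for row in results:
        comparison.append({
            "input": row["input"],
            "original": row["expected"],
            "new": row["output"],
            # Exact-match is only a placeholder comparison for this sketch.
            "changed": row["output"] != row["expected"],
        })
    return comparison
```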
