Prompt Playground

Iterate on prompts with curated data from development and production

Prompt Playground helps developers experiment with prompt templates, input variables, models, and LLM parameters. This no-code environment lets both technical and non-technical users refine their prompts for production applications.

Key features

  1. Iterate on your prompts with any model using our Playground Integrations

  2. Replay spans from your production data

  3. Build prompts with AI using our Copilot: prompt builder

  4. Manage your prompts in one place with Prompt Hub

Find problematic production examples

The most common way to enter the Prompt Playground is through a span on the LLM tracing page. For instance, users can filter spans where an Online Evaluator flagged the LLM output as a hallucination and then bring one of these examples into the Prompt Playground to refine the prompt, ensuring the LLM produces factual responses in the future.

  1. Apply filters to identify spans where an Online Evaluator flagged the LLM output as a hallucination.
  2. Select the Prompt Playground button to import the template, input variables, and LLM output into the playground.
  3. Modify the template and input variables in the Playground to refine and iterate on the prompt.
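Conceptually, a playground prompt is a template plus a set of input variables. A minimal sketch of how such a template renders with variables (the `{variable}` syntax and the `render_prompt` helper are illustrative, not the platform's API):

```python
# Minimal sketch of a prompt template with input variables.
# The template syntax and helper name are illustrative, not the platform's API.

def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Substitute input variables into a prompt template."""
    return template.format(**variables)

template = (
    "Answer the question using only the provided context.\n"
    "Context: {context}\n"
    "Question: {question}"
)

prompt = render_prompt(
    template,
    {"context": "The sky is blue.", "question": "What color is the sky?"},
)
print(prompt)
```

Importing a span into the playground populates exactly these pieces: the template and the concrete variable values from production.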

Iterate on your prompts

You can iterate on prompts by comparing them side by side with different models, tools, LLM parameters, prompt templates, and variables. The first step is to select the "clone prompt" or "+ prompt" button to create a new prompt.

  1. Create a new prompt for side-by-side comparison.
  2. Set the temperature to 0 to make outputs as deterministic as possible.
  3. Duplicate Prompt A and switch the model from gpt-3.5-turbo to gpt-4o for a direct comparison.
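Under the hood, a side-by-side comparison amounts to sending the same rendered prompt to two model configurations that differ only in the model name. A hedged sketch, where the payload shape follows the common chat-completions convention rather than the playground's internal format:

```python
# Sketch of a side-by-side comparison: the same prompt goes to two
# configurations that differ only in the model name. The payload shape
# mirrors the common chat-completions convention; it is illustrative,
# not the playground's internal format.

def build_request(model: str, prompt: str, temperature: float = 0.0) -> dict:
    """Build one chat-completion request for a given model."""
    return {
        "model": model,
        "temperature": temperature,  # 0 keeps outputs as deterministic as possible
        "messages": [{"role": "user", "content": prompt}],
    }

prompt = "Summarize the provided context in one sentence."
prompt_a = build_request("gpt-3.5-turbo", prompt)
prompt_b = build_request("gpt-4o", prompt)  # duplicate of A with the model switched
```

Holding the prompt and temperature fixed isolates the model as the only variable, so any difference in output can be attributed to the model change.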

Optimize your prompts with Copilot

Another approach to reducing hallucinations is modifying the template. Using Copilot, the user optimizes the prompt, instructing the LLM to respond with "I don't know" when the answer is not found in the provided context. After pressing 'Run' with the updated prompt template, the New Output confirms that the LLM now responds with "I don't know" instead of generating a fabricated answer.
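The effect of the Copilot edit can be pictured as adding an explicit fallback instruction to the template. A sketch with illustrative template text (not the actual Copilot output):

```python
# Illustrative before/after of the template change: the optimized version
# adds an explicit fallback instruction to curb hallucinations.

original_template = (
    "Answer the question using the provided context.\n"
    "Context: {context}\nQuestion: {question}"
)

fallback_instruction = (
    "If the answer is not found in the provided context, "
    "respond with 'I don't know' instead of guessing."
)

optimized_template = original_template.replace(
    "Answer the question using the provided context.",
    "Answer the question using the provided context. " + fallback_instruction,
)
print(optimized_template)
```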

Optimize the prompt with Copilot

Load a dataset

While we have observed improved performance on a single example, how can we ensure consistent improvement across many examples? To validate that a new prompt effectively reduces hallucinations more broadly, we can load a dataset of hallucinated examples into the Prompt Playground and test the updated prompt against the entire dataset.

Load in a dataset of hallucinated examples.
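Testing the updated prompt against a whole dataset is, in effect, a loop over examples with an aggregate check. A sketch with a stubbed model call (the stub, dataset, and template text are all illustrative):

```python
# Sketch of validating a prompt against a dataset of previously
# hallucinated examples. The model call is stubbed; in practice each
# rendered prompt would be sent to the selected LLM.

dataset = [
    {"context": "The report covers Q3 revenue.", "question": "What was Q4 profit?"},
    {"context": "The sky is blue.", "question": "What color is the sky?"},
]

def stub_llm(prompt: str) -> str:
    """Stand-in for a real model: answers only when the context helps."""
    if "sky" in prompt.lower() and "color" in prompt.lower():
        return "The sky is blue."
    return "I don't know"

template = (
    "Answer using only the context. If the answer is not in the context, "
    "say 'I don't know'.\nContext: {context}\nQuestion: {question}"
)

outputs = [stub_llm(template.format(**ex)) for ex in dataset]
abstained = sum(o == "I don't know" for o in outputs)
print(f"{abstained}/{len(dataset)} examples correctly abstained")
```

The first example has no answer in its context, so a well-behaved prompt should make the model abstain rather than fabricate a figure.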

Save playground run as experiment

If you'd like to share playground runs with your team, you can save them as an experiment to showcase the results in a shareable link.

  1. Select playground settings.
  2. Turn on auto save.
  3. Select 'View Experiment' to navigate to the experiments page.
  4. Review playground outputs.

Save prompt template to prompt hub

The template can also be saved to the Prompt Hub, making it especially valuable for production use cases and collaboration.

By clicking on a specific prompt, the user can view its metadata, version history, and the associated prompt template and LLM parameters for the selected version.
