Prompt Playground

Iterate on prompts with curated data from development and production

Prompt Playground helps developers experiment with prompt templates, input variables, models, and LLM parameters. This no-code environment lets both technical and non-technical users refine their prompts for production applications.

Key features

  1. Iterate on your prompts with any model using our Playground Integrations

  2. Replay spans from your production data

  3. Build prompts with AI using our Copilot: prompt builder

  4. Manage your prompts in one place with Prompt Hub

Find problematic production examples

The most common way to enter the Prompt Playground is through a span on the LLM tracing page. For instance, users can filter spans where an Online Evaluator flagged the LLM output as a hallucination and then bring one of these examples into the Prompt Playground to refine the prompt, ensuring the LLM produces factual responses in the future.

  1. Apply filters to identify spans where an Online Evaluator flagged the LLM output as a hallucination.
  2. Select the Prompt Playground button to import the template, input variables, and LLM output into the playground.
  3. Modify the template and input variables in the Playground to refine and iterate on the prompt.
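Conceptually, a playground prompt is a template plus a set of input variables. A minimal sketch of how such a template renders with variables (the `{variable}` syntax and the `render_prompt` helper are illustrative, not the platform's API):

```python
# Minimal sketch of a prompt template with input variables.
# The template syntax and helper name are illustrative, not the platform's API.

def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Substitute input variables into a prompt template."""
    return template.format(**variables)

template = (
    "Answer the question using only the provided context.\n"
    "Context: {context}\n"
    "Question: {question}"
)

prompt = render_prompt(
    template,
    {"context": "The sky is blue.", "question": "What color is the sky?"},
)
print(prompt)
```

Importing a span into the playground populates exactly these pieces: the template and the concrete variable values from production.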

Iterate on your prompts

You can iterate on prompts by comparing them side by side with different models, tools, LLM parameters, prompt templates, and variables. The first step is to select the "clone prompt" or "+ prompt" button to create a new prompt.

  1. Create a new prompt for side-by-side comparison.
  2. Set the temperature to 0 to make outputs as deterministic as possible.
  3. Duplicate Prompt A and switch the model from gpt-3.5-turbo to gpt-4o for a direct comparison.
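Under the hood, a side-by-side comparison amounts to sending the same rendered prompt to two model configurations that differ only in the model name. A hedged sketch, where the payload shape follows the common chat-completions convention rather than the playground's internal format:

```python
# Sketch of a side-by-side comparison: the same prompt goes to two
# configurations that differ only in the model name. The payload shape
# mirrors the common chat-completions convention; it is illustrative,
# not the playground's internal format.

def build_request(model: str, prompt: str, temperature: float = 0.0) -> dict:
    """Build one chat-completion request for a given model."""
    return {
        "model": model,
        "temperature": temperature,  # 0 keeps outputs as deterministic as possible
        "messages": [{"role": "user", "content": prompt}],
    }

prompt = "Summarize the provided context in one sentence."
prompt_a = build_request("gpt-3.5-turbo", prompt)
prompt_b = build_request("gpt-4o", prompt)  # duplicate of A with the model switched
```

Holding the prompt and temperature fixed isolates the model as the only variable, so any difference in output can be attributed to the model change.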

Optimize your prompts with Copilot

Another approach to reducing hallucinations is modifying the template. Using Copilot, the user optimizes the prompt, instructing the LLM to respond with "I don't know" when the answer is not found in the provided context. After pressing 'Run' with the updated prompt template, the New Output confirms that the LLM now responds with "I don't know" instead of generating a fabricated answer.
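The effect of the Copilot edit can be pictured as adding an explicit fallback instruction to the template. A sketch with illustrative template text (not the actual Copilot output):

```python
# Illustrative before/after of the template change: the optimized version
# adds an explicit fallback instruction to curb hallucinations.

original_template = (
    "Answer the question using the provided context.\n"
    "Context: {context}\nQuestion: {question}"
)

fallback_instruction = (
    "If the answer is not found in the provided context, "
    "respond with 'I don't know' instead of guessing."
)

optimized_template = original_template.replace(
    "Answer the question using the provided context.",
    "Answer the question using the provided context. " + fallback_instruction,
)
print(optimized_template)
```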

Optimize the prompt with Copilot

Load a dataset

While we have observed improved performance on a single example, how can we ensure consistent improvement across many examples? To validate that a new prompt effectively reduces hallucinations more broadly, we can load a dataset of hallucinated examples into the Prompt Playground and test the updated prompt against the entire dataset.

Load in a dataset of hallucinated examples.
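Testing the updated prompt against a whole dataset is, in effect, a loop over examples with an aggregate check. A sketch with a stubbed model call (the stub, dataset, and template text are all illustrative):

```python
# Sketch of validating a prompt against a dataset of previously
# hallucinated examples. The model call is stubbed; in practice each
# rendered prompt would be sent to the selected LLM.

dataset = [
    {"context": "The report covers Q3 revenue.", "question": "What was Q4 profit?"},
    {"context": "The sky is blue.", "question": "What color is the sky?"},
]

def stub_llm(prompt: str) -> str:
    """Stand-in for a real model: answers only when the context helps."""
    if "sky" in prompt.lower() and "color" in prompt.lower():
        return "The sky is blue."
    return "I don't know"

template = (
    "Answer using only the context. If the answer is not in the context, "
    "say 'I don't know'.\nContext: {context}\nQuestion: {question}"
)

outputs = [stub_llm(template.format(**ex)) for ex in dataset]
abstained = sum(o == "I don't know" for o in outputs)
print(f"{abstained}/{len(dataset)} examples correctly abstained")
```

The first example has no answer in its context, so a well-behaved prompt should make the model abstain rather than fabricate a figure.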

Save playground run as experiment

If you'd like to share playground runs with your team, you can save them as an experiment to showcase the results in a shareable link.

  1. Select playground settings.
  2. Turn on auto save.
  3. Select 'View Experiment' to navigate to the experiments page.
  4. Review playground outputs.

Save prompt template to prompt hub

The template can also be saved to the Prompt Hub, making it especially valuable for production use cases and collaboration.

By clicking on a specific prompt, the user can view its metadata, version history, and the associated prompt template and LLM parameters for the selected version.
