Optimize Prompts Automatically

Automatically Optimize Prompts with Prompt Learning

We were able to manually bring the accuracy of our prompt up by looking at one of our error types. But what about the other six? Improving each of those will take a lot of manual edits and trial and error, and it's time-consuming to pore over all our data and build new prompt versions by hand. You can imagine that with real-world agents that have seen thousands of queries, manually analyzing thousands of data points is not practical.

What if there was a way to do this automatically - an algorithm that could look at all the data we've generated and train a prompt based on it?

Follow along with code: This guide has a companion notebook with runnable code examples. Find it here, and go to Part 4: Optimize Prompts Automatically.

What is Prompt Learning?

Prompt learning is an iterative approach to optimizing LLM prompts by using feedback from evaluations to systematically improve prompt performance. Instead of manually tweaking prompts through trial and error, the SDK automates this process.

The prompt learning process follows this workflow:

Initial Prompt → Generate Outputs → Evaluate Results → Optimize Prompt → Repeat
  1. Initial Prompt: Start with a baseline prompt that defines your task

  2. Generate Outputs: Use the prompt to generate responses on your dataset

  3. Evaluate Results: Run evaluators to assess output quality

  4. Optimize Prompt: Use feedback to generate an improved prompt

  5. Iterate: Repeat until performance meets your criteria

The SDK uses a meta-prompt approach where an LLM analyzes the original prompt, evaluation feedback, and examples to generate an optimized version that better aligns with your evaluation criteria.
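
In code, the loop looks something like the sketch below. This is purely conceptual - `generate`, `evaluate`, and `optimize` are stand-in callables, not the SDK's actual API - but it shows how the five steps fit together:

```python
def prompt_learning_loop(prompt, dataset, generate, evaluate, optimize,
                         max_iters=5, target=0.90):
    """Conceptual prompt-learning loop. `generate`, `evaluate`, and
    `optimize` are stand-ins, not the SDK's API."""
    for _ in range(max_iters):
        outputs = [generate(prompt, example) for example in dataset]  # 2. generate outputs
        score, feedback = evaluate(outputs, dataset)                  # 3. run evaluators
        if score >= target:                                           # 5. stop when good enough
            break
        prompt = optimize(prompt, outputs, feedback)                  # 4. meta-prompt rewrite
    return prompt
```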

For a more detailed dive into Prompt Learning, check out the following resources:

Install the Prompt Learning SDK

We’re now ready to put this into practice. Using the Prompt Learning SDK, we can take the evaluation data we’ve already collected - all those explanations, error types, and fix suggestions - and feed it back into an optimization loop. Instead of manually writing new instructions or tuning parameters, we’ll let the algorithm analyze our experiment results and generate an improved prompt automatically.

Let’s install the SDK and use it to optimize our support query classifier.
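
A typical setup cell looks like the one below. The package name for the Prompt Learning SDK is an assumption here - use the exact name from the companion notebook if it differs:

```bash
# Install Phoenix and the Prompt Learning SDK.
# NOTE: "prompt-learning" is an assumed package name - check the
# companion notebook's setup cell for the exact one.
pip install arize-phoenix prompt-learning
```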

Load Experiment for Training

First, head to the experiment we ran for version 4 and copy the experiment ID. This experiment serves as our training data - we'll use the outputs and evals we generated to train our new prompt version.
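
In code, this looks roughly like the following. The `get_experiment` accessor is an assumption - match it against the companion notebook for your Phoenix version:

```python
import phoenix as px

# Paste the experiment ID for the version-4 run from the Phoenix UI.
EXPERIMENT_ID = "YOUR-EXPERIMENT-ID"  # placeholder

px_client = px.Client()

# ASSUMPTION: an experiment accessor like this on the client - check
# the companion notebook for the exact call in your Phoenix version.
experiment = px_client.get_experiment(experiment_id=EXPERIMENT_ID)

# Each run in the experiment carries the model's output plus the
# evaluator's label and explanation - the training signal for
# prompt learning.
```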

Load Unoptimized Prompt

Let's load our unoptimized prompt from Phoenix so that we can funnel it through Prompt Learning.
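
Using the Phoenix client, that looks like this. The prompt identifier is a placeholder - use whichever name you gave the classifier prompt in the earlier modules:

```python
from phoenix.client import Client

client = Client()

# Pull the classifier prompt from Phoenix. "support-query-classifier"
# is a placeholder - use the name you gave the prompt earlier.
prompt = client.prompts.get(prompt_identifier="support-query-classifier")
```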

Optimize Prompt (Version 5)

Now, let's optimize our prompt and push the optimized version back to Phoenix.
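
The sketch below shows the shape of this step, reusing `prompt`, `experiment`, and `client` from the previous steps. The optimizer class, import path, and argument names are assumptions based on the workflow described above - follow the companion notebook for the SDK's real calls:

```python
# SKETCH ONLY: the class, import path, and argument names below are
# assumptions, not the SDK's documented API.
from prompt_learning import PromptLearningOptimizer  # assumed import path

optimizer = PromptLearningOptimizer(
    prompt=prompt,   # the unoptimized prompt loaded above
    model="gpt-4o",  # the meta-prompt LLM that rewrites the prompt
)

# The experiment runs supply the training signal: outputs, eval labels,
# and the English-language explanations of each failure.
optimized_prompt = optimizer.optimize(
    dataset=experiment,
    feedback_columns=["label", "explanation"],  # assumed column names
)

# Push the optimized prompt back to Phoenix as version 5. The exact
# shape of `version` depends on what the optimizer returns.
client.prompts.create(
    name="support-query-classifier",  # placeholder prompt name
    version=optimized_prompt,
    prompt_description="v5 - optimized with Prompt Learning",
)
```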

Measure New Prompt Version's Performance

Now that we've used Prompt Learning to build a new, optimized Prompt Version, let's see how it actually performs!

Let's run another Phoenix experiment on the support query dataset with our new prompt.
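
A sketch of that run is below. The dataset name, the `classify` task helper, and the `accuracy` evaluator stand in for the ones defined in the earlier modules:

```python
import phoenix as px
from phoenix.experiments import run_experiment

# Reuse the dataset from the earlier modules ("support-queries" is a
# placeholder name).
dataset = px.Client().get_dataset(name="support-queries")

def task(example):
    # `classify` stands in for the task function defined in the earlier
    # modules, now pointed at the optimized version-5 prompt.
    return classify(optimized_prompt, example.input)

experiment_v5 = run_experiment(
    dataset,
    task,
    evaluators=[accuracy],  # the same evaluator used for versions 1-4
    experiment_name="support-query-classifier-v5",
)
```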

Awesome! Our accuracy jumps to 82%!

Summary

Congratulations! You’ve completed the Phoenix Prompts walkthrough! Across these modules, we’ve gone from identifying weak prompts to automatically optimizing them using real evaluation data.

You’ve learned how to:

  • Identify and edit prompts directly from traces to correct misclassifications.

  • Test prompts at scale across datasets to measure accuracy and uncover systematic failure patterns.

  • Compare prompt versions side by side to see which edits, parameters, or models lead to measurable gains.

  • Automate prompt optimization with Prompt Learning, using natural-language feedback from evaluations to train stronger prompts without manual rewriting.

  • Improve accuracy by 30%!

  • Track every iteration in Phoenix, from dataset creation and experiment runs to versioned prompts - creating a full feedback loop between your data, your LLM, and your application.

By the end, you’ve built a complete system for continuous prompt improvement - turning one-off fixes into a repeatable, data-driven optimization workflow.

Next Steps

If you're interested in more tutorials on Prompts, check out:
