Optimize Prompts Automatically
Automatically Optimize Prompts with Prompt Learning
We were able to manually improve our prompt's accuracy by addressing one of our error types. But what about the other six? Tackling each one by hand would take a lot of manual edits and trial and error. Reviewing all of our data and building new prompt versions manually is time-consuming, and for real-world agents that have handled thousands of queries, analyzing thousands of data points by hand simply isn't practical.
What if we could do this automatically, with an algorithm that looks at all the data we've generated and trains a prompt based on it?
What is Prompt Learning?
Prompt learning is an iterative approach to optimizing LLM prompts that uses feedback from evaluations to systematically improve prompt performance. Rather than requiring you to manually tweak prompts through trial and error, the SDK automates this process.
The prompt learning process follows this workflow:
Initial Prompt → Generate Outputs → Evaluate Results → Optimize Prompt → Repeat
Initial Prompt: Start with a baseline prompt that defines your task
Generate Outputs: Use the prompt to generate responses on your dataset
Evaluate Results: Run evaluators to assess output quality
Optimize Prompt: Use feedback to generate an improved prompt
Iterate: Repeat until performance meets your criteria
The SDK uses a meta-prompt approach where an LLM analyzes the original prompt, evaluation feedback, and examples to generate an optimized version that better aligns with your evaluation criteria.
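To make the loop concrete, here is a minimal sketch of the idea in plain Python. The meta-prompt text and helper names below are illustrative assumptions, not the SDK's actual API - the SDK packages this loop for you.

```python
# Illustrative sketch of the prompt learning loop - not the SDK's actual API.
from openai import OpenAI

client = OpenAI()

META_PROMPT = (
    "You are a prompt engineer. Given the current prompt, example inputs and "
    "outputs, and evaluation feedback (explanations, error types, suggested "
    "fixes), write an improved prompt that addresses the feedback."
)

def optimize_once(current_prompt: str, examples: str, feedback: str) -> str:
    """One optimization step: send the prompt plus eval feedback to a meta-prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": META_PROMPT},
            {
                "role": "user",
                "content": (
                    f"Current prompt:\n{current_prompt}\n\n"
                    f"Examples:\n{examples}\n\n"
                    f"Evaluation feedback:\n{feedback}"
                ),
            },
        ],
    )
    return response.choices[0].message.content

# The full loop: generate outputs with the prompt, evaluate them, call
# optimize_once() with the feedback, and repeat until accuracy plateaus.
```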
For a more detailed dive into Prompt Learning, check out the following resources:
Install the Prompt Learning SDK
We’re now ready to put this into practice. Using the Prompt Learning SDK, we can take the evaluation data we’ve already collected - all those explanations, error types, and fix suggestions - and feed it back into an optimization loop. Instead of manually writing new instructions or tuning parameters, we’ll let the algorithm analyze our experiment results and generate an improved prompt automatically.
Let’s install the SDK and use it to optimize our support query classifier.
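In a notebook, the install typically looks like the following. The SDK's package name below is an assumption - check the Prompt Learning documentation for the exact install command.

```python
# Install Phoenix (datasets, prompts, experiments) and the Prompt Learning SDK.
# The second package name is an assumption - confirm it in the SDK docs.
!pip install -q arize-phoenix openai
!pip install -q prompt-learning
```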
Load Experiment for Training
First, head to the experiment we ran for version 4 and copy its experiment ID. This experiment serves as our training data - we'll use the outputs and evals we generated to train our new prompt version.
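As a hedged sketch, fetching the experiment might look something like this - the `experiments.get_experiment` call is an assumption about the Phoenix client, so confirm the exact method in the Phoenix docs.

```python
# Paste the experiment ID copied from the Phoenix UI (Experiments tab).
EXPERIMENT_ID = "<your-experiment-id>"

# Hypothetical retrieval call - confirm the exact method name in the Phoenix client docs.
from phoenix.client import Client

px_client = Client()
experiment = px_client.experiments.get_experiment(experiment_id=EXPERIMENT_ID)
```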
Load Unoptimized Prompt
Let's load our unoptimized prompt from Phoenix so that we can funnel it through Prompt Learning.
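Something like the following should pull the latest version of the prompt. The prompt identifier is an assumption - use whatever name you gave your classifier prompt in Phoenix.

```python
# Fetch the current (unoptimized) prompt version from Phoenix.
# The prompt identifier below is an assumption - use your own prompt's name.
from phoenix.client import Client

px_client = Client()
unoptimized_prompt = px_client.prompts.get(prompt_identifier="support-query-classifier")
```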
Optimize Prompt (Version 5)
Now, let's optimize our prompt and push the optimized version back to Phoenix.
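A sketch of this step is below. The `PromptLearningOptimizer` class and its arguments are assumptions about the SDK's interface based on the workflow described above - consult the SDK docs for the real API. Pushing the new version back to Phoenix should look roughly like the `prompts.create` call shown.

```python
# Sketch only: the optimizer class and arguments are assumptions about the
# Prompt Learning SDK's interface - check the SDK docs for the real API.
from prompt_learning import PromptLearningOptimizer  # hypothetical import

optimizer = PromptLearningOptimizer(
    prompt=unoptimized_prompt,  # the prompt version we just loaded
    model="gpt-4o",             # model used by the meta-prompt to rewrite it
)
optimized_text = optimizer.optimize(experiment=experiment)  # trains on our eval feedback

# Push the optimized prompt back to Phoenix as a new version (version 5).
from phoenix.client.types import PromptVersion

px_client.prompts.create(
    name="support-query-classifier",
    prompt_description="Optimized with Prompt Learning",
    version=PromptVersion(
        [{"role": "system", "content": optimized_text}],
        model_name="gpt-4o",
    ),
)
```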
Measure New Prompt Version's Performance
Now that we've used Prompt Learning to build a new, optimized Prompt Version, let's see how it actually performs!
Let's run another Phoenix experiment on the support query dataset with our new prompt.
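A rough sketch of the re-run is below; the dataset name, column names, and task implementation are assumptions standing in for the ones you built in earlier modules.

```python
# Re-run the experiment with the optimized (version 5) prompt.
# Dataset and column names are assumptions - use the ones from earlier modules.
import phoenix as px
from phoenix.experiments import run_experiment
from openai import OpenAI

oai = OpenAI()
dataset = px.Client().get_dataset(name="support-queries")

def classify_with_v5(input):
    """Classify one support query using the optimized prompt."""
    resp = oai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": optimized_text},  # from the step above
            {"role": "user", "content": input["query"]},    # assumed input column
        ],
    )
    return resp.choices[0].message.content.strip()

def accuracy(output, expected):
    """1 if the predicted label matches the expected label, else 0."""
    return int(output == expected["label"])  # assumed expected-output column

experiment_v5 = run_experiment(
    dataset,
    task=classify_with_v5,
    evaluators=[accuracy],
    experiment_name="support-query-classifier-v5",
)
```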
Awesome! Our accuracy jumps to 82%!
Summary
Congratulations! You’ve completed the Phoenix Prompts walkthrough! Across these modules, we’ve gone from identifying weak prompts to automatically optimizing them using real evaluation data.
You’ve learned how to:
Identify and edit prompts directly from traces to correct misclassifications.
Test prompts at scale across datasets to measure accuracy and uncover systematic failure patterns.
Compare prompt versions side by side to see which edits, parameters, or models lead to measurable gains.
Automate prompt optimization with Prompt Learning, using English feedback from evaluations to train stronger prompts without manual rewriting.
Improve accuracy by 30%!
Track every iteration in Phoenix, from dataset creation and experiment runs to versioned prompts - creating a full feedback loop between your data, your LLM, and your application.
By the end, you’ve built a complete system for continuous prompt improvement - turning one-off fixes into a repeatable, data-driven optimization workflow.
Next Steps
If you're interested in more tutorials on Prompts, check out: