Prompt Learning
Hand-engineering prompts is brittle: small changes can break behavior, and manual iteration doesn't scale. With Prompt Optimization tasks, you can optimize prompts in a few clicks using human or automated feedback loops and versioned releases, bringing prompt engineering into a reproducible, CI-friendly workflow instead of trial and error. Each auto-generated version of the prompt is committed to the Prompt Hub, so you can A/B test different versions and safely promote the winner.
Key Features
Auto-generate the best prompt from your labeled dataset
Promote the best auto-generated prompt in the Prompt Hub as the production version
Evaluate the auto-generated prompt against the original using a side-by-side comparison on the experiments page
Quick setup
Prompt optimization uses a task builder similar to the one for Online Evals, so you can get started quickly. Pick the prompt you want to optimize from Prompt Hub; if you don't have one yet, click Create New Prompt to add it. Then choose a Training Dataset and set a Batch Size (defaults to 10). Finally, select one or more Feedback Columns that contain evaluation signals: labels from human annotators or LLM-as-a-Judge evaluators, plus optional explanations.
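For reference, a training dataset row typically pairs an input and the model's output with one or more feedback columns. The sketch below builds such a dataset with pandas; the column names (input, output, correctness, explanation) are illustrative, not names Arize requires.

```python
import pandas as pd

# Illustrative training dataset for a prompt optimization task.
# Column names here are examples, not required by Arize: the UI lets you
# point the task at whichever columns hold your feedback signals.
training_df = pd.DataFrame(
    {
        "input": [
            "Summarize: The quarterly report shows revenue grew 12%...",
            "Summarize: Support tickets rose sharply after the release...",
        ],
        "output": [
            "Revenue grew 12% this quarter.",
            "The release went smoothly with no issues.",
        ],
        # Feedback from human annotators or an LLM-as-a-Judge evaluator.
        "correctness": ["correct", "incorrect"],
        # Optional explanations give the optimizer more signal to work with.
        "explanation": [
            "Captures the key figure accurately.",
            "Contradicts the source: tickets rose after the release.",
        ],
    }
)
```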



Customize your meta prompt
Before launching, Arize shows you the full meta prompt that will steer optimization. This meta prompt has been tested and optimized internally for a variety of customer applications.
To customize it, you can add optional Advanced Instructions, such as “Use ≤150 tokens” or domain-specific constraints. You can also choose the LLM model and its parameters, such as temperature.
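As a rough mental model, the settings chosen in this step amount to something like the dictionary below. The keys and values are illustrative only, not an Arize schema; they simply mirror the options described above.

```python
# Illustrative representation of the optimization settings chosen in the UI.
# These keys are not an Arize schema; they mirror the options described above.
optimization_settings = {
    "prompt": "support-summarizer",           # prompt selected from Prompt Hub
    "training_dataset": "summaries-labeled",  # dataset with feedback columns
    "batch_size": 10,                         # examples per optimization batch
    "feedback_columns": ["correctness", "explanation"],
    "advanced_instructions": "Use <= 150 tokens and keep a neutral tone.",
    "model": "gpt-4o",                        # model used by the optimizer
    "temperature": 0.2,
}
```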


Track optimization process
The Task Logs page shows each batch as it runs, including what prompt was used, which examples were evaluated, and what feedback was generated.
After each batch, the optimizer proposes a new prompt candidate, which is automatically saved as a new version in the Prompt Hub.
To review changes, go to the prompt’s page in Prompt Hub and review the list of Versions.
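Conceptually, each batch follows a loop like the sketch below: the optimizer combines the current prompt, a batch of examples, and their feedback into the meta prompt, asks an LLM for a revised prompt, and saves the candidate as a new Prompt Hub version. This is a simplified illustration of the idea, not Arize's implementation; call_llm and save_prompt_version are placeholder stand-ins.

```python
# Simplified sketch of the optimization loop; not Arize's implementation.
# call_llm and save_prompt_version are hypothetical stand-ins for an LLM
# call and a Prompt Hub write.

def call_llm(request: str) -> str:
    """Placeholder: send `request` to an LLM and return its response."""
    raise NotImplementedError

def save_prompt_version(prompt: str) -> None:
    """Placeholder: save `prompt` as a new version in Prompt Hub."""
    raise NotImplementedError

def optimize(prompt: str, dataset: list[dict], meta_prompt: str, batch_size: int = 10) -> str:
    for start in range(0, len(dataset), batch_size):
        batch = dataset[start : start + batch_size]

        # Collect this batch's examples and their feedback.
        examples = "\n\n".join(
            f"input: {row['input']}\noutput: {row['output']}\n"
            f"feedback: {row['correctness']} ({row.get('explanation', '')})"
            for row in batch
        )

        # The meta prompt steers how the optimizer revises the current prompt.
        revision_request = meta_prompt.format(
            current_prompt=prompt, examples=examples
        )

        # Propose a new candidate and record it as a new Prompt Hub version.
        prompt = call_llm(revision_request)
        save_prompt_version(prompt)

    return prompt
```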


Compare the optimized prompt against the original
From Prompt Hub, you can compare the final optimized prompt to the original by launching an experiment in the Playground.
Select an evaluation dataset, choose the two prompt versions, and review metrics side-by-side, including both high-level summary metrics and example-level outputs.
Coming soon: Prompt Optimization tasks will optionally trigger this experiment automatically.
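If you export example-level results and want to inspect them outside the UI, a rough pandas sketch of the same side-by-side comparison is shown below; the column names and scores are made up for illustration.

```python
import pandas as pd

# Illustrative example-level results for two prompt versions.
# Column names and scores are made up for the sake of the sketch.
results = pd.DataFrame(
    {
        "example_id": [1, 2, 3, 1, 2, 3],
        "prompt_version": ["original"] * 3 + ["optimized"] * 3,
        "correctness": [0, 1, 1, 1, 1, 1],
    }
)

# High-level summary metric per prompt version.
summary = results.groupby("prompt_version")["correctness"].mean()
print(summary)

# Example-level side-by-side view.
side_by_side = results.pivot(
    index="example_id", columns="prompt_version", values="correctness"
)
print(side_by_side)
```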

Deploy with Human-in-the-Loop Oversight
Once you’ve reviewed the results, tag the winning prompt version as Production in Prompt Hub.
At inference time, the Prompt Hub SDK automatically pulls the latest version with the "production" tag, keeping a human in the loop so you can ship with confidence.
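At serving time the pattern looks roughly like the sketch below. The client and method names (PromptHubClient, pull_prompt, the tag parameter) are placeholders rather than the exact Prompt Hub SDK surface; consult the SDK documentation for the real calls.

```python
# Illustrative serving-time pattern; the client and method names below are
# placeholders, not the exact Prompt Hub SDK surface.

class PromptHubClient:
    """Placeholder for the Prompt Hub SDK client."""

    def pull_prompt(self, name: str, tag: str) -> str:
        # Placeholder body; the real SDK would fetch the tagged version from Arize.
        return "Summarize the following text in under 150 tokens:\n\n{input}"

client = PromptHubClient()

# Always serve whatever version a human has tagged as "production",
# so the reviewed prompt is what runs at inference time.
prompt_template = client.pull_prompt("support-summarizer", tag="production")
request = prompt_template.format(input="Summarize: The quarterly report shows...")
```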
