The first half of this section was about making prompt iteration fast and safe. This page is about a different question: once iteration is fast and safe, what’s the best way to actually do it? There are three paths. They aren’t mutually exclusive; most teams use all three at different points.

The three paths

| Path | Mechanism | When to reach for it |
| --- | --- | --- |
| Manual in the Playground | Human edits, side-by-side compare, eyeball outputs + evaluator scores | Small tweaks, intuition checks, when you have a specific hypothesis to test |
| Conversational with Alyx | Tell Alyx the goal, refine via conversation | Goal-driven optimization without writing code or hand-editing |
| Prompt Learning (automated) | SDK or UI task that uses meta-prompting + evaluator feedback to generate ranked prompt variants | Hands-off, larger-scale optimization against a golden dataset |

Path 1 — Manual in the Playground

The Playground is the workshop. You load a prompt, run it against a dataset, see scores, edit, run again. With side-by-side comparison, you can run up to three variants in parallel and see the deltas immediately.
When this is the right path:
  • Small tweaks — adjusting a sentence in the system message, tightening a constraint.
  • Specific hypotheses — “I think temperature 0.3 will be better than 0.7 for this task; let me check.”
  • Visual inspection — you want to look at individual outputs and form a judgment, not just look at aggregate scores.
  • Exploratory iteration — you don’t know yet what would help; you’re trying things to learn what’s possible.
For most early-stage prompt work, manual iteration in the Playground is the right fit: it’s fast and the feedback loop is immediate.
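To make the side-by-side comparison concrete, here is a minimal sketch of the arithmetic behind it: the mean evaluator score per variant, the delta against a baseline, and the rows where a variant regressed. The scores, the variant names, and the helpers `summarize` and `regressed_rows` are invented for illustration; they are not part of the Playground or the Arize SDK.

```python
from statistics import mean

# Hypothetical evaluator scores (0.0 to 1.0) for the same four dataset rows,
# one list per prompt variant. The numbers are made up for illustration.
BASELINE = "baseline (temperature 0.7)"
VARIANT_A = "variant A (temperature 0.3)"
scores_by_variant = {
    BASELINE: [0.60, 0.80, 0.40, 0.70],
    VARIANT_A: [0.70, 0.75, 0.60, 0.95],
}


def summarize(scores_by_variant, baseline_name):
    """Return the mean score per variant and its delta against the baseline."""
    baseline_mean = mean(scores_by_variant[baseline_name])
    summary = {}
    for name, scores in scores_by_variant.items():
        variant_mean = mean(scores)
        summary[name] = {
            "mean": round(variant_mean, 3),
            "delta": round(variant_mean - baseline_mean, 3),
        }
    return summary


def regressed_rows(baseline_scores, variant_scores):
    """Indices of rows where the variant scored lower than the baseline."""
    pairs = zip(baseline_scores, variant_scores)
    return [i for i, (b, v) in enumerate(pairs) if v < b]


summary = summarize(scores_by_variant, BASELINE)
regressions = regressed_rows(
    scores_by_variant[BASELINE], scores_by_variant[VARIANT_A]
)
print(summary[VARIANT_A])  # aggregate view
print(regressions)         # rows worth inspecting by eye
```

The aggregate can improve while individual rows get worse, which is why the row-level list is kept alongside the mean: those are the outputs to open and read.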

Path 2 — Conversational with Alyx

Alyx is the AI agent built into Arize AX. For prompt optimization, you state the goal in plain language, for example:
  • “Optimize my prompt to reduce verbosity while keeping the escalation rules.”
  • “Make this prompt produce a single tool name with no extra text.”
  • “Tighten the prompt against this golden dataset — the scores are too low.”
Alyx loads the prompt, the dataset, and the evaluator(s), then iterates. The conversation thread holds context across iterations, so refinement happens incrementally (“that’s almost right, but lose the closing pleasantry”) without starting over.
When this is the right path:
  • You can name the goal. If you can describe what “better” means in a sentence, Alyx can usually iterate toward it.
  • You don’t want to write code. Conversational interfaces are faster than code interfaces for goal-driven changes.
  • You want refinement, not just one shot. The conversation thread holds context, so each iteration sharpens the previous one.
When to reach for something else: when you want a quantitative comparison across many variants, or a fully automated optimization loop with no human in the loop.

Path 3 — Prompt Learning (automated)

Prompt Learning is the most hands-off path. It’s an automated loop:
  1. You point it at a starting prompt and a dataset that includes evaluator feedback (labels and explanations) on prior outputs.
  2. It uses meta-prompting — an LLM analyzes the prompt + the feedback + the data — to propose a revised prompt.
  3. It runs the revised prompt against the dataset and scores it.
  4. It repeats, ranking variants by score, converging on the best one.
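The four steps above can be sketched as a plain loop. This is a toy under stated assumptions, not Prompt Learning’s implementation: `optimize`, `toy_propose`, `toy_evaluate`, the `FIXES` table, and the dataset rows are all hypothetical stand-ins. In particular, `toy_propose` replaces the LLM meta-prompting step with a lookup table, and `toy_evaluate` replaces a real evaluator with a string check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class Variant:
    prompt: str
    score: float


def optimize(
    start_prompt: str,
    dataset: List[Row],
    propose: Callable[[str, List[Row]], str],
    evaluate: Callable[[str, List[Row]], float],
    rounds: int = 3,
) -> List[Variant]:
    """Propose, score, and rank prompt variants; best variant first."""
    variants = [Variant(start_prompt, evaluate(start_prompt, dataset))]
    for _ in range(rounds):
        best = max(variants, key=lambda v: v.score)
        candidate = propose(best.prompt, dataset)     # step 2: revise
        score = evaluate(candidate, dataset)          # step 3: run and score
        variants.append(Variant(candidate, score))
    return sorted(variants, key=lambda v: v.score, reverse=True)  # step 4


# Hypothetical mapping from an evaluator explanation to a prompt fix.
FIXES = {
    "too verbose": "Answer in one sentence.",
    "missing escalation": "Escalate billing disputes to a human.",
}


def toy_propose(prompt: str, dataset: List[Row]) -> str:
    """Stand-in for meta-prompting: address the first unaddressed complaint."""
    for row in dataset:
        fix = FIXES[row["explanation"]]
        if fix not in prompt:
            return prompt + " " + fix
    return prompt


def toy_evaluate(prompt: str, dataset: List[Row]) -> float:
    """Stand-in evaluator: fraction of complaints the prompt addresses."""
    addressed = sum(1 for row in dataset if FIXES[row["explanation"]] in prompt)
    return addressed / len(dataset)


# Step 1: a starting prompt plus prior outputs with labels and explanations.
dataset = [
    {"output": "A long reply", "label": "fail", "explanation": "too verbose"},
    {"output": "A refund promise", "label": "fail", "explanation": "missing escalation"},
]
ranked = optimize(
    "You are a support assistant.", dataset, toy_propose, toy_evaluate, rounds=2
)
for variant in ranked:
    print(variant.score, variant.prompt)
```

Keeping every variant in the list, rather than only the current best, mirrors the idea that the optimization history stays inspectable after the run.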
Every iteration is committed back to the Prompt Hub as a new version, so the optimization history is auditable and the winner is deployable like any other version.
When this is the right path:
  • Larger-scale optimization. When manual iteration would take too long because the search space is big.
  • A golden dataset already exists. Prompt Learning needs labeled examples (or evaluator feedback) as its training signal.
  • You want it to run unattended. Fire-and-forget loops are exactly Prompt Learning’s shape.
  • Multiple competing objectives. Multi-evaluator feedback can guide the optimization across several criteria at once.
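One way to picture the multi-evaluator case is a weighted average across criteria. The source does not say how Prompt Learning combines evaluator feedback, so the weighting scheme, the evaluator names, the weights, and the scores below are all assumptions chosen for illustration.

```python
def combined_score(scores, weights):
    """Weighted average of per-evaluator scores (assumed combination rule)."""
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


# Hypothetical evaluators and weights; correctness counts double here.
weights = {"correctness": 0.5, "conciseness": 0.25, "tone": 0.25}

# Hypothetical per-evaluator mean scores for two prompt variants.
candidates = {
    "v1": {"correctness": 0.9, "conciseness": 0.4, "tone": 0.8},
    "v2": {"correctness": 0.8, "conciseness": 0.8, "tone": 0.8},
}

ranking = sorted(
    candidates,
    key=lambda name: combined_score(candidates[name], weights),
    reverse=True,
)
print(ranking)
```

Here `v1` wins on correctness alone but `v2` ranks first once conciseness is weighed in, which is the kind of trade-off a single-metric ranking would hide.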
The mechanics (meta-prompting, ranking, how the feedback loop actually works) are deep enough to deserve their own conceptual treatment. That’s planned for a future Prompt Optimization concepts section. For now, see the Optimize a prompt how-to page and the cookbook walkthroughs under Prompt Learning.

Picking a path

A rough decision tree:
  • You have a specific small change in mind → manual in the Playground.
  • You can describe the goal in a sentence → Alyx.
  • You have a golden dataset and want unattended optimization → Prompt Learning.
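The same tree, written as a function, makes the precedence explicit. The function name, the parameter names, the order in which the checks run, and the fallback to manual iteration when nothing matches are assumptions; the text above only gives the three rules.

```python
def pick_path(
    has_specific_small_change: bool,
    can_state_goal_in_a_sentence: bool,
    has_golden_dataset: bool,
    wants_unattended: bool,
) -> str:
    """Return a suggested optimization path, checking rules in listed order."""
    if has_specific_small_change:
        return "manual in the Playground"
    if can_state_goal_in_a_sentence:
        return "Alyx"
    if has_golden_dataset and wants_unattended:
        return "Prompt Learning"
    # Assumed fallback: no clear goal yet, so explore by hand.
    return "manual in the Playground"


print(pick_path(True, False, False, False))
print(pick_path(False, True, False, False))
print(pick_path(False, False, True, True))
```

Because the paths aren’t mutually exclusive, treat the return value as a starting point rather than a verdict.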
All three paths feed the same iteration cycle. All three produce new Prompt Object versions saved to the Hub. All three benefit from a CI bar that catches the resulting variant before it ships. They’re three knobs on the same machine — pick the one that fits the moment.

You’ve reached the end of this section

You now have the conceptual on-ramp for prompts in Arize AX:
  • Why prompts are engineering artifacts, and the iteration cycle they live inside.
  • The five parts of a Prompt Object — template, model, invocation parameters, tools, response format.
  • How versioning and tags work, and what that buys you.
  • The Hub as the centralized repository, the Playground as the iteration surface.
  • How prompts load into your application via the SDK + local cache.
  • Datasets, ground truth, and labeling queues.
  • Experiments as the unit of comparison; CI/CD as the way to enforce a bar.
  • Three optimization paths and when each one fits.
Time to build:
  • Build a prompt
  • Trip planner cookbook