
Run in Google Colab
What you’ll build
A trip-planner prompt that takes a destination, duration, and travel style and produces a day-by-day itinerary. You’ll:
- Create a Prompt Object and save it to Prompt Hub.
- Test the prompt against a dataset with an evaluator, as an Arize experiment.
- Iterate — tighten the prompt, save a new version, and compare runs side by side.
By the end, you’ll have:
- A versioned trip-planner prompt in your Prompt Hub.
- Two experiment runs against the same dataset.
- A measurable improvement between v1 and v2. In the reference run, the mean score rose from 0.86 to 1.00.
What you’ll need
- An Arize account with an API key and space ID.
- An OpenAI API key (the notebook uses `gpt-4o-mini`; total cost is a few cents).
- Python 3.10+ with `arize>=8.0.0` and `openai` installed (the notebook handles this).
What the notebook covers
| Section | What it demonstrates | Concept page |
|---|---|---|
| Setup | Install dependencies; configure API keys; initialize the Arize client | — |
| 1. Create the prompt | Define a Prompt Object’s template, model, and invocation parameters; save the first version with a commit message | The Prompt Object |
| 2. Test the prompt | Build a small dataset, define a task that runs the prompt, attach a deterministic evaluator, run the experiment | Experiments for prompts |
| 3. Iterate and compare | Tighten the system prompt, save a new immutable version, tag as production, run a second experiment, compare scores per row | Versioning and tags |
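
To make the "deterministic evaluator" and "compare scores per row" steps concrete, here is a minimal sketch in plain Python. It does not use the Arize SDK and is not the notebook's actual evaluator: the function names, the `duration_days` field, and the "one `Day N` heading per day" rule are all assumptions chosen for illustration.

```python
import re
from statistics import mean

# Matches a "Day N" heading at the start of a line, with an optional
# markdown heading prefix such as "## ". The heading format is an assumption.
DAY_HEADING = re.compile(r"^\s*(?:#+\s*)?Day\s+(\d+)\b", re.IGNORECASE | re.MULTILINE)


def has_every_day(itinerary: str, duration_days: int) -> float:
    """Score 1.0 if the itinerary has exactly the headings Day 1..Day N, else 0.0."""
    found = {int(n) for n in DAY_HEADING.findall(itinerary)}
    expected = set(range(1, duration_days + 1))
    return 1.0 if found == expected else 0.0


def score_run(rows, outputs):
    """Score each model output against its dataset row (hypothetical row shape)."""
    return [has_every_day(out, row["duration_days"]) for row, out in zip(rows, outputs)]


def compare_runs(rows, outputs_v1, outputs_v2):
    """Return per-row (v1, v2) score pairs plus the mean score of each run."""
    v1 = score_run(rows, outputs_v1)
    v2 = score_run(rows, outputs_v2)
    return {"per_row": list(zip(v1, v2)), "mean_v1": mean(v1), "mean_v2": mean(v2)}
```

Because the check is a pure function of the output text and the dataset row, the same input always gets the same score, so any difference between two runs reflects the prompt change rather than evaluator noise.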
Where to go next
After running the notebook:
- Open the prompt in Prompt Hub in the UI to see both versions side by side and diff the templates.
- Open the experiments tab to compare the two runs row by row.
- Add an LLM-as-a-judge evaluator for subjective dimensions the deterministic eval can’t catch. See Evaluators.
- Wire the experiment into CI/CD so prompt edits become PR checks. See Prompts in CI/CD.
- Try Prompt Learning for automated optimization once you have a golden dataset. See Optimizing prompts.
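
For the CI/CD step above, one possible shape for a PR check is a score gate: the pipeline fails when the mean evaluator score drops below a threshold. The sketch below is an assumption, not the approach described in Prompts in CI/CD; `gate` and the `0.9` default are hypothetical, and collecting the per-row scores from an experiment run is left out.

```python
from statistics import mean


def gate(scores, threshold=0.9):
    """Return a process exit code: 0 if the mean score meets the threshold, 1 otherwise.

    An empty score list counts as a failure so that a run that produced
    no results cannot pass the check silently.
    """
    if not scores:
        return 1
    return 0 if mean(scores) >= threshold else 1
```

A CI script could pass the per-row scores from the latest experiment to `gate` and hand the result to `sys.exit`, so a prompt edit that lowers quality blocks the merge.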