Build, Test, and Optimize a Trip-Planner Prompt

https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/cookbooks/gc.png

Run in Google Colab

This cookbook is a runnable companion to the Prompts concepts section. You’ll work through three sections that mirror the iteration cycle the concept docs describe.

What you’ll build

A trip-planner prompt that takes a destination, duration, and travel style and produces a day-by-day itinerary. You’ll:

Create a Prompt Object and save it to Prompt Hub.
Test the prompt against a dataset with an evaluator, as an Arize experiment.
Iterate — tighten the prompt, save a new version, and compare runs side by side.

By the end you’ll have:

A versioned trip-planner prompt in your Prompt Hub.
Two experiment runs against the same dataset.
A measurable improvement between v1 and v2 — in the reference run, mean score went from 0.86 to 1.00.

What you’ll need

An Arize account with an API key and space ID.
An OpenAI API key (the notebook uses gpt-4o-mini; total cost is a few cents).
Python 3.10+ with arize>=8.0.0 and openai installed (the notebook handles this).

What the notebook covers

Section	What it demonstrates	Concept page
Setup	Install dependencies; configure API keys; initialize the Arize client	—
1. Create the prompt	Define a Prompt Object’s template, model, and invocation parameters; save the first version with a commit message	The Prompt Object
2. Test the prompt	Build a small dataset, define a task that runs the prompt, attach a deterministic evaluator, run the experiment	Experiments for prompts
3. Iterate and compare	Tighten the system prompt, save a new immutable version, tag as `production`, run a second experiment, compare scores per row	Versioning and tags

Where to go next

After running the notebook:

Open the prompt in Prompt Hub in the UI to see both versions side by side and diff the templates.
Open the experiments tab to compare the two runs row by row.
Add an LLM-as-a-judge evaluator for subjective dimensions the deterministic eval can’t catch. See Evaluators.
Wire the experiment into CI/CD so prompt edits become PR checks. See Prompts in CI/CD.
Try Prompt Learning for automated optimization once you have a golden dataset. See Optimizing prompts.

Source

The notebook source lives in the tutorials repo.

Dual Tracing into Databricks Unity Catalog and Arize AX Prompt Experimentation For Summarization Task

⌘I

Run in Google Colab

​What you’ll build

​What you’ll need

​What the notebook covers

​Where to go next

​Source

What you’ll build

What you’ll need

What the notebook covers

Where to go next

Source