Skip to main content
https://storage.googleapis.com/arize-phoenix-assets/assets/images/arize-docs-images/cookbooks/gc.png

Run in Google Colab

This cookbook is a runnable companion to the Prompts concepts section. You’ll work through three sections that mirror the iteration cycle the concept docs describe.

What you’ll build

A trip-planner prompt that takes a destination, duration, and travel style and produces a day-by-day itinerary. You’ll:
  1. Create a Prompt Object and save it to Prompt Hub.
  2. Test the prompt against a dataset with an evaluator, as an Arize experiment.
  3. Iterate — tighten the prompt, save a new version, and compare runs side by side.
By the end you’ll have:
  • A versioned trip-planner prompt in your Prompt Hub.
  • Two experiment runs against the same dataset.
  • A measurable improvement between v1 and v2 — in the reference run, mean score went from 0.86 to 1.00.

What you’ll need

  • An Arize account with an API key and space ID.
  • An OpenAI API key (the notebook uses gpt-4o-mini; total cost is a few cents).
  • Python 3.10+ with arize>=8.0.0 and openai installed (the notebook handles this).

What the notebook covers

SectionWhat it demonstratesConcept page
SetupInstall dependencies; configure API keys; initialize the Arize client
1. Create the promptDefine a Prompt Object’s template, model, and invocation parameters; save the first version with a commit messageThe Prompt Object
2. Test the promptBuild a small dataset, define a task that runs the prompt, attach a deterministic evaluator, run the experimentExperiments for prompts
3. Iterate and compareTighten the system prompt, save a new immutable version, tag as production, run a second experiment, compare scores per rowVersioning and tags

Where to go next

After running the notebook:
  • Open the prompt in Prompt Hub in the UI to see both versions side by side and diff the templates.
  • Open the experiments tab to compare the two runs row by row.
  • Add an LLM-as-a-judge evaluator for subjective dimensions the deterministic eval can’t catch. See Evaluators.
  • Wire the experiment into CI/CD so prompt edits become PR checks. See Prompts in CI/CD.
  • Try Prompt Learning for automated optimization once you have a golden dataset. See Optimizing prompts.

Source

The notebook source lives in the tutorials repo.