Guides on how to do prompt engineering with Phoenix
Configure AI Providers - how to configure API keys for OpenAI, Anthropic, Gemini, and more.
Organize and manage prompts with Phoenix to streamline your development workflow
Create a prompt - how to create, update, and track prompt changes
Test a prompt - how to test changes to a prompt in the playground and in the notebook
Tag a prompt - how to mark certain prompt versions as ready for deployment to specific environments
Using a prompt - how to integrate prompts into your code and experiments
Iterate on prompts and models in the prompt playground
Using the Playground - how to set up the playground and how to test prompt changes via datasets and experiments.
Phoenix natively integrates with OpenAI, Azure OpenAI, Anthropic, and Google AI Studio (Gemini) to make it easy to test changes to your prompts. In addition to the above, since many AI providers (DeepSeek, Ollama) can be used directly with the OpenAI client, you can talk to any OpenAI-compatible LLM provider.
To securely provide your API keys, you have two options. One is to store them in your browser in local storage. Alternatively, you can set them as environment variables on the server side. If both are set at the same time, the credential set in the browser will take precedence.
API keys can be entered in the playground application via the API Keys dropdown menu. This option stores API keys in the browser. Simply navigate to settings and set your API keys.
Available on self-hosted Phoenix
If the following variables are set in the server environment, they'll be used at API invocation time.
OpenAI: OPENAI_API_KEY
Azure OpenAI: AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, OPENAI_API_VERSION
Anthropic: ANTHROPIC_API_KEY
Gemini: GEMINI_API_KEY or GOOGLE_API_KEY
Since you can configure the base URL for the OpenAI client, you can also use the prompt playground with a variety of OpenAI-compatible LLM providers, such as Ollama and DeepSeek.
Optionally, the server can be configured with the OPENAI_BASE_URL environment variable to target any OpenAI-compatible REST API.
For app.phoenix.arize.com, this may fail due to security reasons. In that case, you'd see a Connection Error appear.
If there is an LLM endpoint you would like to use, reach out to the Phoenix team.
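As a quick illustration of what "OpenAI-compatible" means, the standard OpenAI client can be pointed at one of these providers by overriding its base URL. This is a minimal sketch (the local Ollama endpoint, placeholder API key, and model name are illustrative):
from openai import OpenAI

# Point the standard OpenAI client at an OpenAI-compatible server.
# The base URL below is Ollama's default local endpoint (illustrative);
# many local servers ignore the API key, so a placeholder is fine.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

resp = client.chat.completions.create(
    model="llama3.2",  # any model served by the compatible provider
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)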
General guidelines on how to use Phoenix's prompt playground
To get started, first Configure AI Providers. In the playground view, create a valid prompt for the LLM and click Run on the top right (or use the mod + enter keyboard shortcut).
If successful, you should see the LLM output stream into the Output section of the UI.
The prompt editor (typically on the left side of the screen) is where you define the prompt template. You select the template language (mustache or f-string) on the toolbar. Whenever you type a variable placeholder in the prompt (say {{question}} for mustache), the variable to fill will show up in the inputs section. Input variables must either be filled in by hand or can be filled in via a dataset (where each row has key/value pairs for the input).
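For example, the same two-variable template written in each supported format (the variable names here are just for illustration):
Mustache: You are an expert in {{ topic }}. Answer this question for a beginner: {{ question }}
F-string: You are an expert in {topic}. Answer this question for a beginner: {question}
Either form surfaces topic and question in the inputs section.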
Every prompt instance can be configured to use a specific LLM and set of invocation parameters. Click on the model configuration button at the top of the prompt editor and configure your LLM of choice. Click on the "save as default" option to make your configuration sticky across playground sessions.
The Prompt Playground offers the capability to compare multiple prompt variants directly within the playground. Simply click the + Compare button at the top of the first prompt to create duplicate instances. Each prompt variant manages its own independent template, model, and parameters. This allows you to quickly compare prompts (labeled A, B, C, and D in the UI) and run experiments to determine which prompt and model configuration is optimal for the given task.
Phoenix lets you run a prompt (or multiple prompts) on a dataset. Simply load a dataset containing the input variables you want to use in your prompt template. When you click Run, Phoenix will apply each configured prompt to every example in the dataset, invoking the LLM for all possible prompt-example combinations. The result of your playground runs will be tracked as an experiment under the loaded dataset (see Playground Traces)
All invocations of an LLM via the playground are recorded for analysis, annotations, evaluations, and dataset curation.
If you simply run an LLM in the playground using free-form inputs (i.e. not using a dataset), your spans will be recorded in a project aptly titled "playground".
If, however, you run a prompt over dataset examples, the outputs and spans from your playground runs will be captured as an experiment. Each experiment will be named according to the prompt you ran the experiment over.
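As a minimal sketch, spans from free-form playground runs can later be pulled for analysis with the Phoenix client (this assumes a reachable Phoenix server and the default "playground" project name described above):
import phoenix as px

# Pull spans recorded by free-form playground runs into a dataframe for analysis.
client = px.Client()
playground_spans = client.get_spans_dataframe(project_name="playground")
print(playground_spans.head())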
How to deploy prompts to different environments safely
Prompts in Phoenix are versioned in a linear history, creating a comprehensive audit trail of all modifications. Each change is tracked, allowing you to:
Review the complete history of a prompt
Understand who made specific changes
Revert to previous versions if needed
When you are ready to deploy a prompt to a certain environment (let's say staging), the best thing to do is to tag a specific version of your prompt as ready. By default, Phoenix offers three tags (production, staging, and development), but you can create your own tags as well.
Each tag can include an optional description to provide additional context about its purpose or significance. Tags are unique per prompt, meaning you cannot have two tags with the same name for the same prompt.
It can be helpful to have custom tags to track different versions of a prompt. For example, if you want to tag a certain prompt version as the one used in your v0 release, you can create a custom tag with that name to keep track.
When creating a custom tag, you can provide:
A name for the tag (must be a valid identifier)
An optional description to provide context about the tag's purpose
Once a prompt version is tagged, you can pull this version of the prompt into any environment that you would like (an application, an experiment). Similar to git tags, prompt version tags let you create a "release" of a prompt (e.g. pushing a prompt to staging).
You can retrieve a prompt version by:
Using the tag name directly (e.g., "production", "staging", "development")
Using a custom tag name
Using the latest version (which will return the most recent version regardless of tags)
For full details on how to use prompts in code, see Using a prompt
You can list all tags associated with a specific prompt version. The list is paginated, allowing you to efficiently browse through large numbers of tags. Each tag in the list includes:
The tag's unique identifier
The tag's name
The tag's description (if provided)
This is particularly useful when you need to:
Review all tags associated with a prompt version
Verify which version is currently tagged for a specific environment
Track the history of tag changes for a prompt version
Tag names must be valid identifiers: lowercase letters, numbers, hyphens, and underscores, starting and ending with a letter or number.
Examples: staging, production-v1, release-2024
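As a quick local sanity check, the naming rule above can be sketched as a regular expression (this is illustrative, not the server's exact validation):
import re

# Lowercase letters, numbers, hyphens, and underscores,
# starting and ending with a letter or number.
TAG_NAME_PATTERN = re.compile(r"^[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?$")

for candidate in ["staging", "production-v1", "release-2024", "-invalid-", "Invalid_Tag"]:
    print(candidate, bool(TAG_NAME_PATTERN.match(candidate)))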
from phoenix.client import Client
# Create a tag for a prompt version
Client().prompts.tags.create(
prompt_version_id="version-123",
name="production",
description="Ready for production environment"
)
# List tags for a prompt version
tags = Client().prompts.tags.list(prompt_version_id="version-123")
for tag in tags:
print(f"Tag: {tag.name}, Description: {tag.description}")
# Get a prompt version by tag
prompt_version = Client().prompts.get(
prompt_identifier="my-prompt",
tag="production"
)
from phoenix.client import AsyncClient
# Create a tag for a prompt version
await AsyncClient().prompts.tags.create(
prompt_version_id="version-123",
name="production",
description="Ready for production environment"
)
# List tags for a prompt version
tags = await AsyncClient().prompts.tags.list(prompt_version_id="version-123")
for tag in tags:
print(f"Tag: {tag.name}, Description: {tag.description}")
# Get a prompt version by tag
prompt_version = await AsyncClient().prompts.get(
prompt_identifier="my-prompt",
tag="production"
)
Testing your prompts before you ship them is vital to deploying reliable AI applications
The Playground is a fast and efficient way to refine prompt variations. You can load previous prompts and validate their performance by applying different variables.
Each single-run test in the Playground is recorded as a span in the Playground project, allowing you to revisit and analyze LLM invocations later. These spans can be added to datasets or reloaded for further testing.
The ideal way to test a prompt is to construct a golden dataset where the dataset examples contains the variables to be applied to the prompt in the inputs and the outputs contains the ideal answer you want from the LLM. This way you can run a given prompt over N number of examples all at once and compare the synthesized answers against the golden answers.
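A golden dataset like this can be uploaded with the Phoenix client before running prompts over it in the playground. This is a minimal sketch; the dataset name and column keys are illustrative:
import pandas as pd
import phoenix as px

# Illustrative golden dataset: inputs hold the prompt variables,
# outputs hold the ideal answers to compare against.
df = pd.DataFrame(
    {
        "topic": ["Sports", "Physics"],
        "article": ["<article text 1>", "<article text 2>"],
        "ideal_summary": ["<golden answer 1>", "<golden answer 2>"],
    }
)

px.Client().upload_dataset(
    dataset_name="article-summaries-golden",
    dataframe=df,
    input_keys=["topic", "article"],
    output_keys=["ideal_summary"],
)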
Playground integrates with datasets and experiments to help you iterate and incrementally improve your prompts. Experiment runs are automatically recorded and available for subsequent evaluation to help you understand how changes to your prompts, LLM model, or invocation parameters affect performance.
Prompt Playground supports side-by-side comparisons of multiple prompt variants. Click + Compare to add a new variant. Whether using Span Replay or testing prompts over a Dataset, the Playground processes inputs through each variant and displays the results for easy comparison.
Sometimes you may want to test a prompt and run evaluations on it in a notebook. This can be particularly useful when custom manipulation is needed (e.g. you are trying to iterate on a system prompt over a variety of different chat messages). 🚧 This tutorial is coming soon
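Until that tutorial lands, here is a rough sketch of the idea: pull a tagged prompt, format it against a variety of inputs, and score the outputs however you like (the prompt name, tag, variable name, and questions below are illustrative):
from openai import OpenAI
from phoenix.client import Client

# Pull a tagged prompt version and try it against several different inputs.
prompt = Client().prompts.get(prompt_identifier="my-prompt-name", tag="staging")
oai_client = OpenAI()

questions = ["How do I bake bread?", "What is a binary search?"]
for question in questions:
    params = prompt.format(variables={"question": question})
    resp = oai_client.chat.completions.create(**params)
    answer = resp.choices[0].message.content
    # TODO: score `answer` with the evaluator of your choice
    print(question, "->", answer)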
Once you have tagged a version of a prompt as ready (e.g. "staging") you can pull the prompt into your code base and use it to prompt an LLM.
To use prompts in your code you will need to install the phoenix client library.
For Python: pip install arize-phoenix-client
For JavaScript / TypeScript: npm install @arizeai/phoenix-client
There are three major ways to pull prompts: pull by name or ID (latest), pull by version, and pull by tag.
Pulling a prompt by name or ID (e.g. the identifier) is the easiest way to pull a prompt. Note that since a name or ID doesn't specify a particular version, you will always get the latest version of a prompt. For this reason we only recommend doing this during development.
Note prompt names and IDs are synonymous.
Pulling a prompt by version retrieves the content of a prompt at a particular point in time. The version can never change, nor be deleted, so you can reasonably rely on it in production-like use cases.
Pulling a prompt by tag is most useful when you want a particular version of a prompt to be automatically used in a specific environment (say "staging"). To pull prompts by tag, you must first tag a prompt version in the UI.
Note that tags are unique per prompt, so a tag must be paired with the prompt_identifier.
A Prompt pulled in this way can be automatically updated in your application by simply moving the "staging" tag from one prompt version to another.
The phoenix clients support formatting the prompt with variables, and providing the messages, model information, invocation parameters, and response format (when applicable).
The Phoenix Client libraries make it simple to transform prompts to the SDK that you are using (no proxying necessary!)
Both the Python and TypeScript SDKs support transforming your prompts to a variety of SDKs (no proprietary SDK necessary).
Python - support for OpenAI, Anthropic, Gemini
TypeScript - support for OpenAI, Anthropic, and the Vercel AI SDK
from phoenix.client import Client

# Initialize a phoenix client with your phoenix endpoint
# By default it will read from your environment variables
client = Client(
# endpoint="https://my-phoenix.com",
)
# The version ID can be found in the versions tab in the UI
prompt = client.prompts.get(prompt_version_id="UHJvbXB0VmVyc2lvbjoy")
print(prompt.id)
prompt.dumps()
import { getPrompt } from "@arizeai/phoenix-client/prompts";
const promptByVersionId = await getPrompt({ versionId: "b5678" })
// ^ the specific prompt version with version Id "b5678"
from phoenix.client import Client

# Initialize a phoenix client with your phoenix endpoint
# By default it will read from your environment variables
client = Client(
# endpoint="https://my-phoenix.com",
)
# Since tags don't uniquely identify a prompt version
# it must be paired with the prompt identifier (e.g. name)
prompt = client.prompts.get(prompt_identifier="my-prompt-name", tag="staging")
print(prompt.id)
prompt.dumps()
import { getPrompt } from "@arizeai/phoenix-client/prompts";
const promptByTag = await getPrompt({ tag: "staging", name: "my-prompt" });
// ^ the specific prompt version tagged "staging", for prompt "my-prompt"
from openai import OpenAI
prompt_vars = {"topic": "Sports", "article": "Surrey have signed Australia all-rounder Moises Henriques for this summer's NatWest T20 Blast. Henriques will join Surrey immediately after the Indian Premier League season concludes at the end of next month and will be with them throughout their Blast campaign and also as overseas cover for Kumar Sangakkara - depending on the veteran Sri Lanka batsman's Test commitments in the second half of the summer. Australian all-rounder Moises Henriques has signed a deal to play in the T20 Blast for Surrey . Henriques, pictured in the Big Bash (left) and in ODI action for Australia (right), will join after the IPL . Twenty-eight-year-old Henriques, capped by his country in all formats but not selected for the forthcoming Ashes, said: 'I'm really looking forward to playing for Surrey this season. It's a club with a proud history and an exciting squad, and I hope to play my part in achieving success this summer. 'I've seen some of the names that are coming to England to be involved in the NatWest T20 Blast this summer, so am looking forward to testing myself against some of the best players in the world.' Surrey director of cricket Alec Stewart added: 'Moises is a fine all-round cricketer and will add great depth to our squad.'"}
formatted_prompt = prompt.format(variables=prompt_vars)
# Make a request with your Prompt
oai_client = OpenAI()
resp = oai_client.chat.completions.create(**formatted_prompt)
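# Print the model's reply (assumes a standard chat-completion response)
print(resp.choices[0].message.content)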
import { getPrompt, toSDK } from "@arizeai/phoenix-client/prompts";
import OpenAI from "openai";
const openai = new OpenAI()
const prompt = await getPrompt({ name: "my-prompt" });
// openaiParameters is fully typed, and safe to use directly in the openai client
const openaiParameters = toSDK({
// sdk does not have to match the provider saved in your prompt
// if it differs, we will apply a best effort conversion between providers automatically
sdk: "openai",
prompt,
// variables within your prompt template can be replaced across messages
variables: { question: "How do I write 'Hello World' in JavaScript?" }
});
const response = await openai.chat.completions.create({
...openaiParameters,
// you can still override any of the invocation parameters as needed
// for example, you can change the model or stream the response
model: "gpt-4o-mini",
stream: false
})
Store and track prompt versions in Phoenix
Prompts in Phoenix can be created using the playground as well as via the phoenix-clients.
Navigate to Prompts in the navigation and click the add prompt button on the top right. This will take you to the Playground.
The playground is like an IDE where you will develop your prompt. The prompt section on the right lets you add more messages, change the template format (f-string or mustache), and set an output schema (JSON mode).
To the right you can enter sample inputs for your prompt variables and run your prompt against a model. Make sure that you have an API key set for the LLM provider of your choosing.
To save the prompt, click the save button in the header of the prompt on the right. Name the prompt using lowercase alphanumeric characters plus hyphens and underscores (e.g. `my-first-prompt`), with no spaces. The model configuration you selected in the Playground will be saved with the prompt. When you re-open the prompt, the model and configuration will be loaded along with the prompt.
You just created your first prompt in Phoenix! You can view and search for prompts by navigating to Prompts in the UI.
Prompts can be loaded back into the Playground at any time by clicking on "open in playground"
To view the details of a prompt, click on the prompt name. You will be taken to the prompt details view. The prompt details view shows all the information that has been saved (e.g. the model used, the invocation parameters, etc.)
Once you've created a prompt, you will probably need to make tweaks over time. The best way to make tweaks to a prompt is using the playground. Depending on how destructive a change you are making, you might want to create a new version or clone the prompt.
To make edits to a prompt, click Edit in Playground on the top right of the prompt details view.
When you are happy with your prompt, click save. You will be asked to provide a description of the changes you made to the prompt. This description will show up in the history of the prompt for others to understand what you did.
In some cases, you may need to modify a prompt without altering its original version. To achieve this, you can clone a prompt, similar to forking a repository in Git.
Cloning a prompt allows you to experiment with changes while preserving the history of the main prompt. Once you have made and reviewed your modifications, you can choose to either keep the cloned version as a separate prompt or merge your changes back into the main prompt. To do this, simply load the cloned prompt in the playground and save it as the main prompt.
This approach ensures that your edits are flexible and reversible, preventing unintended modifications to the original prompt.
🚧 Prompt labels and metadata are still under construction.
Starting with prompts, Phoenix has a dedicated client that lets you programmatically create and retrieve prompts. Make sure you have installed the appropriate phoenix-client before proceeding.
Creating a prompt in code can be useful if you want a programmatic way to sync prompts with the Phoenix server.
Below is an example prompt for summarizing articles as bullet points. Use the Phoenix client to store the prompt in the Phoenix server. The name of the prompt is an identifier with lowercase alphanumeric characters plus hyphens and underscores (no spaces).
import phoenix as px
from phoenix.client.types import PromptVersion
content = """\
You're an expert educator in {{ topic }}. Summarize the following article
in a few concise bullet points that are easy for beginners to understand.
{{ article }}
"""
prompt_name = "article-bullet-summarizer"
prompt = px.Client().prompts.create(
name=prompt_name,
version=PromptVersion(
[{"role": "user", "content": content}],
model_name="gpt-4o-mini",
),
)
A prompt stored in the database can be retrieved later by its name. By default the latest version is fetched. Specific version ID or a tag can also be used for retrieval of a specific version.
prompt = px.Client().prompts.get(prompt_identifier=prompt_name)
If a version is tagged with, e.g. "production", it can be retrieved as follows.
prompt = px.Client().prompts.get(prompt_identifier=prompt_name, tag="production")
The same prompt can be created from the TypeScript client. Use the Phoenix client to store the prompt in the Phoenix server. The name of the prompt is an identifier with lowercase alphanumeric characters plus hyphens and underscores (no spaces).
import { createPrompt, promptVersion } from "@arizeai/phoenix-client";
const promptTemplate = `
You're an expert educator in {{ topic }}. Summarize the following article
in a few concise bullet points that are easy for beginners to understand.
{{ article }}
`;
const version = createPrompt({
name: "article-bullet-summarizer",
version: promptVersion({
modelProvider: "OPENAI",
modelName: "gpt-3.5-turbo",
template: [
{
role: "user",
content: promptTemplate,
},
],
}),
});
A prompt stored in the database can be retrieved later by its name. By default the latest version is fetched. Specific version ID or a tag can also be used for retrieval of a specific version.
import { getPrompt } from "@arizeai/phoenix-client/prompts";
const prompt = await getPrompt({ name: "article-bullet-summarizer" });
// ^ you now have a strongly-typed prompt object, in the Phoenix SDK Prompt type
If a version is tagged with, e.g. "production", it can be retrieved as follows.
const promptByTag = await getPrompt({ tag: "production", name: "article-bullet-summarizer" });
// ^ you can optionally specify a tag to filter by
from phoenix.client import Client
# Initialize a phoenix client with your phoenix endpoint
# By default it will read from your environment variables
client = Client(
# endpoint="https://my-phoenix.com",
)
# Pulling a prompt by name
prompt_name = "my-prompt-name"
client.prompts.get(prompt_identifier=prompt_name)
import { getPrompt } from "@arizeai/phoenix-client/prompts";
const prompt = await getPrompt({ name: "my-prompt" });
// ^ the latest version of the prompt named "my-prompt"
const promptById = await getPrompt({ promptId: "a1234" })
// ^ the latest version of the prompt with Id "a1234"