Prompt Learning SDK
Optimize your prompts in code
The Prompt Learning SDK is fully open source. View the SDK on GitHub: https://github.com/Arize-ai/prompt-learning
What is Prompt Learning?
Prompt learning is an iterative approach to optimizing LLM prompts by using feedback from evaluations to systematically improve prompt performance. Instead of manually tweaking prompts through trial and error, the SDK automates this process.
The prompt learning process follows this workflow:
Initial Prompt → Generate Outputs → Evaluate Results → Optimize Prompt → Repeat

Initial Prompt: Start with a baseline prompt that defines your task
Generate Outputs: Use the prompt to generate responses on your dataset
Evaluate Results: Run evaluators to assess output quality
Optimize Prompt: Use feedback to generate an improved prompt
Iterate: Repeat until performance meets your criteria
The SDK uses a meta-prompt approach where an LLM analyzes the original prompt, evaluation feedback, and examples to generate an optimized version that better aligns with your evaluation criteria.
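To make the loop concrete, here is a minimal sketch (generate_outputs and meets_criteria are hypothetical placeholders for your own generation and stopping logic; the optimizer, dataset, and evaluate_output pieces are shown in Basic Usage below):

# A minimal sketch of the loop; generate_outputs and meets_criteria are
# hypothetical placeholders, not part of the SDK.
prompt = "You are a helpful assistant. Answer this question: {question}"
for _ in range(5):  # cap the number of optimization rounds
    dataset["answer"] = generate_outputs(prompt, dataset)    # Generate Outputs
    dataset, feedback_columns = optimizer.run_evaluators(    # Evaluate Results
        dataset=dataset, evaluators=[evaluate_output], feedback_columns=[]
    )
    if meets_criteria(dataset, feedback_columns):            # stop when good enough
        break
    prompt = optimizer.optimize(                             # Optimize Prompt
        dataset=dataset,
        output_column="answer",
        feedback_columns=feedback_columns,
    )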
SDK Components
The Prompt Learning SDK consists of several key components:
Core Classes
PromptLearningOptimizer
The main class that orchestrates the prompt optimization process.
MetaPrompt
Handles the construction of meta-prompts used for optimization.
TiktokenSplitter
Manages token counting and batching for large datasets.
Annotator
Generates additional annotations to guide the optimization process.
Key Features
Automatic batching based on token limits
Template variable detection and preservation
Multiple evaluation methods support
Flexible input formats (strings, message lists, PromptVersion objects)
OpenAI model integration for optimization
Setup
First, clone the Prompt Learning repository and set your OpenAI API key.
git clone https://github.com/Arize-ai/prompt-learning.git
# Set your OpenAI API key
export OPENAI_API_KEY="your-api-key-here"
Basic Usage
1. Initialize the Optimizer
from optimizer_sdk.prompt_learning_optimizer import PromptLearningOptimizer

optimizer = PromptLearningOptimizer(
    prompt="You are a helpful assistant. Answer this question: {question}",
    model_choice="gpt-4o",
    openai_api_key="your-api-key"  # Optional if set in environment
)
2. Prepare Your Dataset
Your dataset should contain:
Input columns: The data your prompt will use (e.g., question)
Output column: The LLM's response (e.g., answer)
Feedback columns: Evaluation results (e.g., correctness, explanation)
import pandas as pd
dataset = pd.DataFrame({
    "question": ["What is the capital of France?", "What is 2+2?"],
    "answer": ["Paris", "4"],
    "correctness": ["correct", "correct"],
    "explanation": ["Accurate answer", "Correct calculation"]
})
3. Run Evaluators (Optional)
If you don't have pre-existing feedback, you can run evaluators:
from phoenix.evals import OpenAIModel, llm_generate  # available for LLM-based evaluators

def evaluate_output(dataset):
    """Custom evaluator: add feedback columns and return their names."""
    # Your evaluation logic here; as an illustration, flag empty answers:
    dataset["correctness"] = [
        "correct" if answer else "incorrect" for answer in dataset["answer"]
    ]
    dataset["explanation"] = ["Illustrative check only"] * len(dataset)
    return dataset, ["correctness", "explanation"]

# Run evaluators
dataset, feedback_columns = optimizer.run_evaluators(
    dataset=dataset,
    evaluators=[evaluate_output],
    feedback_columns=[]
)
4. Optimize the Prompt
optimized_prompt = optimizer.optimize(
    dataset=dataset,
    output_column="answer",
    feedback_columns=["correctness", "explanation"],
    context_size_k=128000  # 128k token context window
)
Advanced Usage
Batch Processing and Context Management
The SDK automatically handles large datasets by splitting them into batches that fit within your specified context window.
The context_size_k Parameter
Purpose: Controls the maximum token limit for each optimization batch
Default: 128,000 tokens
Impact: Larger values allow more examples per batch but may increase memory usage
Recommendation: Start with 128k and adjust based on your model's context window
# For models with smaller context windows
optimized_prompt = optimizer.optimize(
    dataset=dataset,
    output_column="answer",
    feedback_columns=["correctness"],
    context_size_k=8000  # 8k token limit
)

# For models with larger context windows
optimized_prompt = optimizer.optimize(
    dataset=dataset,
    output_column="answer",
    feedback_columns=["correctness"],
    context_size_k=128000  # 128k token limit
)
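Under the hood, batching is handled by the TiktokenSplitter. As an illustration of how token-based batching might work (an assumption for clarity, not the SDK's actual implementation), rows can be accumulated until a batch would exceed the limit:

# A hedged sketch of token-based batching; batch_rows is a hypothetical
# helper, not part of the SDK's public API.
import tiktoken

def batch_rows(rows, context_size_k=128000, model="gpt-4o"):
    """Group rows into batches whose encoded size stays under the limit."""
    encoding = tiktoken.encoding_for_model(model)
    batches, current, used = [], [], 0
    for row in rows:
        n_tokens = len(encoding.encode(str(row)))
        if current and used + n_tokens > context_size_k:
            batches.append(current)  # current batch is full; start a new one
            current, used = [], 0
        current.append(row)
        used += n_tokens
    if current:
        batches.append(current)
    return batches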
Template Variable Handling
The SDK automatically detects and preserves template variables in your prompts:
# Your prompt with template variables
prompt = "You are a {role}. Answer this {question_type}: {question}"
# The SDK will preserve {role}, {question_type}, and {question}
# These variables must be present in your dataset columns
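For illustration, here is one way to extract those variables yourself and check that they map to dataset columns (a minimal standard-library sketch; the SDK's internal detection may differ):

# A minimal sketch of template variable detection using string.Formatter.
from string import Formatter

def template_variables(prompt: str) -> set:
    """Return the named {placeholders} found in a prompt template."""
    return {name for _, name, _, _ in Formatter().parse(prompt) if name}

prompt = "You are a {role}. Answer this {question_type}: {question}"
assert template_variables(prompt) == {"role", "question_type", "question"}

# Each detected variable should correspond to a dataset column:
missing = template_variables(prompt) - set(dataset.columns)
assert not missing, f"Dataset is missing columns: {missing}"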
Multiple Evaluation Criteria
You can use multiple evaluators and feedback columns for comprehensive optimization:
# Run multiple evaluators
dataset, feedback_columns = optimizer.run_evaluators(
    dataset=dataset,
    evaluators=[evaluate_accuracy, evaluate_style, evaluate_completeness],
    feedback_columns=[]
)

# Optimize using all feedback
optimized_prompt = optimizer.optimize(
    dataset=dataset,
    output_column="answer",
    feedback_columns=feedback_columns
)
Custom Annotations
Use the annotator to generate additional guidance for optimization. This passes all of your outputs and evals to another LLM call for a final, comprehensive evaluation.
from optimizer_sdk.annotator import Annotator

# Create custom annotation prompts
annotation_prompts = [
    "Analyze the style and tone of responses",
    "Check for factual accuracy and completeness"
]

# Generate annotations
annotations = optimizer.create_annotation(
    prompt=prompt,
    template_variables=["question"],
    dataset=dataset,
    feedback_columns=["correctness"],
    annotator_prompts=annotation_prompts,
    output_column="answer"
)

# Use annotations in optimization
optimized_prompt = optimizer.optimize(
    dataset=dataset,
    output_column="answer",
    feedback_columns=["correctness"],
    annotations=annotations
)
Complete Example
Here's a complete example showing the full workflow:
import pandas as pd
from optimizer_sdk.prompt_learning_optimizer import PromptLearningOptimizer
from phoenix.evals import OpenAIModel, llm_generate

# 1. Initialize optimizer
optimizer = PromptLearningOptimizer(
    prompt="You are a math tutor. Solve this problem: {problem}",
    model_choice="gpt-4o"
)

# 2. Prepare dataset
dataset = pd.DataFrame({
    "problem": ["2 + 2 = ?", "5 * 3 = ?", "10 / 2 = ?"],
    "answer": ["4", "15", "5"],
    "correctness": ["correct", "correct", "correct"],
    "explanation": ["Correct addition", "Correct multiplication", "Correct division"]
})

# 3. Optimize prompt
optimized_prompt = optimizer.optimize(
    dataset=dataset,
    output_column="answer",
    feedback_columns=["correctness", "explanation"],
    context_size_k=8000
)

print("Original prompt:", optimizer.prompt)
print("Optimized prompt:", optimized_prompt)
Configuration Options
Model Selection
The SDK supports various OpenAI models:
# Supported models
SUPPORTED_MODELS = [
    "o1",            # OpenAI o1 (reasoning)
    "o3",            # OpenAI o3 (reasoning)
    "gpt-4o",        # GPT-4 Omni
    "gpt-4",         # GPT-4
    "gpt-3.5-turbo", # GPT-3.5 Turbo
    "gpt-3.5",       # GPT-3.5
]

# Choose based on your needs
optimizer = PromptLearningOptimizer(
    prompt="Your prompt here",
    model_choice="gpt-4o"  # Best for complex optimization
)
Input Format Flexibility
The SDK accepts multiple prompt formats:
# String format
optimizer = PromptLearningOptimizer(
    prompt="You are a helpful assistant: {input}",
    model_choice="gpt-4o"
)

# Message list format
optimizer = PromptLearningOptimizer(
    prompt=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Answer this: {input}"}
    ],
    model_choice="gpt-4o"
)

# PromptVersion format (Phoenix integration)
from phoenix.client.types import PromptVersion

prompt_version = PromptVersion(...)
optimizer = PromptLearningOptimizer(
    prompt=prompt_version,
    model_choice="gpt-4o"
)
Best Practices
1. Dataset Quality
Ensure your dataset is representative of real-world usage
Include diverse examples that cover edge cases
Balance positive and negative feedback
2. Evaluation Criteria
Define clear, measurable evaluation criteria
Use multiple evaluators for comprehensive feedback
Consider both objective (accuracy) and subjective (style) metrics, as in the sketch below
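For example, a subjective criterion like style can be graded with an LLM-as-a-judge evaluator. A hedged sketch using phoenix.evals (this assumes llm_generate's dataframe/template/model interface and a default output column named "output"; adapt it to the version you have installed):

from phoenix.evals import OpenAIModel, llm_generate

# Template variables ({question}, {answer}) are filled from dataset columns.
STYLE_TEMPLATE = (
    "Rate the style of the following answer with a single word, "
    "'good' or 'poor'.\n"
    "Question: {question}\n"
    "Answer: {answer}"
)

def evaluate_style(dataset):
    """LLM-as-a-judge evaluator returning a 'style' feedback column."""
    results = llm_generate(
        dataframe=dataset,
        template=STYLE_TEMPLATE,
        model=OpenAIModel(model="gpt-4o"),
    )
    dataset["style"] = results["output"]  # assumed default output column name
    return dataset, ["style"]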
3. Context Window Management
Start with smaller context windows for faster iteration
Increase context size for more comprehensive optimization
Monitor token usage to optimize costs
4. Iterative Improvement
Run multiple optimization loops
Monitor performance metrics across iterations
Stop when performance plateaus or meets your criteria (see the sketch below)
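A hedged sketch of one possible stopping rule (plateaued is an illustrative helper, not part of the SDK; scores is a list of per-iteration accuracy values you compute from your feedback columns):

def plateaued(scores, window=3, epsilon=0.01):
    """Stop when the best recent score improves on earlier iterations
    by less than epsilon."""
    if len(scores) <= window:
        return False
    return max(scores[-window:]) - max(scores[:-window]) < epsilon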
5. Template Variable Preservation
Always verify that template variables are preserved
Test optimized prompts with new data
Ensure backward compatibility
Conclusion
The Prompt Learning SDK provides a powerful, automated approach to optimizing LLM prompts. By leveraging evaluation feedback and meta-prompt optimization, you can systematically improve prompt performance across various use cases.
Key benefits:
Automated optimization reduces manual prompt engineering
Data-driven improvements based on actual performance metrics
Scalable approach for production systems
Flexible integration with existing evaluation frameworks
Start with simple use cases and gradually incorporate more sophisticated evaluation criteria as you become familiar with the SDK's capabilities.