Evaluation

More Cookbooks

Use Phoenix Evals to evaluate your application for hallucinations, toxicity, relevance of retrieved documents, and more.
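
As a sense of the pattern the walkthroughs below follow, here is a minimal sketch of a hallucination eval built on `llm_classify` from `phoenix.evals`. The dataframe contents, column names, and model choice are illustrative assumptions; see the Hallucination Evals walkthrough for the authoritative version.

```python
import pandas as pd

from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# Illustrative data: each row pairs a question and its retrieved reference
# text with the answer your application produced.
df = pd.DataFrame(
    {
        "input": ["Who maintains Phoenix?"],
        "reference": [
            "Phoenix is an open-source AI observability library maintained by Arize AI."
        ],
        "output": ["Phoenix is maintained by the Apache Software Foundation."],  # hallucinated
    }
)

# An LLM judge labels each row with one of the template's rails
# ("factual" or "hallucinated"), optionally with an explanation.
evals_df = llm_classify(
    dataframe=df,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    model=OpenAIModel(model="gpt-4o"),  # assumes OPENAI_API_KEY is set
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)

print(evals_df[["label", "explanation"]])
```

The other classification evals listed below generally follow the same `llm_classify` pattern, swapping in their own prompt templates and rails.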

Classification Eval Walkthroughs

  • Hallucination Evals
  • Toxicity Evals
  • Summarization Evals
  • User Frustration Evals
  • Question-Answering Evals
  • Agent Tool Selection Evals
  • Agent Tool Parameter Extraction Evals
  • Agent Tool Calling Evals
  • Reference Link Correctness Evals
  • Ground Truth vs AI Evals
