Evaluation

Cookbooks

Leverage large language models to evaluate your generative model or application for hallucinations, toxicity, relevance of retrieved documents, and more.

LLM Evaluations

  • Hallucination Evals
  • Toxicity Evals
  • Summarization Evals

Evaluations Use Cases

  • Retrieved Document Relevance
  • Code Readability Evals
  • Question-Answering Evals
  • Evaluating Agents using Ragas

Evaluating and Improving RAG Applications

  • End-to-End RAG Application Evaluation
  • LlamaIndex Application
  • LlamaIndex Application using Milvus Vector Store
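Each of these cookbooks builds on the same LLM-as-a-judge pattern: a judge model classifies each row of a dataframe against a prompt template and returns a label (and, optionally, an explanation). The sketch below illustrates that pattern for a hallucination eval with phoenix.evals; the example rows, the judge model name, and the exact parameter names are illustrative assumptions, and the Hallucination Evals cookbook walks through the full setup.

```python
# Minimal sketch of an LLM-as-a-judge hallucination eval with phoenix.evals.
# The data, model name, and column values below are assumptions for illustration.
import pandas as pd

from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# Each row pairs a user query ("input") and retrieved context ("reference")
# with the application's answer ("output") to be judged.
df = pd.DataFrame(
    {
        "input": ["Who wrote Hamlet?"],
        "reference": ["Hamlet is a tragedy written by William Shakespeare."],
        "output": ["Hamlet was written by Charles Dickens."],
    }
)

# The judge model labels each row with one of the template's rails
# (e.g. "factual" / "hallucinated") and can return an explanation per row.
results = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)
print(results[["label", "explanation"]])
```

The other cookbooks swap in different templates (toxicity, summarization, retrieved document relevance, question answering) but follow the same dataframe-in, labels-out workflow.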
