The gold standard for evaluating text is human labeling, but human evaluation is often impractical at scale. Evaluating the performance of LLM applications is therefore increasingly handled by a separate evaluation LLM (LLM as a judge 👩🏾‍⚖️). LLM-based evaluation is a great starting point for understanding where an LLM application goes wrong. This demo covers running an LLM evaluation with Arize Phoenix, including evals with explanations for Q&A correctness and hallucinations. The Arize Phoenix LLM Evals open-source library is designed for simple, fast, and accurate LLM-based evaluations; it ships pre-built evaluation templates (such as hallucination and Q&A correctness) and integrates with Phoenix tracing.
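
Here is a minimal sketch of what such an eval can look like with Phoenix Evals, assuming a recent `arize-phoenix-evals` release, an OpenAI API key, and a toy dataframe; exact parameter names can vary between versions, and the notebook is the authoritative walkthrough.

```python
# Minimal LLM-as-a-judge sketch with Phoenix Evals.
# Assumes: pip install arize-phoenix-evals openai pandas, and OPENAI_API_KEY set.
# The dataframe contents below are illustrative, not from the demo.
import pandas as pd

from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    QA_PROMPT_RAILS_MAP,
    QA_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# Each row carries the user question (input), the retrieved context (reference),
# and the application's answer (output) -- the columns the built-in templates expect.
df = pd.DataFrame(
    {
        "input": ["What is Arize Phoenix?"],
        "reference": [
            "Arize Phoenix is an open-source library for LLM tracing and evaluation."
        ],
        "output": [
            "Phoenix is an open-source library for tracing and evaluating LLM applications."
        ],
    }
)

judge = OpenAIModel(model="gpt-4o")  # the separate evaluation ("judge") LLM

# Hallucination eval: labels each answer "factual" or "hallucinated",
# with a natural-language explanation for the verdict.
hallucination_evals = llm_classify(
    dataframe=df,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    model=judge,
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)

# Q&A correctness eval: labels each answer "correct" or "incorrect".
qa_evals = llm_classify(
    dataframe=df,
    template=QA_PROMPT_TEMPLATE,
    model=judge,
    rails=list(QA_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)

print(hallucination_evals[["label", "explanation"]])
print(qa_evals[["label", "explanation"]])
```

Because `provide_explanation=True`, the judge returns not just a label but its reasoning for each row, which is what makes these evals useful for debugging where an application goes wrong.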
Dive into the 📓notebook.