Videos

Introducing Phoenix Datasets and Experiments

This video introduces Arize Phoenix Datasets and Experiments πŸš€, walking through a text-to-SQL use case.

The velocity of AI application development is often bottlenecked by the lack of high-quality evaluations, because engineers face hard tradeoffs: which prompt or LLM best balances performance, latency, and cost? Quality evaluations are critical because they help answer these kinds of questions with greater confidence.

πŸ—„ Datasets are collections of Examples. An Example contains the Inputs to an AI Task and, optionally, an expected or reference Output.
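
For instance, in the text-to-SQL use case from the video, a Dataset might be built from question/SQL pairs. A minimal sketch, assuming the arize-phoenix client's upload_dataset method and the column names shown (check the docs for the exact signature):

```python
import pandas as pd
import phoenix as px

# Each row becomes an Example: "question" is the Input to the task,
# "answer" is the optional expected/reference Output.
df = pd.DataFrame(
    {
        "question": [
            "How many tracks are in the database?",
            "List the names of all artists.",
        ],
        "answer": [
            "SELECT COUNT(*) FROM tracks;",
            "SELECT name FROM artists;",
        ],
    }
)

dataset = px.Client().upload_dataset(
    dataset_name="text-to-sql-examples",
    dataframe=df,
    input_keys=["question"],
    output_keys=["answer"],
)
```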

πŸ‘©β€πŸ”¬ Experiments are Run on Examples to Evaluate if a given Task produces better Outputs.

With arize-phoenix, Datasets are:
πŸ”ƒ Integrated. Datasets are integrated with the platform, so you can add production spans to datasets, use datasets to run experiments, and use metadata to track different segments and use-cases.
πŸ•° Versioned. Every insert, update, and delete is versioned, so you can pin experiments and evaluations to a specific version of a dataset and track changes over time (see the sketch after this list).
πŸ§˜β€β™€οΈ Flexible. Supports KV, LLM, and Chat example formats, as well as OpenAI fine-tuning (FT) and OpenAI Evals formats.
✏ Tracked. Dataset examples track their source spans, so you always know where the data came from.
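
Because every change creates a new dataset version, an experiment can be pinned to an exact snapshot of the data. A sketch of what that might look like, assuming a get_dataset method that accepts a version_id (both names are assumptions; confirm against the docs):

```python
import phoenix as px

# Fetch a specific, immutable snapshot of the dataset so later
# edits to the examples don't silently change your experiment.
pinned_dataset = px.Client().get_dataset(
    name="text-to-sql-examples",
    version_id="<dataset-version-id>",  # placeholder; copy from the Phoenix UI
)
```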


Experiments build on Datasets. They are:
πŸ•° Versioned. Every experiment tracks a dataset version.
πŸ“Š Analyzed. Tracks latency, error rate, cost, and scores.
🧠 Evaluated. Built-in LLM and code evaluators.
⚑ Blazing fast. Optimized for concurrency ⚑️
πŸ•΅β€β™€οΈ Explainable. All evals are traced, with explanations built in.
βš™ Custom. Custom evals are just functions (see the sketch after this list).
πŸ”­ Traced. Traces the internal steps of your tasks and evaluations.
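
As a sketch of the "custom evals are just functions" point above: a plain Python function can score each run's output against the reference and be passed to run_experiment. The output/expected parameter names and the evaluators argument are assumptions here; dataset and task come from the earlier sketches.

```python
def matches_reference(output, expected) -> float:
    # Score 1.0 when the generated SQL exactly matches the reference
    # (ignoring case and surrounding whitespace), 0.0 otherwise.
    # `expected` is assumed to be a dict keyed by the dataset's output column.
    return float(output.strip().lower() == expected["answer"].strip().lower())

experiment = run_experiment(
    dataset,
    task,
    evaluators=[matches_reference],
    experiment_name="gpt-4o-text-to-sql-evaluated",
)
```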

As per usual, Phoenix is fully OSS, πŸ” fully private, and can be self-hosted.

Don't forget to give us a ⭐ to support the project!

Learn more about datasets and experiments with Phoenix in the docs.

Subscribe to our resources and blogs
