Phoenix: Text-To-SQL Tutorial

This video introduces Arize Phoenix Datasets and Experiments 🚀, walking through a text-to-SQL use case.

The velocity of AI application development is often bottlenecked by high-quality evaluations, because engineers frequently face hard tradeoffs: which prompt or LLM best balances performance, latency, and cost? Quality evaluations are critical because they help answer these types of questions with greater confidence.

🗄 Datasets are collections of Examples. An Example contains Inputs to an AI Task and, optionally, an expected or reference Output.
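For a text-to-SQL use case, each Example might pair a natural-language question with a reference SQL query. Here's a rough sketch of uploading such a dataset with the Phoenix client (the column names and the example row are hypothetical):

```python
import pandas as pd
import phoenix as px

# Hypothetical text-to-SQL examples: each row is one Example,
# pairing an input question with a reference SQL query.
df = pd.DataFrame(
    {
        "question": ["How many albums does each artist have?"],
        "query": ["SELECT artist, COUNT(*) FROM albums GROUP BY artist;"],
    }
)

client = px.Client()
dataset = client.upload_dataset(
    dataset_name="text-to-sql-examples",
    dataframe=df,
    input_keys=["question"],  # Inputs to the AI Task
    output_keys=["query"],    # expected / reference Outputs
)
```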

👩‍🔬 Experiments are run on Examples to evaluate whether a given Task produces better Outputs.
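Concretely, a Task is just a function that maps an Example's input to an output. A minimal sketch, reusing the dataset above and assuming a hypothetical `generate_sql` helper that calls your model:

```python
from phoenix.experiments import run_experiment

def task(input) -> str:
    # generate_sql is a hypothetical helper that prompts an LLM
    # to turn a natural-language question into SQL.
    return generate_sql(input["question"])

experiment = run_experiment(
    dataset,  # the dataset uploaded above
    task,
    experiment_name="text-to-sql-v1",
)
```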

With arize-phoenix, Datasets are:
🔃 Integrated. Datasets are integrated with the platform, so you can add production spans to datasets, use datasets to run experiments, and use metadata to track different segments and use-cases.
🕰 Versioned. Every insert, update, and delete is versioned, so you can pin experiments and evaluations to a specific version of a dataset and track changes over time.
🧘‍♀️ Flexible. Support for KV, LLM, Chat, OpenAI fine-tuning, and OpenAI Evals formats.
✏ Tracked. Dataset examples track their source spans, so you always know where the data came from.

Experiments build on Datasets. They are:
🕰 Versioned. Every experiment tracks a dataset version.
📊 Analyzed. Tracks latency, error rate, cost, and scores.
🧠 Evaluated. Built-in LLM and code evaluators.
⚡ Blazing fast. Optimized for concurrency.
🕵‍♀️ Explainable. All evals are traced with explanations built-in
⚙ Custom. Custom evals are just functions (see the sketch after this list).
🔭 Traced. Traces the internal steps of your tasks and evaluations.
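As an illustration of the "custom evals are just functions" point above: an evaluator can be any function of the task output and the expected reference. A sketch using a naive exact-match score (the metric itself is purely illustrative, and `dataset` and `task` come from the earlier sketches):

```python
from phoenix.experiments import run_experiment

def exact_match(output, expected) -> float:
    # Scores 1.0 when the generated SQL matches the reference exactly.
    # `expected` is the Example's reference output from the dataset.
    return float(output.strip() == expected["query"].strip())

experiment = run_experiment(
    dataset,
    task,
    evaluators=[exact_match],
    experiment_name="text-to-sql-v1-evaluated",
)
```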

As per usual, Phoenix is fully OSS, 🔐 fully private, and can be self-hosted.

Don't forget to give us a ⭐ to support the project!

Learn more about datasets and experiments with Phoenix: https://docs.arize.com/phoenix/datasets-and-experiments/overview-datasets
Phoenix: https://phoenix.arize.com/

Subscribe to our resources and blogs
