Webinar

LLM Evaluations: SQL Generation and Router-Based Architectures

  May 9th & 16th

  10:00am PST – 10:45am PST

Virtual

Join Arize AI’s co-founders for a virtual event on evaluating large language models (LLMs) for complex tasks. The event has two sessions, each covering a distinct application of LLM evaluation:

Session 1 | SQL Generation Evals: LLM-as-a-Judge

LLM-as-a-Judge is a popular and scalable technique for evaluating LLMs on tasks including toxicity classification, sentiment classification, and text-to-SQL generation. However, LLM-as-a-Judge evaluation has limitations and points of contention: a circular methodology (using one LLM to evaluate another) and a disregard for the database schema and data distribution. In this session, we will discuss an experiment we designed to measure the performance of the LLM-as-a-Judge Eval on text-to-SQL tasks. We’ll take you through a framework for comparing the LLM-as-a-Judge approach with a data distribution-based Eval approach. We will also discuss some interesting cases from our research that highlight the pitfalls of the LLM-as-a-Judge approach, along with suggestions for enhancing it to account for those limitations.
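To make the contrast concrete, here is a minimal sketch of one way a data-based check could look. It assumes that a "data distribution-based Eval" means running both the generated query and a reference query against real data and comparing the returned rows; the session may define it differently. The table, the queries, and the helper names `run_query` and `results_match` are hypothetical and exist only for illustration.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the query's rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def results_match(conn, generated_sql, reference_sql):
    """True when both queries run and return the same rows, ignoring order."""
    expected = run_query(conn, reference_sql)
    actual = run_query(conn, generated_sql)
    return expected is not None and actual == expected


# Hypothetical toy database; a real eval would use the production schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'ada', 20.0), (2, 'bob', 35.5), (3, 'ada', 12.5);
""")

reference = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
good = ("SELECT customer, SUM(amount) AS total FROM orders "
        "GROUP BY customer ORDER BY total")
bad = "SELECT customer, MAX(amount) FROM orders GROUP BY customer"

print(results_match(conn, good, reference))
print(results_match(conn, bad, reference))
```

The outcome of a check like this depends on the data: if every customer had exactly one order, `SUM` and `MAX` would return the same rows and the incorrect query would pass. That sensitivity to the data is exactly what a judge reading only the SQL text cannot see.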

Session 2 | LLM Evals for Router-Based Architectures

The second session in our series will explore how to effectively evaluate large language models (LLMs) within router-based AI architectures. Router-based architectures dynamically route each input to a specialized LLM component, enabling more efficient and capable systems. However, evaluating these architectures presents unique challenges, because both the routing decision and the downstream component's output can fail. In this session, we’ll cover key considerations and best practices for LLM evaluation in router setups.
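As a rough illustration of one piece of such an evaluation, the sketch below scores only the routing decision against a small labeled set. Everything here is an assumption made for the example: the `keyword_router` function stands in for an LLM-based router, and the route names and queries are invented.

```python
from collections import defaultdict


def keyword_router(query):
    """Toy stand-in for an LLM router: choose a route from keywords."""
    text = query.lower()
    if "sql" in text or "table" in text:
        return "sql_agent"
    if "refund" in text or "invoice" in text:
        return "billing_agent"
    return "general_chat"


def evaluate_router(router, labeled_queries):
    """Return overall routing accuracy and per-route correct/total counts."""
    per_route = defaultdict(lambda: {"correct": 0, "total": 0})
    for query, expected_route in labeled_queries:
        stats = per_route[expected_route]
        stats["total"] += 1
        if router(query) == expected_route:
            stats["correct"] += 1
    total = sum(s["total"] for s in per_route.values())
    correct = sum(s["correct"] for s in per_route.values())
    accuracy = correct / total if total else 0.0
    return accuracy, dict(per_route)


# Hypothetical labeled examples: (user query, route a human would pick).
labeled = [
    ("Write SQL to count users", "sql_agent"),
    ("Which table stores orders?", "sql_agent"),
    ("I need a refund for my last invoice", "billing_agent"),
    ("Why was I charged twice?", "billing_agent"),
    ("Tell me a joke", "general_chat"),
]

accuracy, per_route = evaluate_router(keyword_router, labeled)
print(accuracy)
print(per_route["billing_agent"])
```

The per-route breakdown matters because a single accuracy number can hide a route that is consistently missed. A fuller evaluation would also score the output of whichever component the router selected.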


Speakers

Jason Lopatecki
Co-Founder & CEO, Arize AI

Manas Singh
MBA Candidate, UC Berkeley

Aparna Dhinakaran
Co-Founder & CPO, Arize AI

Dat Ngo
Strategic Solutions Architect
