Evaluating LLM Changes

In the past 10 days alone, we've had at least three major model releases: GPT-4o-mini, Llama 3.1, and Mistral Large 2. All these new options mean more choices, and more time spent evaluating and testing each model. Fortunately, we have a structured, easy way to experiment with different models on your own LLM app.

This video walks through how to experiment with different models and prompt changes, and how to compare the results side by side.
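
To make the side-by-side idea concrete, here is a minimal sketch in plain Python. It is not the notebook's code and does not use the Phoenix API: `model_a`, `model_b`, `exact_match`, `run_experiment`, and `mean_scores` are hypothetical names invented for illustration, and the two "models" are simple stand-in functions so the example runs without API keys. In a real setup, each stand-in would be replaced by a call to a provider such as OpenAI, Anthropic, or Mistral.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for real model clients. Each takes a prompt string
# and returns a completion string. Swap in real API calls as needed.
def model_a(prompt: str) -> str:
    return prompt.upper()

def model_b(prompt: str) -> str:
    return prompt[::-1]

# A deliberately simple evaluator (an assumption for this sketch):
# 1.0 if the output matches the expected answer exactly, else 0.0.
def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip() == expected.strip() else 0.0

def run_experiment(
    models: Dict[str, Callable[[str], str]],
    template: str,
    dataset: List[dict],
    evaluator: Callable[[str, str], float],
) -> List[dict]:
    """Run every model on every example and collect one row per example."""
    rows = []
    for example in dataset:
        prompt = template.format(**example["input"])
        row = {"prompt": prompt, "expected": example["expected"]}
        for name, model in models.items():
            output = model(prompt)
            row[name] = output
            row[f"{name}_score"] = evaluator(output, example["expected"])
        rows.append(row)
    return rows

def mean_scores(rows: List[dict], model_names: List[str]) -> Dict[str, float]:
    """Average each model's score across all rows."""
    return {
        name: sum(row[f"{name}_score"] for row in rows) / len(rows)
        for name in model_names
    }

MODELS = {"model_a": model_a, "model_b": model_b}
TEMPLATE = "{text}"  # changing this string is a "prompt change"
DATASET = [
    {"input": {"text": "abc"}, "expected": "ABC"},
    {"input": {"text": "aba"}, "expected": "aba"},
]

if __name__ == "__main__":
    rows = run_experiment(MODELS, TEMPLATE, DATASET, exact_match)
    for row in rows:
        print(row)
    print(mean_scores(rows, list(MODELS)))
```

Holding the dataset and evaluator fixed while varying only the model or the prompt template is what makes the resulting rows comparable column by column.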

Tools used:
- Arize Phoenix
- OpenAI, Anthropic, Mistral

Link to notebook: https://drive.google.com/file/d/1eDQOJ4IRC0phOIUTK5aAx0OL6fb0mcXz/view?usp=sharing
