The Evaluator
Your go-to blog for insights on AI observability and evaluation.

LLM-as-a-Judge: Example of How To Build a Custom Evaluator Using a Benchmark Dataset
Arize-Phoenix ships with pre-built evaluators that are tested against benchmark datasets and tuned for repeatability. They’re a fast way to stand up rigorous evaluation for common scenarios. In practice, though,…

ADB Database: Realtime Ingestion At Scale
We published our first blog post introducing the Arize database (ADB) at the beginning of July; this post dives deeper into the realtime ingestion support of…

New In Arize AX: Prompt Learning, Arize Tracing Assistant, and Multiagent Visualization
July was a big month for Arize AX, with updates to make AI and agent engineering much easier. From prompt learning to new skills for Alyx and OpenInference Java, there…

A Watermark for Large Language Models
In our latest live AI research papers community reading, the primary author of the popular paper A Watermark For Large Language Models (John Kirchenbauer of the University of Maryland) walked us…

Unlocking Safer AI: Your Two-Part Field Guide
Large language models are reshaping how we build products — and how adversaries try to break them. To help teams stay ahead, Sofia Jakovcevic — AI Solutions Engineer at Arize…

LLM Observability for AI Agents and Applications
The era of single-turn LLM calls is behind us. Today’s AI products are powered by increasingly autonomous agents — multi-step systems that plan, reason, use tools, and adapt in real…

Prompt Learning: Using English Feedback to Optimize LLM Systems
Applications of reinforcement learning (RL) in AI model building have been a growing topic over the past few months. From DeepSeek models incorporating RL mechanics into their training processes to…

Self-Adapting Language Models: Paper Authors Discuss Implications
In a recent live AI research paper reading, the authors of the new paper Self-Adapting Language Models (SEAL) shared a behind-the-scenes look at their work, motivations, results, and future directions…

Meet Alyx: Arize’s Evolving AI Agent
We’re excited to introduce Alyx, the next evolution in Arize’s intelligent assistant. You might remember our first iteration — Copilot — launched last year as a set of tools to…

Introducing ADB: Arize’s Proprietary OLAP Database
Earlier this month, we rolled out real‑time ingestion support to every Arize AX workspace—paid and free. With that launch, Arize now ingests terabytes of data every day across hundreds of…