The Evaluator
Your go-to blog for insights on AI observability and evaluation.

Accurate KV Cache Quantization with Outlier Tokens Tracing
Deploying large language models (LLMs) at scale is expensive—especially during inference. One of the biggest memory and performance bottlenecks? The KV Cache. In a new research paper, Accurate KV Cache…

New in Arize: Realtime Trace Ingestion, Prompt Playground Upgrades & More
In May, we expanded access to realtime trace ingestion across all Arize AX tiers, making it easier than ever to monitor LLM performance live. We also rolled out major usability…

Harnessing Databricks Mosaic AI Agent Framework and Arize for Next-Level GenAI Applications
Co-authored by Prasad Kona, Lead Partner Solutions Architect at Databricks. Building production-ready AI agents that can reliably handle complex tasks remains one of the biggest challenges in generative AI today…

Arize AI Now Generally Available As Part of Azure Native Integrations
Arize AI, a leading platform for AI observability and LLM evaluation, today announced the general availability of its platform to developers as part of Azure Native Integrations. The debut follows…

Arize AI Accelerates Enterprise AI Adoption On-Premises With NVIDIA
Arize AI, a leader in large language model (LLM) evaluation and AI observability, today announced it is delivering high-performance, on-premises AI for enterprises seeking to deploy and scale AI…

Scalable Chain of Thoughts via Elastic Reasoning
This paper introduces Elastic Reasoning, a novel framework designed to enhance the efficiency and scalability of large reasoning models (LRMs) by explicitly separating the reasoning process into two distinct phases:…

Sleep-time Compute: Beyond Inference Scaling at Test-time
We recently discussed “Sleep-time Compute: Beyond Inference Scaling at Test-time,” new research from the team at Letta. The paper addresses a key challenge in using powerful AI models:…

New in Arize: Bigger Datasets, Better Evaluations, and Expanded CV Support
April was a big month for Arize, with updates designed to make building, evaluating, and managing your models and prompts even easier. From larger dataset runs in Prompt Playground to…

Integrating Arize AI and Amazon Bedrock Agents: A Comprehensive Guide to Tracing, Evaluation, and Monitoring
In today’s rapidly evolving AI landscape, effective observability into agent systems has become a critical requirement for enterprise applications. This technical guide explores the newly announced integration between Arize AI…

LibreEval: A Smarter Way to Detect LLM Hallucinations
Over the past few weeks, the Arize team has generated the largest public dataset of hallucinations, as well as a series of fine-tuned evaluation models. We wanted to create a…