The Evaluator
Your go-to blog for insights on AI observability and evaluation.
Why AI Agents Break: A Field Analysis of Production Failures
As AI agents enter production environments, they face conditions their training does not cover. These systems generate fluent output, yet operational work demands exact action. Small ambiguities compound fast when…
OWASP Top 10 for Agentic Applications: Compliance Guide
This guide maps the OWASP Agentic Security Initiative (ASI) top ten risks to specific Arize AX observability features and metrics you should implement to detect, monitor, and mitigate threats in…
Hierarchical Memory Management In Agent Harnesses
We’ve worked with thousands of customers building AI agents, and we’ve also spent the last two years building our own agent, Alyx, an in-product assistant for Arize AX. These experiences…
How Observability-Driven Sandboxing Secures AI Agents
AI agents become dangerous at the moment they gain the ability to execute actions. The moment an agent can touch the file system or invoke external tools, safety shifts from…
AI Agent Interfaces in 2026: Filesystem vs API vs Database (What Actually Works)
We Don’t Know How to Build Agent Interfaces Yet (And That’s Fine). Letta just published benchmark results showing a filesystem-based agent scored 74% on memory tasks by simply storing conversation…
Google Antigravity and Arize AX’s MCP Tracing Assistant: How to Trace Your Agent Without Writing Any Code
TL;DR: Add the Arize AX MCP server to Antigravity to instrument your AI applications without leaving your IDE. Instrumenting AI applications with tracing and observability is critical for debugging, monitoring,…
How Context Graphs Turn Agent Traces Into Durable Business Assets
In their recent essay making the rounds, Foundation Capital’s Jaya Gupta and Ashu Garg argue that the next enterprise data advantage will come from capturing decision traces and stitching them…
New In Arize AX: Multi-Span Filters and Improved Playground Views
Arize AX released a raft of new updates to close out December of 2025. From improved playground views to multi-span filters, here are some highlights. Multi-Span Filters: Filter traces using…
EU AI Act Compliance: What AI Engineering Teams Should Monitor
The EU AI Act is no longer a distant regulatory concept; it is in force, and enterprises are road-testing their real-world implementation. The core law is Regulation (EU) 2024/1689,…
How TheFork Leverages Online Evals To Boost Conversions with Arize AX on AWS
TheFork is one of Europe’s leading restaurant discovery and booking platforms, connecting millions of diners with tens of thousands of restaurants across major cities. The company’s marketplace spans everything from…