The Evaluator
Your go-to blog for insights on AI observability and evaluation.
One agent, two trace destinations: Arize AX + Databricks Unity Catalog
Send one OpenTelemetry trace stream to both Arize AX and Databricks Unity Catalog so engineers can debug agents in Arize while data teams analyze the same spans in governed lakehouse storage.
Memory is still a missing primitive: Cataloguing what the field is actually shipping
This week the field shipped four kinds of memory, and Apple paid Google a billion dollars a year for one of them. None of the four is what the demos imply. A field map of what’s actually shipping, and the missing primitive that sits between the buckets.
Bring production agent traces from Arize into Databricks Unity Catalog
Arize Data Fabric now supports Databricks, helping teams sync production agent traces, evaluations, and annotations into customer-owned storage for governed analysis in Unity Catalog.
PostgresFS vs. SQL skills: should AI agents fake a filesystem?
Can an AI agent use a database as if it were a filesystem? Arize compared a Postgres-backed filesystem abstraction with a SQL skill and found that locality, accuracy, and maintenance cost favored the skill-based approach.
How Arize built AI-native support workflows that cut resolution time in half
Arize reduced median support resolution time from 22 hours to roughly 2.5 hours by building AI-native internal workflows for context gathering, debugging, escalation, and continuous improvement.
How to detect credential theft in AI agent harness traces
In May 2026, a malicious version of a popular VS Code extension spent 18 minutes in the marketplace before anyone caught it. In that time it ran on roughly 6,000…
Phoenix at 10,000 stars on GitHub: How an open source AI observability project grew by following its community
Phoenix crossed 10,000 GitHub stars. Here is how the open-source AI observability project grew from a Jupyter notebook extension into a community-shaped platform for traces, evals, OpenInference, and agents.
Building the AI factory for self-improving agents: What’s new in Arize AX
Arize AX is adding managed agents, full-agent experimentation, expanded multimodal support, and Harness-as-a-Judge to help teams observe, evaluate, and improve production agents.
Microsoft’s open trust stack runs on OpenInference
Microsoft’s open trust stack for AI agents puts ASSERT and Agent Control Specification on top of OpenInference, connecting evaluation, runtime controls, and observability through a shared trace contract.
The end of fine-tuning: Why evals, context, and traces matter more
Fine-tuning isn’t dead, but the way most teams iterate on AI products has split in two. A tiny fraction run continuous RL against their own environments; everyone else has moved the iteration loop out of the model and into the harness. Here’s why, and what the 99% should do instead.