The Evaluator
Your go-to blog for insights on AI observability and evaluation.
How To Improve AI Agent Security with Microsoft’s AI Red Teaming Agent in Microsoft Foundry
Building safe AI isn’t optional anymore. Every model deployed to production faces adversarial users trying to make it behave badly. Microsoft Foundry gives you automated red teaming – essentially a…
Evaluating and Improving AI Agents at Scale with Microsoft Foundry
The Case for Continuous AI Quality: As generative and agentic systems mature, the question for enterprises is no longer simply “can we build it?” It is “can we trust it?”…
GEPA vs Prompt Learning: Benchmarking Different Prompt Optimization Approaches
In June 2025, Andrej Karpathy introduced Software 3.0: the notion that software development is shifting from programming through code to prompting through natural language. When building programs, the goal is…
Tracing, Evaluation, and Observability for Google ADK (How To)
Multi-agent systems are moving from research prototypes to production deployments. But there’s a gap between “it works in the demo” and “it works reliably at scale.” Google’s Agent Development Kit…
Top 5 AI Prompt Management Tools of 2025
Every new AI release rises or falls on how people experience it, and prompts play a major role in shaping that experience. A short sentence can write code, trigger tools,…
Meta AI Researcher Explains ARE and Gaia2: Scaling Up Agent Environments and Evaluations
In our latest paper reading, we had the pleasure of hosting Grégoire Mialon — Research Scientist at Meta Superintelligence Labs — to walk us through Meta AI’s groundbreaking paper titled…
New In Arize AX: Tags, Data Fabric, Automatic Threshold Ranges for Monitors and More
October of 2025 was a crowded month for shipping new features in Arize AX, with updates to make AI agent engineering easier. From a new timeline tab for traces to…
Hyland’s Approach To AI Agent Engineering
Hyland’s AI agent stack pairs Hyland Agent Builder with agentic document processing to bring context-aware agents to core platforms like OnBase, Alfresco, and Nuxeo, turning document understanding into real…
Building the Data Flywheel for Smarter AI Systems with Arize AX and NVIDIA NeMo
Self-driving cars don’t get better by sitting in a lab. They improve by driving millions of miles, capturing edge cases, and feeding that data back into training. Tesla’s fleet generates…
Top LLM Tracing Tools
As of October 2025, 82% of enterprise leaders rely on generative AI weekly, according to a recent report from Wharton and GBK, with three in four seeing positive…