The Evaluator
Your go-to blog for insights on AI observability and evaluation.
Top 5 AI Prompt Management Tools of 2025
Every new AI release rises or falls on how people experience it, and prompts play a major role in shaping that experience. A short sentence can write code, trigger tools,…
Meta AI Researcher Explains ARE and Gaia2: Scaling Up Agent Environments and Evaluations
In our latest paper reading, we had the pleasure of hosting Grégoire Mialon — Research Scientist at Meta Superintelligence Labs — to walk us through Meta AI’s groundbreaking paper titled…
New In Arize AX: Tags, Data Fabric, Automatic Threshold Ranges for Monitors and More
October 2025 was a busy month for shipping new features in Arize AX, with updates that make AI agent engineering easier. From a new timeline tab for traces to…
Hyland’s Approach To AI Agent Engineering
Hyland’s AI agent stack pairs Hyland Agent Builder with agentic document processing to bring context-aware agents to core platforms like OnBase, Alfresco, and Nuxeo — turning document understanding into real…
Building the Data Flywheel for Smarter AI Systems with Arize AX and NVIDIA NeMo
Self-driving cars don’t get better by sitting in a lab. They improve by driving millions of miles, capturing edge cases, and feeding that data back into training. Tesla’s fleet generates…
Top LLM Tracing Tools
As of October 2025, 82% of enterprise leaders rely on generative AI weekly, according to a recent report from Wharton and GBK, with three in four seeing positive…
8 Top Prompt Testing and Optimization Tools for LLMs and Multiagent Systems (2025)
If we were to give the year 2025 an AI-appropriate appellation, it would probably be ‘the year of the agents.’ Building atop the startling advances in generative language and image…
ServiceNow’s Tara Bogavelli on AgentArch: Benchmarking AI Agents for Enterprise Workflows
In our latest AI research paper reading, we hosted Tara Bogavelli, Machine Learning Engineer at ServiceNow, to discuss her team’s recent work on AgentArch, a new benchmark designed to evaluate…
OpenAI’s Santosh Vempala Explains Why Language Models Hallucinate
In our latest AI research paper reading, we hosted Santosh Vempala, Professor at Georgia Tech and co-author of OpenAI’s paper, “Why Language Models Hallucinate.” This paper offers one of the…
What Are the Top LLM Evaluation Tools?
AI agents and real-world applications of generative AI are debuting at an incredible clip this year, narrowing the time from AI research paper to industry application and propelling productivity growth…