This tutorial follows examples illustrated in this blog:
Blog: Evaluating and Improving AI Agents at Scale with Microsoft Foundry
1. Azure AI Foundry and Arize for Agent Observability and Evaluation
This notebook demonstrates how to:
- Build a LangChain multi-chain agent on Azure AI Foundry while tracing all operations to Arize for observability (a minimal setup sketch follows this list)
- Leverage Microsoft Risk and Safety Evaluators to evaluate LLM behavior
- Log evaluation results to Arize for visibility
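Below is a minimal sketch of the tracing setup, assuming the arize-otel, openinference-instrumentation-langchain, and langchain-openai packages are installed and the Arize and Azure OpenAI environment variables are set; the project and deployment names are placeholders, and the linked notebook contains the full multi-chain agent.

```python
import os

from arize.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import AzureChatOpenAI

# Register an OpenTelemetry tracer provider that exports spans to Arize.
tracer_provider = register(
    space_id=os.environ["ARIZE_SPACE_ID"],
    api_key=os.environ["ARIZE_API_KEY"],
    project_name="foundry-langchain-agent",  # placeholder project name
)

# Auto-instrument LangChain so every chain, prompt, and LLM call is traced.
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)

# A simple chain against an Azure AI Foundry / Azure OpenAI model deployment.
llm = AzureChatOpenAI(
    azure_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"],  # placeholder deployment name
    api_version="2024-06-01",
)
chain = ChatPromptTemplate.from_template("Summarize in one sentence: {text}") | llm

# Each invocation is captured end-to-end and appears as a trace in Arize.
chain.invoke({"text": "Arize traces LangChain agents via OpenInference instrumentation."})
```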
Notebook Tutorial - Foundry Agent Observability and Evaluation
Screenshot showing Microsoft hate and unfairness evaluation metric attached to a span.
Screenshot showing a summarized dashboard with key observability metrics and evaluation KPIs.
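The risk and safety checks shown above come from the azure-ai-evaluation SDK. A minimal sketch of the Hate and Unfairness evaluator, assuming the SDK is installed and the placeholder Azure AI Foundry project details are filled in, looks like this; the notebook shows how the resulting scores are attached to spans and logged to Arize.

```python
from azure.ai.evaluation import HateUnfairnessEvaluator
from azure.identity import DefaultAzureCredential

# Placeholder Azure AI Foundry project details.
azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<foundry-project-name>",
}

hate_unfairness = HateUnfairnessEvaluator(
    credential=DefaultAzureCredential(),
    azure_ai_project=azure_ai_project,
)

# Evaluate a single query/response pair produced by the agent.
result = hate_unfairness(
    query="Tell me about your new coworkers.",
    response="They are friendly and collaborative.",
)

# `result` is a dict with a severity label, numeric score, and explanation,
# e.g. keys like "hate_unfairness", "hate_unfairness_score", and
# "hate_unfairness_reason" (exact keys depend on the SDK version).
print(result)
```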
2. Azure Risk and Safety Evaluators on Arize Datasets+Experiments
This notebook demonstrates how to use Azure Risk and Safety Evaluators with Arize Datasets+Experiments to track and visualize experiments and evaluations in Arize. We will use the Hate Unfairness Evaluator to evaluate the output of an Azure AI Foundry agent; a sketch of the experiment setup follows the screenshot below.
Notebook Tutorial - Using Foundry Evaluators on Arize Datasets + Experiments
Screenshot showing a row-level comparison of experiment runs in Arize AX with hate and unfairness scores, labels, and explanations.
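As a rough illustration of how the Azure evaluator plugs into Arize Datasets+Experiments, the sketch below wraps the Hate and Unfairness evaluator as an experiment evaluator. The client, dataset, and run_experiment names follow the Arize Datasets+Experiments SDK but should be treated as assumptions that may differ by SDK version, and the agent call is a hypothetical helper; refer to the linked notebook for the exact calls.

```python
import os
import pandas as pd

from arize.experimental.datasets import ArizeDatasetsClient
from arize.experimental.datasets.experiments.types import EvaluationResult
from arize.experimental.datasets.utils.constants import GENERATIVE
from azure.ai.evaluation import HateUnfairnessEvaluator
from azure.identity import DefaultAzureCredential

SPACE_ID = os.environ["ARIZE_SPACE_ID"]
client = ArizeDatasetsClient(api_key=os.environ["ARIZE_API_KEY"])  # some SDK versions call this developer_key

# Upload a small dataset of queries to run the Foundry agent against.
dataset_id = client.create_dataset(
    space_id=SPACE_ID,
    dataset_name="foundry-agent-safety-eval",  # placeholder dataset name
    dataset_type=GENERATIVE,
    data=pd.DataFrame({"query": ["Tell me about your new coworkers."]}),
)

hate_unfairness = HateUnfairnessEvaluator(
    credential=DefaultAzureCredential(),
    azure_ai_project={  # placeholder Azure AI Foundry project details
        "subscription_id": "<subscription-id>",
        "resource_group_name": "<resource-group>",
        "project_name": "<foundry-project-name>",
    },
)

def task(dataset_row) -> str:
    # Placeholder for the Azure AI Foundry agent call; returns the agent's response.
    return run_foundry_agent(dataset_row["query"])  # hypothetical helper

def hate_unfairness_eval(output, dataset_row) -> EvaluationResult:
    # Score each task output with the Azure Hate and Unfairness evaluator.
    scores = hate_unfairness(query=dataset_row["query"], response=output)
    return EvaluationResult(
        score=scores["hate_unfairness_score"],
        label=scores["hate_unfairness"],
        explanation=scores["hate_unfairness_reason"],
    )

# Run the experiment; scores, labels, and explanations appear per row in Arize AX.
client.run_experiment(
    space_id=SPACE_ID,
    dataset_id=dataset_id,
    task=task,
    evaluators=[hate_unfairness_eval],
    experiment_name="hate-unfairness-eval",
)
```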
