Arize AI for Agents

Arize AI is an AI engineering platform that supports the development, evaluation, and observability of AI agents, helping developers build robust, high-performing agents.

It has first-class support for agent frameworks such as AutoGen, OpenAI Agents, LangGraph, and smolagents.

Why Arize AI for Agents?

1. Agent Observability with Auto Instrumentation

Observability is critical for understanding how agents behave in real-world scenarios. Arize AI provides robust tracing through our open-source OpenInference library, which automatically instruments your agent applications to capture traces and spans. These include LLM calls, tool invocations, and data retrieval steps, giving you a detailed view of your agent's workflow.

With just a few lines of code, you can set up tracing for popular frameworks like OpenAI Agents, LangGraph, and AutoGen. Learn more about Tracing.

Code Example: Auto Instrumentation for OpenAI Agents

from arize.otel import register

tracer_provider = register(
    space_id="your-space-id",
    api_key="your-api-key",
    project_name="agents"
)

from openinference.instrumentation.openai_agents import OpenAIAgentsInstrumentor
OpenAIAgentsInstrumentor().instrument(tracer_provider=tracer_provider)
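
Once the instrumentor is registered, agent runs are traced automatically with no further changes to your agent code. Below is a minimal sketch of a run that would be captured, assuming the openai-agents SDK is installed and OPENAI_API_KEY is set; the agent name and instructions are placeholders.

from agents import Agent, Runner

# Every run below is captured as a trace in Arize, including the
# underlying LLM calls and any tool invocations the agent makes.
agent = Agent(
    name="Travel Assistant",
    instructions="Help users plan trips.",
)
result = Runner.run_sync(agent, "Plan a 3-day trip to Paris.")
print(result.final_output)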

2. Agent Evaluations with Online Evals

Evaluating agent performance is essential to ensure reliability and accuracy. Arize AI's online evaluations automatically tag spans with performance labels, helping you identify problematic interactions and measure key metrics.

  • Comprehensive Evaluation Templates: Arize provides templates for evaluating various agent components, such as Tool Calling, Path Convergence, and Planning; a sketch of running such an evaluation follows this list.

  • Online Evals: With Online Evals, you can run continuous evaluations on production data to monitor correctness, hallucination, relevance, and latency. This ensures your agents perform consistently across diverse scenarios.

  • Custom Metrics and Alerts: Track key metrics on custom dashboards and receive alerts when performance deviates from the norm, allowing proactive optimization of agent behavior.
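
Code Example: Running an LLM-Judged Evaluation

A minimal sketch of scoring agent responses offline with an LLM judge, assuming the open-source arize-phoenix-evals package (pip install arize-phoenix-evals) and an OpenAI API key. The correctness template, column names, and labels below are illustrative rather than the exact templates Arize ships.

import pandas as pd
from phoenix.evals import OpenAIModel, llm_classify

# Illustrative correctness template; {question} and {response} are filled
# from the dataframe columns of the same names.
CORRECTNESS_TEMPLATE = """
You are evaluating an AI agent's answer.
Question: {question}
Answer: {response}
Respond with a single word: "correct" or "incorrect".
"""

df = pd.DataFrame(
    {
        "question": ["Plan a trip to Paris."],
        "response": ["Here is a 5-day itinerary for Paris..."],
    }
)

# llm_classify labels each row with one of the rails using an LLM judge.
evals_df = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),
    template=CORRECTNESS_TEMPLATE,
    rails=["correct", "incorrect"],
    provide_explanation=True,
)
print(evals_df[["label", "explanation"]])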

Code Example: Logging Evaluations to Arize

# Example of logging an evaluation for an agent's response
from arize.api import Client
from arize.utils.types import Environments, ModelTypes

arize_client = Client(space_key="YOUR_SPACE_KEY", api_key="YOUR_API_KEY")
response = arize_client.log_evaluation(
    model_id="agent-model-v1",
    environment=Environments.PRODUCTION,
    model_type=ModelTypes.GENERATIVE_LLM,
    prompt="Plan a trip to Paris.",
    response="Here is a 5-day itinerary for Paris...",
    evaluation_name="Correctness",
    evaluation_score=0.9
)

3. Testing Agents in Prompt Playground with Tool Calling Support

Arize's Prompt Playground is a no-code environment for iterating on prompts and testing agent behaviors, including support for tool calling, a critical capability for agents that interact with external APIs or functions.

  • Iterate on Prompts: Test different prompt templates, models, and parameters side by side to refine how your agent responds to user inputs.

  • Tool Calling Support: Debug tool calling directly in the Playground to ensure your agent selects the right tools and parameters. Learn more about Using Tools in Playground.

  • Save as Experiment: Run systematic A/B tests on datasets to validate agent performance and share results with your team via experiments.

4. Sessions for Agent Interaction Tracking

For chatbot or multi-turn agent applications, tracking sessions is invaluable for debugging and performance analysis. Arize AI supports session tracking to group traces based on interactions.

  • Session ID and User ID: Add session.id and user.id as attributes to spans to group interactions and analyze conversation flows. This helps identify where conversations break or user frustration increases.

  • Debugging Sessions: Use the Arize platform to filter sessions and find underperforming groups of traces. Learn more about Sessions and Users.

Code Example: Adding Session ID for Agent Chatbot

from openai import OpenAI
from openinference.instrumentation import using_session

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Spans created inside this block are tagged with session.id, so the
# interaction is grouped under "chat-session-456" in Arize.
with using_session(session_id="chat-session-456"):
    # Agent interaction within a session
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Book a flight to Paris."}],
        max_tokens=50,
    )
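
To group interactions by user as well as by session, both IDs can be attached together. Below is a minimal sketch using using_attributes from the same openinference-instrumentation package, with placeholder IDs and the same OpenAI client as above.

from openinference.instrumentation import using_attributes

# Spans created inside this block carry both session.id and user.id,
# so conversations can be filtered by session or by user in Arize.
with using_attributes(session_id="chat-session-456", user_id="user-123"):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Now book a hotel near the Louvre."}],
    )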

5. Agent Replay and Agent Pathing (Coming Soon)

  • Agent Replay: Replay agent interactions to debug agent tool calling in a controlled environment. Replay will help you simulate past sessions to test improvements without impacting live users.

  • Agent Pathing: Analyze and optimize the pathways your agents take to complete tasks. Understand whether agents are taking efficient routes or getting stuck in loops, with tools to refine planning and convergence strategies.

Additional Resources for Agent Development
