Instrument LLM calls made using the Google ADK Python SDK
Launch Phoenix
pip install openinference-instrumentation-google-adk google-adk arize-phoenix-otel
Set the GOOGLE_API_KEY
environment variable. Refer to Google's ADK documentation for more details on authentication and environment variables.
export GOOGLE_API_KEY=[your_key_here]
Use the register function to connect your application to Phoenix.
from phoenix.otel import register
# Configure the Phoenix tracer
tracer_provider = register(
    project_name="my-llm-app",  # Default is 'default'
    auto_instrument=True,  # Auto-instrument your app based on installed OI dependencies
)
Now that you have tracing set up, all Google ADK requests will be streamed to Phoenix for observability and evaluation.
import asyncio

from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner
from google.genai import types


def get_weather(city: str) -> dict:
    """Retrieves the current weather report for a specified city.

    Args:
        city (str): The name of the city for which to retrieve the weather report.

    Returns:
        dict: status and result or error msg.
    """
    if city.lower() == "new york":
        return {
            "status": "success",
            "report": (
                "The weather in New York is sunny with a temperature of 25 degrees"
                " Celsius (77 degrees Fahrenheit)."
            ),
        }
    else:
        return {
            "status": "error",
            "error_message": f"Weather information for '{city}' is not available.",
        }


agent = Agent(
    name="test_agent",
    model="gemini-2.0-flash-exp",
    description="Agent to answer questions using tools.",
    instruction="You must use the available tools to find an answer.",
    tools=[get_weather],
)


async def main():
    app_name = "test_instrumentation"
    user_id = "test_user"
    session_id = "test_session"
    runner = InMemoryRunner(agent=agent, app_name=app_name)
    session_service = runner.session_service
    await session_service.create_session(
        app_name=app_name,
        user_id=user_id,
        session_id=session_id,
    )
    async for event in runner.run_async(
        user_id=user_id,
        session_id=session_id,
        new_message=types.Content(
            role="user",
            parts=[types.Part(text="What is the weather in New York?")],
        ),
    ):
        if event.is_final_response():
            print(event.content.parts[0].text.strip())


if __name__ == "__main__":
    asyncio.run(main())
Google GenAI is a suite of AI tools and models from Google Cloud, designed to help businesses build, deploy, and scale AI applications.
Instrument LLM calls made using the Google Gen AI Python SDK
Set the GEMINI_API_KEY
environment variable. To use the Gen AI SDK with Vertex AI instead of the Developer API, refer to Google's documentation on setting the required environment variables.
Use the register function to connect your application to Phoenix.
Now that you have tracing set up, all Gen AI SDK requests will be streamed to Phoenix for observability and evaluation.
class GeminiModel:
    project: Optional[str] = None
    location: Optional[str] = None
    credentials: Optional["Credentials"] = None
    model: str = "gemini-pro"
    default_concurrency: int = 5
    temperature: float = 0.0
    max_tokens: int = 256
    top_p: float = 1
    top_k: int = 32

project = "my-project-id"
location = "us-central1"  # as an example
model = GeminiModel(project=project, location=location)
model("Hello there, this is a test to see if you are working?")
# Output: "Hello world, I am working!"
pip install openinference-instrumentation-google-genai google-genai
export GEMINI_API_KEY=[your_key_here]
from phoenix.otel import register
# Configure the Phoenix tracer
tracer_provider = register(
    project_name="my-llm-app",  # Default is 'default'
    auto_instrument=True,  # Auto-instrument your app based on installed OI dependencies
)
import os
from google import genai
def send_message_multi_turn() -> tuple[str, str]:
    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    chat = client.chats.create(model="gemini-2.0-flash-001")
    response1 = chat.send_message("What is the capital of France?")
    response2 = chat.send_message("Why is the sky blue?")
    return response1.text or "", response2.text or ""
Sign up for Phoenix:
Sign up for an Arize Phoenix account at https://app.phoenix.arize.com/login
Click Create Space, then follow the prompts to create and launch your space.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint and API Key:
From your new Phoenix Space:
Create your API key from the Settings page.
Copy your Hostname from the Settings page.
In your code, set your endpoint and API key:
import os
os.environ["PHOENIX_API_KEY"] = "ADD YOUR PHOENIX API KEY"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "ADD YOUR PHOENIX HOSTNAME"
# If you created your Phoenix Cloud instance before June 24th, 2025,
# you also need to set the API key as a header:
# os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"
Launch your local Phoenix instance:
pip install arize-phoenix
phoenix serve
For details on customizing a local terminal deployment, see Terminal Setup.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
See Terminal for more details.
Pull latest Phoenix image from Docker Hub:
docker pull arizephoenix/phoenix:latest
Run your containerized instance:
docker run -p 6006:6006 arizephoenix/phoenix:latest
This will expose Phoenix on localhost:6006.
Install packages:
pip install arize-phoenix-otel
Set your Phoenix endpoint:
import os
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
For more info on using Phoenix with Docker, see Docker.
Install packages:
pip install arize-phoenix
Launch Phoenix:
import phoenix as px
px.launch_app()
Evaluate multi-agent systems using Arize Phoenix, Google Gen AI Evals, and CrewAI
This guide demonstrates how to evaluate multi-agent systems using Arize Phoenix, Google Gen AI Evals, and CrewAI. It shows how to:
Set up a multi-agent system using CrewAI for collaborative AI agents
Instrument the agents with Phoenix for tracing and monitoring
Evaluate agent performance and interactions using Google GenAI
Analyze the results using Arize Phoenix's observability platform
CrewAI: For orchestrating multi-agent systems
Arize Phoenix: For observability and tracing
Google Cloud Vertex AI: For model hosting and execution
OpenAI: For agent LLM capabilities
We will walk through the key steps in the documentation below. Check out the full tutorial here:
This crew consists of specialized agents working together to analyze and report on a given topic.
from crewai import Agent, Crew, Process, Task

# Define agents here (see full tutorial)

# Create tasks for your agents with explicit context
conduct_analysis_task = Task(
    description=f"""Conduct a comprehensive analysis of the latest developments in {topic}.
    Identify key trends, breakthrough technologies, and potential industry impacts.
    Focus on both research breakthroughs and commercial applications.""",
    expected_output="Full analysis report in bullet points with citations to sources",
    agent=researcher,
    context=[],  # Explicitly set empty context
)

fact_checking_task = Task(
    description=f"""Review the research findings and verify the accuracy of claims about {topic}.
    Identify any potential ethical concerns or societal implications.
    Highlight areas where hype may exceed reality and provide a balanced assessment.
    Suggest frameworks that should be considered for each major advancement.""",
    expected_output="Fact-checking report with verification status for each major claim",
    agent=fact_checker,
    context=[conduct_analysis_task],  # Set context to previous task
)

# Instantiate your crew with a sequential process
crew = Crew(
    agents=[researcher, fact_checker, writer],
    tasks=[conduct_analysis_task, fact_checking_task, writer_task],
    verbose=False,
    process=Process.sequential,
)

return crew
Next, you'll build an experiment to test your CrewAI Crew with Phoenix and Google Gen AI evals.
When run, an Experiment will send each row of your dataset through your task, then apply each of your evaluators to the result.
All traces and metrics will then be stored in Phoenix for reference and comparison.
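The upload step below expects a dataframe `df` of test cases. The real topics and reference trajectories come from the full tutorial; a minimal, hypothetical sketch of its shape (column names match the `input_keys` and `output_keys` passed to the upload) might look like:

```python
import pandas as pd

# Hypothetical test cases; the "topic" column feeds the crew and the
# "reference_trajectory" column holds the expected sequence of agents.
df = pd.DataFrame(
    {
        "topic": ["AI agents in healthcare", "open-source LLMs"],
        "reference_trajectory": [
            [{"agent_name": "researcher"}, {"agent_name": "fact_checker"}],
            [{"agent_name": "researcher"}, {"agent_name": "fact_checker"}],
        ],
    }
)
print(df.shape)  # (2, 2)
```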
import phoenix as px

phoenix_client = px.Client()

try:
    dataset = phoenix_client.get_dataset(name="crewai-researcher-test-topics")
except ValueError:
    dataset = phoenix_client.upload_dataset(
        dataframe=df,
        dataset_name="crewai-researcher-test-topics",
        input_keys=["topic"],
        output_keys=["reference_trajectory"],
    )
This method will be run on each row of your test cases dataset:
def call_crew_with_topic(input):
    crew = create_research_crew(topic=input.get("topic"))
    result = crew.kickoff()
    return result
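The evaluators below rely on a helper, `create_trajectory_from_response`, that is defined in the full tutorial. A minimal sketch of the idea, assuming the crew result exposes a `tasks_output` list whose items carry the executing agent's name on an `agent` attribute (both field names are assumptions here, not the tutorial's exact implementation):

```python
def create_trajectory_from_response(result):
    """Convert a crew result into a list of trajectory steps.

    Hypothetical sketch: the real helper lives in the full tutorial.
    Assumes `result.tasks_output` is a list whose items expose the
    executing agent's name on an `agent` attribute.
    """
    trajectory = []
    for task_output in getattr(result, "tasks_output", []):
        trajectory.append({"agent_name": getattr(task_output, "agent", "unknown")})
    return trajectory
```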
Define as many evaluators as you'd need to evaluate your agent. In this case, you'll use Google Gen AI's eval library to evaluate the crew's trajectory.
import pandas as pd
from vertexai.preview.evaluation import EvalTask


def eval_trajectory_with_google_gen_ai(
    output, expected, metric_name="trajectory_exact_match"
) -> float:
    eval_dataset = pd.DataFrame(
        {
            "predicted_trajectory": [create_trajectory_from_response(output)],
            "reference_trajectory": [expected.get("reference_trajectory")],
        }
    )
    eval_task = EvalTask(
        dataset=eval_dataset,
        metrics=[metric_name],
    )
    eval_result = eval_task.evaluate()
    metric_value = eval_result.summary_metrics.get(f"{metric_name}/mean")
    if metric_value is None:
        return 0.0
    return metric_value


def trajectory_exact_match(output, expected):
    return eval_trajectory_with_google_gen_ai(
        output, expected, metric_name="trajectory_exact_match"
    )


def trajectory_precision(output, expected):
    return eval_trajectory_with_google_gen_ai(
        output, expected, metric_name="trajectory_precision"
    )
import nest_asyncio
from phoenix.experiments import run_experiment

nest_asyncio.apply()

experiment = run_experiment(
    dataset,
    call_crew_with_topic,
    experiment_name="agent-experiment",
    evaluators=[
        trajectory_exact_match,
        trajectory_precision,
        trajectory_in_order_match,
        trajectory_any_order_match,
        agent_names_match,
    ],
)