
- Create an agent using the OpenAI agents SDK
- Trace the agent activity
- Create a dataset to benchmark performance
- Run an experiment to evaluate agent performance using LLM as a judge
Initial setup
Install Libraries
Setup Keys
Copy the Arize AX API_KEY and SPACE_ID from your Space Settings page into the corresponding variables in your notebook.
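One way to fill in those variables without hard-coding secrets is to read them from the environment. This is a minimal sketch, not the notebook's own cell: the helper `load_arize_credentials` and the environment variable names `ARIZE_SPACE_ID` and `ARIZE_API_KEY` are assumptions made for illustration.

```python
import os
from typing import Mapping, Tuple


def load_arize_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (SPACE_ID, API_KEY), failing early if either one is missing.

    The variable names ARIZE_SPACE_ID and ARIZE_API_KEY are assumptions
    chosen for this sketch; use whatever names your environment defines.
    """
    space_id = env.get("ARIZE_SPACE_ID", "").strip()
    api_key = env.get("ARIZE_API_KEY", "").strip()

    # Collect the names of any values that are absent or blank.
    missing = [
        name
        for name, value in (("ARIZE_SPACE_ID", space_id), ("ARIZE_API_KEY", api_key))
        if not value
    ]
    if missing:
        raise ValueError(f"Missing Arize credentials: {', '.join(missing)}")
    return space_id, api_key
```

Calling `SPACE_ID, API_KEY = load_arize_credentials()` then raises a clear error up front instead of failing later when traces are exported.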

Setup Tracing
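A tracing setup for this kind of notebook might look like the sketch below. It assumes the `arize-otel` and `openinference-instrumentation-openai-agents` packages are installed, and that `register` and `OpenAIAgentsInstrumentor` accept the arguments shown; check both against the versions you install. The wrapper `setup_tracing` and the default project name are hypothetical.

```python
def setup_tracing(space_id: str, api_key: str, project_name: str = "agents-cookbook"):
    """Route OpenAI Agents SDK spans to Arize and return the tracer provider.

    `project_name` defaults to a placeholder chosen for this sketch.
    """
    if not space_id or not api_key:
        raise ValueError("space_id and api_key are both required")

    # Imported lazily so this function can be defined before the
    # tracing packages are installed (assumed package layout).
    from arize.otel import register
    from openinference.instrumentation.openai_agents import OpenAIAgentsInstrumentor

    # Create a tracer provider that exports to the given Arize space.
    tracer_provider = register(
        space_id=space_id,
        api_key=api_key,
        project_name=project_name,
    )
    # Instrument the Agents SDK so agent runs and tool calls emit spans.
    OpenAIAgentsInstrumentor().instrument(tracer_provider=tracer_provider)
    return tracer_provider
```

Tracing should be set up before the agent is created, so that its first run is already instrumented.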
Create your first agent with the OpenAI SDK
Here we’ve set up a basic agent that can solve math problems. We have a function tool that can solve math equations, and an agent that can use this tool. We’ll use the Runner class to run the agent and get the final output.
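The agent described above can be sketched roughly as follows. This is an illustration under assumptions rather than the notebook's original cell: the tool body (a small arithmetic evaluator built on `ast`), the names `solve_equation`, `build_math_agent`, and `run_math_agent`, and the instruction text are all hypothetical. Running the agent needs the `openai-agents` package and an `OPENAI_API_KEY`.

```python
import ast
import operator

# Arithmetic operators the sketch tool is willing to evaluate.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def _evaluate(node):
    """Recursively evaluate a parsed arithmetic expression."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("Unsupported expression")


def solve_equation(equation: str) -> str:
    """Evaluate an arithmetic expression such as "2 + 3 * 4" and return the result as text."""
    return str(_evaluate(ast.parse(equation, mode="eval").body))


def build_math_agent():
    """Wrap solve_equation as a function tool and attach it to an agent."""
    # Imported lazily so the tool above can be used without the SDK installed.
    from agents import Agent, function_tool

    return Agent(
        name="Math Solver",
        instructions="Solve math problems. Use the solve_equation tool for any calculation.",
        tools=[function_tool(solve_equation)],
    )


def run_math_agent(question: str) -> str:
    """Run the agent once and return its final output as text."""
    from agents import Runner

    result = Runner.run_sync(build_math_agent(), question)
    return str(result.final_output)
```

`Runner.run_sync` is meant for plain scripts. In a notebook, where an event loop is already running, `await Runner.run(agent, question)` is the usual alternative.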
Evaluating our agent
Agents can go awry for a variety of reasons.
- Tool call accuracy - did our agent choose the right tool with the right arguments?
- Tool call results - did the tool respond with the right results?
- Agent goal accuracy - did our agent accomplish the stated goal and get to the right outcome?
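To make these three questions concrete, here is a deliberately simple, deterministic stand-in that compares one agent run against an expected run. It is not the LLM-as-a-judge evaluation this guide uses; the `ToolCall` and `AgentRun` types and the `check_run` function are hypothetical, and the substring match for goal accuracy is a crude assumption that a judge model would replace.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToolCall:
    """One tool invocation: which tool, with what arguments, returning what."""
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)
    result: str = ""


@dataclass
class AgentRun:
    """A single agent run reduced to its tool call and final answer."""
    tool_call: ToolCall
    final_output: str


def check_run(actual: AgentRun, expected: AgentRun) -> Dict[str, bool]:
    """Answer the three evaluation questions with exact comparisons."""
    return {
        # Right tool, called with the right arguments?
        "tool_call_accuracy": (
            actual.tool_call.name == expected.tool_call.name
            and actual.tool_call.arguments == expected.tool_call.arguments
        ),
        # Did the tool return the expected result?
        "tool_call_results": actual.tool_call.result == expected.tool_call.result,
        # Does the final answer contain the expected outcome?
        "agent_goal_accuracy": expected.final_output in actual.final_output,
    }
```

For example, an expected run of `AgentRun(ToolCall("solve_equation", {"equation": "2 + 3 * 4"}, "14"), "14")` passes all three checks against an actual run that calls the same tool with the same arguments and answers "The answer is 14."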