Function Calling Agent: Evaluation

This demo shows how to run custom LLM evaluations in Phoenix, using an LLM-as-a-judge approach to evaluate a function calling agent. It walks through setting up the dataframes, running evaluations with methods such as llm_generate and llm_classify, and exporting the results back to the Phoenix UI.
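
As a rough illustration of the classify-style judge pattern described above, here is a minimal sketch in plain Python. It does not use the Phoenix API: the template, the labels, the example rows, and the `stub_judge` function are all hypothetical stand-ins. In the demo, the judge is a real model call and the rows come from a dataframe.

```python
from typing import Callable, Dict, List

# Hypothetical prompt template and allowed labels, for illustration only.
TEMPLATE = (
    "Question: {question}\n"
    "Tool call: {tool_call}\n"
    "Did the agent pick the right tool? Answer 'correct' or 'incorrect'."
)
RAILS = ["correct", "incorrect"]


def snap_to_rails(raw: str, rails: List[str]) -> str:
    """Map free-form judge output onto one allowed label, else 'unparseable'."""
    text = raw.strip().lower()
    # Check longer labels first so 'incorrect' is not mistaken for 'correct'.
    matches = [r for r in sorted(rails, key=len, reverse=True) if r in text]
    return matches[0] if matches else "unparseable"


def classify_rows(
    rows: List[Dict[str, str]],
    judge: Callable[[str], str],
    template: str = TEMPLATE,
    rails: List[str] = RAILS,
) -> List[Dict[str, str]]:
    """Fill the template for each row, ask the judge, and attach a label."""
    results = []
    for row in rows:
        prompt = template.format(**row)
        label = snap_to_rails(judge(prompt), rails)
        results.append({**row, "label": label})
    return results


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call; treats get_weather as the expected tool."""
    return "Correct." if "get_weather" in prompt else "Incorrect."


# Hypothetical agent traces: one question and the tool call the agent made.
rows = [
    {"question": "What's the weather in Paris?",
     "tool_call": "get_weather(city='Paris')"},
    {"question": "What's the weather in Rome?",
     "tool_call": "search_flights(dest='Rome')"},
]

results = classify_rows(rows, stub_judge)
for r in results:
    print(r["tool_call"], "->", r["label"])
```

Restricting the judge's output to a fixed set of labels keeps the results easy to aggregate and to attach back to the original rows before they are exported.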

Follow along in the notebook and learn more about how to build a custom evaluator.

