This demo shows how to run custom LLM evaluations in Phoenix, using an LLM-as-a-judge approach to evaluate a function-calling agent. It walks through setting up the dataframes, evaluation methods such as llm_generate and llm_classify, and exporting the results back to the Phoenix UI.
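For readers who want the shape of the classify-style judging step before opening the notebook, here is a minimal, library-independent sketch. It does not call the Phoenix API: the template, the labels, the row fields, and the stub_judge function are all hypothetical stand-ins, and a real run would replace stub_judge with an actual LLM call and the list of dicts with a dataframe.

```python
from typing import Callable, Dict, List

# Hypothetical prompt template and label set (not taken from the video or from Phoenix).
TEMPLATE = (
    "You are judging a function-calling agent.\n"
    "Question: {question}\n"
    "Tool call: {tool_call}\n"
    "Answer with exactly one word: correct or incorrect."
)
RAILS = ["correct", "incorrect"]


def snap_to_rail(raw: str, rails: List[str]) -> str:
    """Map free-form judge output onto one allowed label, else NOT_PARSABLE."""
    text = raw.strip().lower()
    # Check longer labels first so "incorrect" is not mistaken for "correct".
    for rail in sorted(rails, key=len, reverse=True):
        if rail in text:
            return rail
    return "NOT_PARSABLE"


def classify_rows(
    rows: List[Dict[str, str]],
    judge: Callable[[str], str],
    template: str = TEMPLATE,
    rails: List[str] = RAILS,
) -> List[Dict[str, str]]:
    """Fill the template per row, ask the judge, and attach a constrained label."""
    results = []
    for row in rows:
        prompt = template.format(**row)
        results.append({**row, "label": snap_to_rail(judge(prompt), rails)})
    return results


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call: approves only calls to get_weather."""
    return "Correct." if "get_weather" in prompt else "I'd say incorrect"


rows = [
    {"question": "Weather in Paris?", "tool_call": "get_weather(city='Paris')"},
    {"question": "Weather in Rome?", "tool_call": "search_flights(dest='Rome')"},
]
labeled = classify_rows(rows, stub_judge)
for row in labeled:
    print(row["tool_call"], "->", row["label"])
```

Constraining the judge's free-form reply to a fixed label set is what makes the results easy to aggregate and attach back to individual records; a generate-style evaluation would instead keep the judge's full text output.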
📓 Notebook: https://colab.research.google.com/gist/PubliusAu/be1fd140aa4de1491bfa6ca5859464ca/bring-your-own-evaluator-phoenix-example.ipynb#scrollTo=is3clylxi_XI
🔗 Other Handy Links
Arize Phoenix: https://phoenix.arize.com/
How to bring your own evaluator: https://docs.arize.com/phoenix/evaluation/how-to-evals/bring-your-own-evaluator
Follow John Gilhuly: https://www.linkedin.com/in/john-gilhuly-25a15888/
Join the community to ask questions: https://join.slack.com/t/arize-ai/shared_invite/zt-26zg4u3lw-OjUNoLvKQ2Yv53EfvxW6Kg
⭐️ Star Phoenix on GitHub: https://github.com/Arize-ai/phoenix