Pre-Built Evals

The following are simple functions built on top of the LLM evals building blocks and pre-tested against benchmark data.

All eval templates are tested against golden data that is available as part of the LLM eval library's benchmark datasets. They target a precision of 70-90% and an F1 score of 70-85%.
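As a rough illustration of what the precision and F1 targets measure, the sketch below scores an eval's output labels against golden labels for a single positive class. The function names `score_binary` and `meets_targets`, the `"relevant"`/`"irrelevant"` labels, and the choice to treat the lower end of each target range as the pass bar are all hypothetical assumptions for illustration. They are not part of the eval library's API.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score_binary(predicted, golden, positive="relevant"):
    """Hypothetical helper: precision/recall/F1 of eval labels vs. golden labels."""
    if len(predicted) != len(golden):
        raise ValueError("predicted and golden must have the same length")
    # Count true positives, false positives, and false negatives for the positive class.
    tp = sum(1 for p, g in zip(predicted, golden) if p == positive and g == positive)
    fp = sum(1 for p, g in zip(predicted, golden) if p == positive and g != positive)
    fn = sum(1 for p, g in zip(predicted, golden) if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


def meets_targets(scores, min_precision=0.70, min_f1=0.70):
    """Assumption: the lower end of each target range is treated as the minimum bar."""
    return scores.precision >= min_precision and scores.f1 >= min_f1


if __name__ == "__main__":
    golden = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant",
              "relevant", "irrelevant", "relevant", "relevant", "irrelevant"]
    predicted = ["relevant", "relevant", "relevant", "relevant", "irrelevant",
                 "irrelevant", "irrelevant", "relevant", "relevant", "irrelevant"]
    s = score_binary(predicted, golden)
    print(f"precision={s.precision:.3f} recall={s.recall:.3f} f1={s.f1:.3f}")
    print("meets targets:", meets_targets(s))
```

In this toy example there are 5 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to 5/6 (about 0.833), which clears both lower bounds.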

Heuristic Metrics

Reference Link

User Frustration

Agent Function Calling
