
Span-Level Evaluation

Evaluator | Tutorial
Evaluate code functionality | Colab Link
Evaluate hallucination | Colab Link
Evaluate human ground truth vs. AI | Colab Link
Evaluate Q&A correctness | Colab Link
Evaluate RAG | Colab Link
Evaluate reference links | Colab Link
Evaluate relevance | Colab Link
Evaluate SQL correctness | Colab Link
Evaluate tool calling | Colab Link
Evaluate toxicity | Colab Link
Evaluate user frustration | Colab Link