Span-Level Evaluation

| Evaluator | Tutorial |
|---|---|
| Evaluate code functionality | Colab Link |
| Evaluate hallucination | Colab Link |
| Evaluate human ground truth vs. AI | Colab Link |
| Evaluate Q&A correctness | Colab Link |
| Evaluate RAG | Colab Link |
| Evaluate reference links | Colab Link |
| Evaluate relevance | Colab Link |
| Evaluate SQL correctness | Colab Link |
| Evaluate tool calling | Colab Link |
| Evaluate toxicity | Colab Link |
| Evaluate user frustration | Colab Link |