Ragas is a library that provides evaluation metrics for LLM applications, making it easier to assess their quality. NVIDIA developed three specialized metrics for it, each based on an LLM-as-a-judge approach:
  1. Answer Accuracy
  2. Context Relevance
  3. Response Groundedness
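Each metric reads different fields of an evaluation sample. The sketch below records those requirements as a plain dictionary and checks a sample against them. The field names follow Ragas' single-turn sample columns as we understand them (user_input, response, reference, retrieved_contexts); verify them against your installed version. REQUIRED_FIELDS and missing_fields are hypothetical helpers, not part of Ragas.

```python
# Hypothetical helper: field names mirror Ragas' single-turn sample columns
# (user_input, response, reference, retrieved_contexts) as we understand them.
REQUIRED_FIELDS = {
    "AnswerAccuracy": {"user_input", "response", "reference"},
    "ContextRelevance": {"user_input", "retrieved_contexts"},
    "ResponseGroundedness": {"response", "retrieved_contexts"},
}


def missing_fields(sample: dict, metric: str) -> set:
    """Return the required fields that are absent or empty in `sample`."""
    return {field for field in REQUIRED_FIELDS[metric] if not sample.get(field)}


# Example sample with no reference answer.
sample = {
    "user_input": "When was Einstein born?",
    "response": "Albert Einstein was born in 1879.",
    "retrieved_contexts": ["Albert Einstein (born 14 March 1879) was a physicist."],
}

for metric in REQUIRED_FIELDS:
    print(metric, sorted(missing_fields(sample, metric)))
```

For this sample, only AnswerAccuracy reports a missing field (reference), which reflects the fact that it is the only one of the three metrics that compares against a ground-truth answer.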
This guide walks you through building and evaluating a RAG pipeline using Ragas and Arize. The accompanying notebook demonstrates how to:
  • Build a RAG pipeline using LlamaIndex
  • Create a test dataset for evaluation
  • Run 3 experiments with varying parameters
  • Evaluate using NVIDIA metrics (AnswerAccuracy, ContextRelevance, ResponseGroundedness)
  • View comprehensive analysis and compare results in the Arize platform
  • Analyze how retrieval count and chunk size impact evaluation metrics
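The experiments vary two knobs: how large each chunk is and how many chunks are retrieved per query. A minimal sketch of describing three such runs and averaging each metric per run is shown below. The config values, the ExperimentConfig class, and the summarize helper are hypothetical; in LlamaIndex these knobs would plausibly map to `SentenceSplitter(chunk_size=...)` and `index.as_query_engine(similarity_top_k=...)`, but the tutorial's exact settings are not stated here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ExperimentConfig:
    """One RAG configuration to evaluate (hypothetical structure)."""
    name: str
    chunk_size: int        # chunk size passed to the text splitter
    similarity_top_k: int  # number of chunks retrieved per query


# Hypothetical values, not the tutorial's actual settings.
EXPERIMENTS = [
    ExperimentConfig("baseline", chunk_size=512, similarity_top_k=2),
    ExperimentConfig("more_context", chunk_size=512, similarity_top_k=5),
    ExperimentConfig("small_chunks", chunk_size=256, similarity_top_k=5),
]


def summarize(rows: list[dict]) -> dict[str, dict[str, float]]:
    """Average each metric per experiment.

    Each row is assumed to look like
    {"experiment": "baseline", "AnswerAccuracy": 1.0, ...};
    None values (failed judgments) are skipped.
    """
    grouped: dict[str, dict[str, list[float]]] = {}
    for row in rows:
        per_metric = grouped.setdefault(row["experiment"], {})
        for key, value in row.items():
            if key == "experiment" or value is None:
                continue
            per_metric.setdefault(key, []).append(value)
    return {
        exp: {metric: mean(values) for metric, values in metrics.items()}
        for exp, metrics in grouped.items()
    }
```

Keeping the configuration as frozen data makes it easy to log each run's parameters alongside its scores, so differences in the metrics can be traced back to a specific chunk size or retrieval count.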
We walk through the key steps below. The full tutorial is available here:

Colab Notebook Tutorial

How NVIDIA Metrics Are Calculated:

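AnswerAccuracy, ContextRelevance, and ResponseGroundedness all follow the same two-step approach. The templates and the 0/2/4 rating scale described below are those used for Answer Accuracy; Context Relevance and Response Groundedness rate on a 0/1/2 scale instead.

Step 1: The LLM generates ratings using two distinct templates, which makes the score less sensitive to the wording of any single prompt: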
  • Template 1: The LLM compares the response with the reference and rates it on a scale of 0, 2, or 4.
  • Template 2: The LLM evaluates the same question again, but with the roles of the response and the reference swapped. Rating from both perspectives is intended to reduce bias toward whichever text is presented as the reference.
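The two-template step can be sketched as follows. The prompt wording, the `judge` callable (any function that sends a prompt to an LLM and returns its text), and the parsing rules are all assumptions for illustration; Ragas ships its own templates, which differ from these.

```python
from typing import Callable, Optional

# Hypothetical prompt wording; Ragas' real templates differ.
TEMPLATE_1 = (
    "Question: {question}\n"
    "User answer: {candidate}\n"
    "Reference answer: {gold}\n"
    "Rate the user answer against the reference (0, 2, or 4):"
)
TEMPLATE_2 = (
    "You are grading an answer. Question: {question}\n"
    "Reference: {gold}\n"
    "Candidate: {candidate}\n"
    "Score the candidate 0 (wrong), 2 (partial), or 4 (exact match):"
)
VALID_RATINGS = {0, 2, 4}


def dual_template_ratings(
    question: str,
    response: str,
    reference: str,
    judge: Callable[[str], str],
) -> tuple[Optional[int], Optional[int]]:
    """Ask the judge twice with distinct templates, swapping roles the second time."""
    prompts = [
        # Template 1: response is the candidate, reference is the gold answer.
        TEMPLATE_1.format(question=question, candidate=response, gold=reference),
        # Template 2: roles swapped, reference is graded against the response.
        TEMPLATE_2.format(question=question, candidate=reference, gold=response),
    ]
    ratings: list[Optional[int]] = []
    for prompt in prompts:
        raw = judge(prompt).strip()
        rating = int(raw) if raw.isdigit() else None
        # Anything outside {0, 2, 4} is treated as an invalid rating.
        ratings.append(rating if rating in VALID_RATINGS else None)
    return ratings[0], ratings[1]
```

To use a real model, pass a `judge` function that wraps your LLM client; invalid replies come back as `None` so the next step can fall back to the remaining rating.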
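Step 2: Each valid rating is normalized to the [0, 1] scale (for Answer Accuracy, by dividing by 4). If both ratings are valid, the final score is the average of the two normalized scores; otherwise, the single valid score is used.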
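A minimal sketch of this combination step is below. It assumes a maximum rating of 4 for Answer Accuracy (and, as we understand it, 2 for Context Relevance and Response Groundedness), and returns NaN when neither rating is valid; that last choice is an assumption, not something stated above.

```python
import math
from typing import Optional


def combine_ratings(
    rating1: Optional[int], rating2: Optional[int], max_rating: int = 4
) -> float:
    """Normalize each valid rating to [0, 1] and combine them.

    Both valid -> average; one valid -> that one; none valid -> NaN (assumption).
    """
    scores = [r / max_rating for r in (rating1, rating2) if r is not None]
    if not scores:
        return math.nan
    return sum(scores) / len(scores)


print(combine_ratings(4, 4))     # both templates agree on an exact match
print(combine_ratings(4, 2))     # templates disagree: average of 1.0 and 0.5
print(combine_ratings(None, 2))  # only the second rating is valid
```

Falling back to a single valid rating keeps a sample scoreable when one judge call returns unparseable text, at the cost of losing the two-perspective check for that sample.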

Example Calculation:

  • User Input: “When was Einstein born?”
  • Response: “Albert Einstein was born in 1879.”
  • Reference: “Albert Einstein was born in 1879.”
Assuming both templates return a rating of 4 (indicating an exact match), the conversion is as follows:
  • A rating of 4 corresponds to 1 on the [0, 1] scale.
  • Averaging the two scores: (1 + 1) / 2 = 1.
Thus, the final Answer Accuracy score is 1.
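The same arithmetic written as a formula, where r_1 and r_2 are the ratings from the two templates on the 0/2/4 scale (the symbols are ours, and the second line is a hypothetical partial-credit case added for contrast):

```latex
s = \frac{1}{2}\left(\frac{r_1}{4} + \frac{r_2}{4}\right)
\qquad
r_1 = r_2 = 4 \;\Rightarrow\; s = \tfrac{1}{2}(1 + 1) = 1
\\
r_1 = 4,\; r_2 = 2 \;\Rightarrow\; s = \tfrac{1}{2}(1 + 0.5) = 0.75
```

When only one rating is valid, the formula reduces to s = r/4 for that rating.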

Resources