RAG evaluation measures how well a retrieval-augmented generation system retrieves relevant context and generates grounded answers from it. It usually includes retrieval metrics, context relevance, answer correctness, faithfulness, and citation or grounding checks.
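For the retrieval side, three common ranking metrics are precision@k, recall@k, and reciprocal rank. The sketch below assumes each query comes with a ranked list of retrieved document IDs and a set of gold relevant IDs. The function names are illustrative and not taken from any particular evaluation library.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that are relevant (assumes IDs are unique)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant IDs that appear anywhere in the top-k results."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Example: two gold documents, one ranked second and one ranked fourth.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_at_k(retrieved, relevant, 3))  # 1 of the top 3 is relevant
print(recall_at_k(retrieved, relevant, 3))     # 1 of the 2 relevant docs found
print(reciprocal_rank(retrieved, relevant))    # first relevant hit at rank 2
```

Averaging `reciprocal_rank` over all queries in a test set gives mean reciprocal rank (MRR). These metrics need labeled relevant documents per query; context relevance without such labels is usually judged separately.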
Good RAG evals separate the pipeline into stages. Did the retriever find the right documents? Did the model use them? Did the answer stay faithful to the evidence? A single final-answer score usually does not tell you which part broke.
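One way to make those three questions concrete is to score each stage separately and attribute a failure to the earliest stage that broke. The sketch below is a toy under loud assumptions: `RagExample`, `diagnose`, and `support_threshold` are hypothetical names, "answer correct" is approximated by normalized exact match, and "faithful" is approximated by the share of answer tokens that appear in the retrieved text. Real evals usually replace both proxies with an LLM judge or an entailment model.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Set


@dataclass
class RagExample:
    """Hypothetical record for one evaluated query."""
    question: str
    retrieved_ids: List[str]
    retrieved_texts: List[str]
    relevant_ids: Set[str]
    answer: str
    reference: str


def _tokens(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; a crude normalization for the proxies below.
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieval_hit(ex: RagExample, k: int = 5) -> bool:
    # Stage 1: did any gold document make it into the top-k?
    return any(doc_id in ex.relevant_ids for doc_id in ex.retrieved_ids[:k])


def support_ratio(ex: RagExample) -> float:
    # Stage 2 proxy (assumption): fraction of answer tokens found in the context.
    answer_toks = _tokens(ex.answer)
    if not answer_toks:
        return 0.0
    context_toks = set(_tokens(" ".join(ex.retrieved_texts)))
    return sum(tok in context_toks for tok in answer_toks) / len(answer_toks)


def answer_correct(ex: RagExample) -> bool:
    # Stage 3 proxy (assumption): normalized exact match against the reference.
    return _tokens(ex.answer) == _tokens(ex.reference)


def diagnose(ex: RagExample, k: int = 5, support_threshold: float = 0.8) -> str:
    """Label the earliest stage that failed, or "ok" if all checks pass."""
    if not retrieval_hit(ex, k):
        return "retrieval_miss"
    if support_ratio(ex) < support_threshold:
        return "unfaithful_answer"
    if not answer_correct(ex):
        return "wrong_answer_from_evidence"
    return "ok"


example = RagExample(
    question="What year was the Eiffel Tower completed?",
    retrieved_ids=["doc_paris", "doc_eiffel"],
    retrieved_texts=["Paris is the capital of France.",
                     "The Eiffel Tower was completed in 1889."],
    relevant_ids={"doc_eiffel"},
    answer="1889",
    reference="1889",
)
print(diagnose(example))
print(diagnose(replace(example, answer="1887")))             # not in context
print(diagnose(replace(example, relevant_ids={"doc_x"})))    # gold doc missed
print(diagnose(replace(example, answer="Paris")))            # grounded but wrong
```

Counting these labels across a test set shows whether most failures come from the retriever or from generation, which a single end-to-end accuracy number would hide.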