Session-Level Evals

Evaluate entire user conversations to measure coherence, context retention, and overall goal achievement in your LLM applications.

While individual trace evaluations are useful for assessing single interactions, session-level evaluations allow you to analyze the entire lifecycle of a user's conversation with your AI agent or chatbot.

This is crucial for understanding the overall user experience and identifying issues that only emerge over multiple turns. For example, a chatbot might answer a single question correctly but fail to handle a follow-up question, leading to a poor user experience.

Session-level evaluations are particularly useful for assessing:

  • Coherence: Does the agent maintain a consistent and logical conversation flow?

  • Context Retention: Does the agent remember and correctly utilize information from earlier in the conversation?

  • Goal Achievement: Did the user successfully achieve their overall goal by the end of the session?

  • Task Progression: For multi-step tasks, does the conversation progress logically toward completion?

To run evaluations at the session level in the UI, set the evaluator scope to “Session” for each evaluator you want to operate at that level. You will then see the evaluation output populate next to each session. You can hover over an evaluation to view details like its score and explanation, or filter by results.
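
If you want to prototype the same idea outside the UI, the sketch below shows the general shape of a session-level evaluator: concatenate the session's turns into a single transcript and score it as one unit against criteria like coherence, context retention, and goal achievement. Everything here (`Turn`, `evaluate_session`, the rubric text, and the `judge` callable that wraps your own LLM client) is a hypothetical illustration, not a reference to this product's SDK.

```python
# Hypothetical sketch of a session-level evaluator. All names here
# (Turn, evaluate_session, RUBRIC, judge) are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Turn:
    role: str      # "user" or "assistant"
    content: str

RUBRIC = (
    "Rate the ENTIRE conversation below from 0.0 to 1.0, considering "
    "coherence, context retention, and whether the user achieved their "
    "overall goal. Reply with a single number only."
)

def evaluate_session(turns: list[Turn], judge: Callable[[str], str]) -> float:
    """Score the whole session as one unit, not turn by turn."""
    transcript = "\n".join(f"{t.role}: {t.content}" for t in turns)
    reply = judge(f"{RUBRIC}\n\nConversation:\n{transcript}")
    return float(reply.strip())

# Example with a stub judge; in practice `judge` would call your LLM.
session = [
    Turn("user", "Book me a flight to Berlin on Friday."),
    Turn("assistant", "Done! Anything else?"),
    Turn("user", "Make it Saturday instead."),
    Turn("assistant", "Which city are you flying to?"),  # context lost
]
print(evaluate_session(session, judge=lambda prompt: "0.4"))
```

In practice you would parse the judge's reply more defensively and ask it to return an explanation alongside the score, mirroring the score-and-explanation details shown in the UI.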
