While individual trace evaluations are useful for assessing single interactions, session-level evaluations let you analyze the entire lifecycle of a user’s conversation with your AI agent or chatbot. This is crucial for understanding the overall user experience and identifying issues that only emerge over multiple turns: for example, a chatbot might answer a single question correctly but fail to handle a follow-up question, leading to a poor user experience. Session-level evaluations help you assess:
  • Coherence: Does the agent maintain a consistent and logical conversation flow?
  • Context Retention: Does the agent remember and correctly utilize information from earlier in the conversation?
  • Goal Achievement: Did the user successfully achieve their overall goal by the end of the session?
  • Task Progression: For multi-step tasks, does the conversation progress logically toward completion?
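The dimensions above all require the evaluator to see the whole conversation, not a single turn. The following sketch shows one way to assemble a session-level judge prompt from a multi-turn transcript; the turn structure, rubric wording, and function name are illustrative assumptions, not a specific product API.

```python
# Hedged sketch: building a session-level evaluation prompt from a full
# multi-turn conversation. The session shape and rubric text below are
# hypothetical examples, not the tool's actual template.

session = [
    {"role": "user", "content": "I need to reset my password."},
    {"role": "assistant", "content": "Sure - what's the email on the account?"},
    {"role": "user", "content": "jane@example.com"},
    {"role": "assistant", "content": "I've sent a reset link to jane@example.com."},
]

RUBRIC = (
    "Evaluate the ENTIRE conversation below on:\n"
    "1. Coherence: consistent, logical flow across turns\n"
    "2. Context retention: correctly reuses details from earlier turns\n"
    "3. Goal achievement: the user's overall goal is met by the end\n"
    "Respond with a label (pass/fail) and a short explanation.\n\n"
)

def build_session_eval_prompt(turns):
    # Flatten every turn into one transcript so the judge model
    # scores the session as a whole rather than turn by turn.
    transcript = "\n".join(f"{t['role']}: {t['content']}" for t in turns)
    return RUBRIC + transcript

prompt = build_session_eval_prompt(session)
```

The resulting prompt would then be sent to an LLM judge; the key design choice is that scoring happens once per session, over the concatenated transcript, rather than once per trace.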

Session-Level Evaluations via UI

To run evaluations at the session level in the UI, set the evaluator scope to “Session” for each evaluator you want to operate at that level. You will see the evaluation output populate next to each session. You can hover over the evaluation to filter by results or view details like score and explanation.
The attributes required for a session eval vary from case to case. When you pass attributes into the evaluator, they are concatenated across all spans in the session that contain them. You can use the Evaluator Data Filter bar to precisely control which spans contribute to the concatenated attributes. For example, if you use attributes.input.value in the Eval Template, the input values from all matching spans in the session are concatenated into a single variable and passed to the evaluation prompt.
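The concatenation behavior can be sketched as follows. This is an illustrative model of the mechanism, not the product's internal code: the span dictionary shape, the session_id field, and the span_filter callback (standing in for the Evaluator Data Filter bar) are all assumptions.

```python
# Illustrative sketch of how one attribute is concatenated across all spans
# in a session before being substituted into the eval template.
# Span shape and field names here are hypothetical.

def concat_session_attribute(spans, session_id,
                             attribute="attributes.input.value",
                             span_filter=None):
    """Join one attribute's values across a session's spans.

    span_filter plays the role of the Evaluator Data Filter bar:
    only spans for which it returns True contribute to the result.
    """
    values = []
    for span in spans:
        if span.get("session_id") != session_id:
            continue  # span belongs to a different session
        if span_filter is not None and not span_filter(span):
            continue  # excluded by the data filter
        value = span.get(attribute)
        if value is not None:
            values.append(value)
    # All matching values collapse into a single template variable.
    return "\n".join(values)

spans = [
    {"session_id": "s1", "span_kind": "LLM",
     "attributes.input.value": "Book a flight to Paris"},
    {"session_id": "s1", "span_kind": "TOOL",
     "attributes.input.value": "search_flights(PAR)"},
    {"session_id": "s1", "span_kind": "LLM",
     "attributes.input.value": "Make it a window seat"},
    {"session_id": "s2", "span_kind": "LLM",
     "attributes.input.value": "unrelated session"},
]

# Without a filter, every span in the session contributes:
everything = concat_session_attribute(spans, "s1")

# With a filter (e.g. only LLM spans), the concatenation narrows:
llm_only = concat_session_attribute(
    spans, "s1", span_filter=lambda s: s["span_kind"] == "LLM")
```

In this sketch, `everything` includes the tool span's input while `llm_only` does not, which is exactly the kind of control the Evaluator Data Filter bar gives you over what reaches the evaluation prompt.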
[Screenshot: Session-Level Eval UI]