Definition
The session turn relevancy test evaluates whether each turn is relevant to the ongoing conversation. An LLM-as-a-judge reads the full session and scores it against four criteria:- Each response directly addresses the user’s question or request
- Responses avoid irrelevant information or padding
- Off-topic tangents are handled appropriately (redirected rather than indulged)
- The conversation stays focused on the thread the user is pursuing
Taxonomy
- Task types: LLM.
- Availability: and .
- Evaluation level: session.
- Polarity: higher score = better.
0= completely irrelevant turns,1= all turns highly relevant.
Why it matters
- Even assistants that answer factually correctly sometimes respond to adjacent but not directly-requested questions — a subtle quality degradation.
- Turn-level relevancy aggregated across a session catches patterns where the assistant consistently half-misunderstands the user.
Required columns
- Input: The user’s message in each turn.
- Output: The assistant’s response in each turn.
- Session ID: Groups turns belonging to the same conversation.
- Timestamp: Used to reconstruct turn order within a session.
Test configuration examples
Related
- Answer relevancy — trace-level relevancy metric (Ragas).

