Definition
The session context retention test evaluates whether the assistant maintains and correctly uses context across the turns of a conversation. An LLM-as-a-judge reads the full session and scores it against four criteria:
- Remembers facts and preferences established in prior turns
- Builds upon previously established context rather than starting fresh each turn
- Avoids asking for information the user has already provided
- Doesn’t contradict information given earlier in the session
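As a rough illustration of how a judge could be driven by the four criteria above, the sketch below assembles a prompt from a session transcript and parses a numeric reply. It is not the product's actual judge prompt or API: `CRITERIA`, `build_judge_prompt`, and `parse_score` are hypothetical names, and the call to the judge model is deliberately left out.

```python
import re

# The four criteria from the definition above, used as the judge's rubric.
CRITERIA = [
    "Remembers facts and preferences established in prior turns",
    "Builds upon previously established context rather than starting fresh each turn",
    "Avoids asking for information the user has already provided",
    "Doesn't contradict information given earlier in the session",
]


def build_judge_prompt(turns):
    """Build a judge prompt (hypothetical wording).

    turns: list of (user_input, assistant_output) pairs in turn order.
    """
    transcript = "\n".join(
        f"Turn {i}\nUser: {user}\nAssistant: {assistant}"
        for i, (user, assistant) in enumerate(turns, start=1)
    )
    rubric = "\n".join(f"- {criterion}" for criterion in CRITERIA)
    return (
        "Rate how well the assistant retains context across this session.\n"
        f"Criteria:\n{rubric}\n\n"
        f"Session:\n{transcript}\n\n"
        "Reply with a single score between 0 (no context retention) "
        "and 1 (perfect context retention)."
    )


def parse_score(reply):
    """Extract the first number in the judge's reply and clamp it to [0, 1]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError("no numeric score found in judge reply")
    return min(1.0, max(0.0, float(match.group())))


# Example: a two-turn session where the assistant reuses the user's name.
prompt = build_judge_prompt([
    ("My name is Ana.", "Nice to meet you, Ana."),
    ("What's my name?", "Your name is Ana."),
])
```

Passing the whole session in one prompt mirrors the session-level evaluation described here: the judge needs every earlier turn in view to notice a re-asked question or a contradiction.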
Taxonomy
- Task types: LLM.
- Availability: and .
- Evaluation level: session.
- Polarity: higher score = better.
0 = no context retention, 1 = perfect context retention.
Why it matters
- Context-retention failures, especially re-asking for information the user has already supplied, are a primary driver of user frustration in multi-turn assistants.
Required columns
- Input: The user’s message in each turn.
- Output: The assistant’s response in each turn.
- Session ID: Groups turns belonging to the same conversation.
- Timestamp: Used to reconstruct turn order within a session.
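The columns above are enough to rebuild each conversation before it is scored. The following is a minimal sketch, assuming rows arrive as dictionaries with hypothetical keys `session_id`, `timestamp` (ISO 8601 strings), `input`, and `output`; the real column names depend on your dataset.

```python
from collections import defaultdict
from datetime import datetime


def group_into_sessions(rows):
    """Group rows by session ID and order each session's turns by timestamp."""
    sessions = defaultdict(list)
    for row in rows:
        sessions[row["session_id"]].append(row)
    return {
        session_id: sorted(
            turns, key=lambda r: datetime.fromisoformat(r["timestamp"])
        )
        for session_id, turns in sessions.items()
    }


# Illustrative rows (made up), deliberately out of order.
rows = [
    {"session_id": "s1", "timestamp": "2024-01-01T10:05:00",
     "input": "Make it for two people.", "output": "Booked: 7pm, table for two."},
    {"session_id": "s2", "timestamp": "2024-01-01T09:00:00",
     "input": "Hello", "output": "Hi, how can I help?"},
    {"session_id": "s1", "timestamp": "2024-01-01T10:00:00",
     "input": "I'd like a table at 7pm.", "output": "Sure, for how many people?"},
]

sessions = group_into_sessions(rows)
```

Sorting on parsed timestamps rather than raw strings avoids ordering mistakes when timestamps use mixed precision.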
Test configuration examples
Related
- Session coherence — broader consistency signal.

