Definition
The session token count test monitors the total tokens used per session. It aggregates the per-trace token count across all turns in a session. No LLM evaluator is involved; it’s a deterministic aggregation.
Taxonomy
- Task types: LLM.
- Availability: and .
- Evaluation level: session.
- Computation: deterministic aggregation.
Why it matters
- Session-level token usage is the cleanest proxy for “how much work did the LLM do for this user”.
- Outliers often reveal conversations that are hitting context-window limits, pulling in too much RAG context per turn, or looping.
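One rough way to surface such outliers is to compare each session's token sum against a multiple of the median session sum. The sketch below is illustrative only: the function name `flag_outlier_sessions`, the `factor` parameter, and the default of 3x the median are assumptions made for this example, not part of the test itself.

```python
from statistics import median


def flag_outlier_sessions(session_tokens, factor=3.0):
    """Return the IDs of sessions whose token sum exceeds factor * median.

    `session_tokens` maps a session ID to that session's summed token count.
    The threshold rule is a hypothetical heuristic, not an Openlayer feature.
    """
    if not session_tokens:
        return []
    threshold = factor * median(session_tokens.values())
    return sorted(
        sid for sid, total in session_tokens.items() if total > threshold
    )


# Hypothetical per-session token sums.
session_tokens = {"s1": 420, "s2": 80, "s3": 1500, "s4": 390}
outliers = flag_outlier_sessions(session_tokens)
print(outliers)
```

The median is used as the baseline here because a few very long sessions can pull the mean upward and hide the very outliers being looked for.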
Available measurements
| Measurement | What it means |
|---|---|
| totalTokens | Sum of tokens across all traces in the window (global, not per-session) |
| meanTokensPerSession | Mean of per-session token sums across sessions in the window |
| medianTokensPerSession | Median of per-session token sums across sessions in the window |
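The three measurements can be reproduced offline with a few lines of Python. This is a minimal sketch, assuming the traces are available as `(session_id, token_count)` pairs. The function name `session_token_measurements` and the input shape are hypothetical, chosen for illustration rather than taken from the Openlayer client.

```python
from collections import defaultdict
from statistics import mean, median


def session_token_measurements(traces):
    """Compute the measurements from (session_id, token_count) pairs.

    Assumes `traces` holds every trace in the evaluation window.
    """
    per_session = defaultdict(int)
    total = 0
    for session_id, tokens in traces:
        total += tokens                     # global sum over all traces
        per_session[session_id] += tokens   # per-session sum
    if not per_session:
        raise ValueError("no traces in the window")
    sums = list(per_session.values())
    return {
        "totalTokens": total,
        "meanTokensPerSession": mean(sums),
        "medianTokensPerSession": median(sums),
    }


# Hypothetical window with three sessions and five traces.
traces = [("s1", 120), ("s1", 300), ("s2", 80), ("s3", 1000), ("s3", 500)]
result = session_token_measurements(traces)
print(result)
```

In this example the per-session sums are 420, 80, and 1500, so the mean (about 666.7) sits well above the median (420). A gap like this suggests that a small number of long sessions account for most of the usage.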
Required columns
- Session ID: Groups turns belonging to the same conversation.
- Token count: Per-trace token count (usually populated automatically by the Openlayer client or via OpenTelemetry).
Test configuration examples
Related
- Session cost — cost view of the same usage.
- Mean tokens, Max tokens, Total tokens — trace-level token metrics.

