Definition

The session token count test monitors the total number of tokens used per session. It sums the per-trace token counts across all turns in a session. No LLM evaluator is involved; the result is a deterministic aggregation.

Taxonomy

  • Task types: LLM.
  • Availability: and .
  • Evaluation level: session.
  • Computation: deterministic aggregation.

Why it matters

  • Session-level token usage is the cleanest proxy for “how much work did the LLM do for this user”.
  • Outliers often reveal conversations that are hitting context-window limits, pulling in too much RAG context per turn, or looping.

Available measurements

Measurement              What it means
totalTokens              Sum of tokens across all traces in the window (global, not per-session)
meanTokensPerSession     Mean of per-session token sums across sessions in the window
medianTokensPerSession   Median of per-session token sums across sessions in the window
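
To make the three measurements concrete, here is a minimal Python sketch of how they could be computed from per-trace rows. It is an illustration only, not Openlayer's implementation: the function name `session_token_measurements` and the `(session_id, token_count)` input shape are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, median


def session_token_measurements(traces):
    """Compute the three measurements from (session_id, token_count) pairs.

    Hypothetical helper: one pair per trace, all traces taken from a
    single evaluation window.
    """
    # Sum the per-trace token counts within each session.
    per_session = defaultdict(int)
    for session_id, token_count in traces:
        per_session[session_id] += token_count

    sums = list(per_session.values())
    if not sums:
        # No traces in the window: mean and median are undefined.
        return {
            "totalTokens": 0,
            "meanTokensPerSession": None,
            "medianTokensPerSession": None,
        }

    return {
        # Global sum over every trace, regardless of session.
        "totalTokens": sum(sums),
        "meanTokensPerSession": mean(sums),
        "medianTokensPerSession": median(sums),
    }


# Three sessions: "a" has two turns, "b" and "c" have one turn each.
traces = [("a", 100), ("a", 300), ("b", 200), ("c", 1000)]
result = session_token_measurements(traces)
print(result)
```

In this sample the per-session sums are 400, 200, and 1000, so the single long session pulls the mean well above the median. That gap is one reason to look at both measurements when checking for outlier sessions.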

Required columns

  • Session ID: Groups turns belonging to the same conversation.
  • Token count: Per-trace token count (usually populated automatically by the Openlayer client or via OpenTelemetry).

Test configuration examples

[
  {
    "name": "Mean session token count below 10k",
    "description": "Alert when average session token usage exceeds 10,000 tokens",
    "type": "performance",
    "subtype": "sessionTokenCount",
    "thresholds": [
      {
        "insightName": "sessionTokenCount",
        "measurement": "meanTokensPerSession",
        "operator": "<=",
        "value": 10000
      }
    ],
    "subpopulationFilters": null,
    "mode": "monitoring",
    "usesProductionData": true,
    "evaluationWindow": 3600,
    "delayWindow": 0
  }
]
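
One way to read the threshold in the configuration above: the test passes while the measured value satisfies `measurement operator value`, and it fails (triggering the alert) otherwise. The Python sketch below illustrates that reading. The helper `threshold_passes` and the operator table are hypothetical; the source only shows the `<=` operator, so the other entries are assumptions.

```python
import operator

# Assumed operator set; only "<=" appears in the configuration example.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def threshold_passes(threshold, measurements):
    """Return True when the measured value satisfies the threshold.

    Hypothetical helper: `threshold` mirrors one entry of the
    "thresholds" list, `measurements` maps measurement names to values.
    """
    compare = OPERATORS[threshold["operator"]]
    measured = measurements[threshold["measurement"]]
    return compare(measured, threshold["value"])


threshold = {
    "insightName": "sessionTokenCount",
    "measurement": "meanTokensPerSession",
    "operator": "<=",
    "value": 10000,
}

passes_low = threshold_passes(threshold, {"meanTokensPerSession": 8200})
passes_high = threshold_passes(threshold, {"meanTokensPerSession": 12500})
print(passes_low, passes_high)
```

With a mean of 8,200 tokens per session the check passes; with 12,500 it fails, which corresponds to the alert described in the test's `description` field.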