
Definition

The session goal achievement test evaluates whether the user’s goal was met by the end of a conversation. An LLM-as-a-judge infers the user’s goal from their messages and scores the full session against four criteria:
  • The user’s intent is correctly identified
  • The goal is fully resolved (not just partially addressed)
  • The session reaches a satisfying conclusion
  • User-side signals indicate satisfaction (e.g., acknowledgment, no repeat asks)
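The judging step can be pictured as two parts: building a prompt and parsing a score. The following is a minimal Python sketch, not Openlayer's actual evaluator. Only the four criteria and the 0 to 1 scale come from this page. The prompt wording and the helper names `build_judge_prompt` and `parse_score` are hypothetical, and the call to the judge LLM is left out.

```python
# Hypothetical sketch of an LLM-as-a-judge prompt; NOT Openlayer's actual prompt.
CRITERIA = [
    "The user's intent is correctly identified",
    "The goal is fully resolved (not just partially addressed)",
    "The session reaches a satisfying conclusion",
    "User-side signals indicate satisfaction (e.g., acknowledgment, no repeat asks)",
]


def build_judge_prompt(turns: list[tuple[str, str]]) -> str:
    """Format an ordered session of (user input, assistant output) pairs for a judge LLM."""
    transcript = "\n".join(
        f"User: {user}\nAssistant: {assistant}" for user, assistant in turns
    )
    criteria = "\n".join(f"- {c}" for c in CRITERIA)
    return (
        "Infer the user's goal from their messages, then score the whole session "
        "from 0 (goal not achieved at all) to 1 (goal fully achieved) against:\n"
        f"{criteria}\n\nTranscript:\n{transcript}\n\nReply with only the score."
    )


def parse_score(reply: str) -> float:
    """Parse the judge's numeric reply and clamp it to the metric's 0-1 range."""
    return min(1.0, max(0.0, float(reply.strip())))
```

Clamping out-of-range replies is one design choice. Rejecting them and asking the judge again would be an equally reasonable alternative.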

Taxonomy

  • Task types: LLM.
  • Availability: and .
  • Evaluation level: session.
  • Polarity: higher score = better. 0 = goal not achieved at all, 1 = goal fully achieved.

Why it matters

  • Goal achievement is the most direct signal of product quality for agentic assistants.
  • Tracking it at the session level captures outcomes that per-turn evaluations miss.

Required columns

  • Input: The user’s message in each turn.
  • Output: The assistant’s response in each turn.
  • Session ID: Groups turns belonging to the same conversation.
  • Timestamp: Used to reconstruct turn order within a session.
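Session ID and timestamp together determine the transcript the judge sees. The following is a minimal sketch of that reconstruction. It assumes each row is a dict with hypothetical keys `session_id`, `timestamp`, `input`, and `output` (your column names will differ), and that timestamps are mutually comparable, such as Unix epoch numbers.

```python
from collections import defaultdict


def group_sessions(rows: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Group rows by session ID and order each session's turns by timestamp."""
    by_session: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_session[row["session_id"]].append(row)
    # Sort each session's rows chronologically, keeping only (input, output) pairs.
    return {
        sid: [(r["input"], r["output"]) for r in sorted(turns, key=lambda r: r["timestamp"])]
        for sid, turns in by_session.items()
    }


# Example rows arrive out of order and interleaved across two sessions.
rows = [
    {"session_id": "a", "timestamp": 2, "input": "q2", "output": "r2"},
    {"session_id": "b", "timestamp": 1, "input": "x", "output": "y"},
    {"session_id": "a", "timestamp": 1, "input": "q1", "output": "r1"},
]
sessions = group_sessions(rows)
```

Each value in `sessions` is an ordered list of (input, output) pairs, which is the shape a session-level judge needs.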

This metric relies on an LLM evaluator. On Openlayer, you can configure the underlying LLM used to compute it; see the OpenAI or Anthropic integration guides for details.

Test configuration examples

[
  {
    "name": "Session goal achievement above 0.7",
    "description": "Ensure sessions meet the user's inferred goal",
    "type": "performance",
    "subtype": "sessionGoalAchievement",
    "thresholds": [
      {
        "insightName": "sessionGoalAchievement",
        "measurement": "meanScore",
        "operator": ">=",
        "value": 0.7
      }
    ],
    "subpopulationFilters": null,
    "mode": "monitoring",
    "usesProductionData": true,
    "evaluationWindow": 3600,
    "delayWindow": 0
  }
]
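This test passes when the mean session score is at least 0.7. The following minimal Python sketch shows that comparison and assumes you already have one score per session. `passes_threshold` is a hypothetical helper, and the operator strings other than `>=` are assumptions. The sketch does not model how Openlayer applies `evaluationWindow` and `delayWindow`.

```python
import operator
from statistics import mean

# Maps "operator" strings to comparisons; only ">=" appears in the config above.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def passes_threshold(scores: list[float], threshold: dict) -> bool:
    """Return True if the mean session score satisfies the threshold entry."""
    if not scores:
        raise ValueError("no scored sessions to evaluate")
    compare = OPERATORS[threshold["operator"]]
    return compare(mean(scores), threshold["value"])


threshold = {
    "insightName": "sessionGoalAchievement",
    "measurement": "meanScore",
    "operator": ">=",
    "value": 0.7,
}
print(passes_threshold([0.9, 0.6, 0.8], threshold))  # True (mean is about 0.767)
```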