Definition

The session conversation completeness test evaluates whether a multi-turn conversation reached proper resolution with all topics adequately addressed. An LLM-as-a-judge reads the full conversation and scores it against four criteria:
  • The user’s initial request was fully addressed
  • All follow-up questions were answered
  • No topics were left unresolved or only partially addressed
  • The session reached a clear final response rather than trailing off
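As a rough illustration of how the four criteria might be handed to a judge model, the sketch below assembles a grading prompt from a list of turns. The prompt wording and the `build_judge_prompt` helper are assumptions made for this example, not Openlayer's actual judge prompt.

```python
# Hypothetical sketch: the prompt wording and function name are illustrative only.
CRITERIA = [
    "The user's initial request was fully addressed",
    "All follow-up questions were answered",
    "No topics were left unresolved or only partially addressed",
    "The session reached a clear final response rather than trailing off",
]


def build_judge_prompt(turns):
    """Build a grading prompt from (user_message, assistant_response) pairs in turn order."""
    transcript = "\n".join(
        f"User: {user}\nAssistant: {assistant}" for user, assistant in turns
    )
    checklist = "\n".join(f"{i}. {c}" for i, c in enumerate(CRITERIA, start=1))
    return (
        "Rate the completeness of the conversation below on a scale from 0 to 1.\n"
        f"Criteria:\n{checklist}\n\n"
        f"Conversation:\n{transcript}\n\n"
        "Reply with a single number between 0 and 1."
    )


prompt = build_judge_prompt(
    [("How do I reset my password?", "Use the reset link on the login page.")]
)
```

The resulting string would be sent to whichever LLM is configured as the evaluator, and the number it returns would be read as the session's score.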

Taxonomy

  • Task types: LLM.
  • Availability: and .
  • Evaluation level: session.
  • Polarity: higher score = better. Scores range from 0 (nothing resolved) to 1 (fully resolved).

Why it matters

  • Incomplete conversations often signal unresolved issues that will resurface as repeat sessions, support tickets, or churn.
  • It complements Session goal achievement. The two overlap, since both can penalize a session where the user’s request wasn’t addressed, but completeness also checks that every follow-up and sub-topic received an answer, not only the primary objective.

Required columns

  • Input: The user’s message in each turn.
  • Output: The assistant’s response in each turn.
  • Session ID: Groups turns belonging to the same conversation.
  • Timestamp: Used to reconstruct turn order within a session.

This metric relies on an LLM evaluator. On Openlayer, you can configure the underlying LLM used to compute it; see the OpenAI or Anthropic integration guides for details.
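The required columns are enough to rebuild each conversation before it is judged: rows are grouped by session ID and ordered by timestamp. The sketch below shows that step on plain dictionaries. The key names (`session_id`, `timestamp`, `input`, `output`) and the `group_sessions` helper are placeholders chosen for this example, not Openlayer's column names or API.

```python
from collections import defaultdict


def group_sessions(rows):
    """Group rows by session and return each session's turns in timestamp order.

    Key names are hypothetical stand-ins for the four required columns.
    """
    sessions = defaultdict(list)
    for row in rows:
        sessions[row["session_id"]].append(row)
    return {
        session_id: [
            (r["input"], r["output"])
            for r in sorted(turns, key=lambda r: r["timestamp"])
        ]
        for session_id, turns in sessions.items()
    }


rows = [
    {"session_id": "a", "timestamp": 2, "input": "And pricing?", "output": "It is $10."},
    {"session_id": "a", "timestamp": 1, "input": "What is X?", "output": "X is a tool."},
    {"session_id": "b", "timestamp": 1, "input": "Hi", "output": "Hello"},
]
conversations = group_sessions(rows)
```

Here session `a` arrives out of order and is restored to its original sequence, which is why the timestamp column is needed alongside the session ID.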

Test configuration examples

[
  {
    "name": "Session completeness above 0.7",
    "description": "Ensure conversations reach a clean end state",
    "type": "performance",
    "subtype": "sessionConversationCompleteness",
    "thresholds": [
      {
        "insightName": "sessionConversationCompleteness",
        "measurement": "meanScore",
        "operator": ">=",
        "value": 0.7
      }
    ],
    "subpopulationFilters": null,
    "mode": "monitoring",
    "usesProductionData": true,
    "evaluationWindow": 3600,
    "delayWindow": 0
  }
]
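To make the threshold in this configuration concrete, the sketch below checks a set of per-session scores against it. It assumes that `meanScore` is the arithmetic mean of the session scores in the evaluation window; that reading, and the `threshold_passes` helper, are assumptions for illustration rather than Openlayer's documented behavior.

```python
import operator

# Map the operator strings used in the config to Python comparisons.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def threshold_passes(session_scores, threshold):
    """Return True if the mean session score satisfies the threshold (assumed semantics)."""
    if not session_scores:
        raise ValueError("at least one session score is required")
    mean_score = sum(session_scores) / len(session_scores)
    compare = OPERATORS[threshold["operator"]]
    return compare(mean_score, threshold["value"])


threshold = {"measurement": "meanScore", "operator": ">=", "value": 0.7}
passes = threshold_passes([0.9, 0.8, 0.5], threshold)  # mean is about 0.73
fails = threshold_passes([0.9, 0.5, 0.5], threshold)   # mean is about 0.63
```

Under this reading, one weak session does not fail the test on its own; the test fails only when the average across sessions drops below 0.7.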