Skip to main content

Definition

The session role adherence test evaluates whether the assistant stays in its defined role across a multi-turn conversation. You supply a role description in plain English (for example, “a customer-support agent for an e-commerce platform who handles order tracking, returns, and product questions”), and an LLM-as-a-judge scores how well the assistant kept to that role throughout the session.

Taxonomy

  • Task types: LLM.
  • Availability: and .
  • Evaluation level: session.
  • Polarity: higher score = better (role adhered to).

Why it matters

  • Role drift is a leading indicator of prompt leakage, jailbreak success, or retrieval-augmented context bleeding into the agent’s persona.
  • Role adherence at the session level catches drift that builds across turns and is invisible to per-turn checks.

Required columns

  • Input: The user’s message in each turn.
  • Output: The assistant’s response in each turn.
  • Session ID: Groups turns belonging to the same conversation.
  • Timestamp: Used to reconstruct turn order within a session.

Insight parameters

  • role_definition (string, optional): A plain-English description of the assistant’s expected role. If omitted, the judge falls back to a generic “appropriate-for-context” role check (useful when you haven’t yet formalized the role, but lower-signal than the explicit version).
The judge evaluates four dimensions when a role is supplied:
  • Persona / expertise — does the assistant present itself as the specified role?
  • Scope — does it stay within the role’s topical boundaries?
  • Tone & style — does it match the communication style expected of the role?
  • Handling out-of-role requests — does it decline or redirect cleanly?
This metric relies on an LLM evaluator. On Openlayer you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.

Test configuration examples

[
  {
    "name": "Session role adherence above 0.7",
    "description": "Ensure the assistant stays in its defined customer-support role",
    "type": "performance",
    "subtype": "sessionRoleAdherence",
    "thresholds": [
      {
        "insightName": "sessionRoleAdherence",
        "insightParameters": [
          {
            "name": "role_definition",
            "value": "You are a customer-support agent for an e-commerce platform. You help with order tracking, returns, and product questions."
          }
        ],
        "measurement": "meanScore",
        "operator": ">=",
        "value": 0.7
      }
    ],
    "subpopulationFilters": null,
    "mode": "monitoring",
    "usesProductionData": true,
    "evaluationWindow": 3600,
    "delayWindow": 0
  }
]