

Definition

The semantic similarity test assesses how close in meaning two sentences are by mapping them into a shared semantic space with natural language processing techniques and measuring their distance there.
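As a sketch of the underlying idea: each sentence is mapped to an embedding vector, and the similarity of two sentences is the cosine of the angle between their vectors. The three-dimensional vectors below are toy stand-ins for real sentence embeddings (which would come from an embedding model); only the scoring step is shown.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" standing in for real sentence vectors.
emb_generated = [0.9, 0.1, 0.3]     # e.g. "The cat sat on the mat."
emb_reference = [0.85, 0.15, 0.35]  # e.g. "A cat was sitting on the mat."

# Nearly parallel vectors yield a score close to 1, reflecting near-identical meaning.
score = cosine_similarity(emb_generated, emb_reference)
print(round(score, 3))
```

Two paraphrases with similar embeddings score near 1.0 even though their surface strings differ, which is exactly what string matching would miss.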

Taxonomy

  • Task types: LLM.
  • Availability: development and monitoring.

Why it matters

  • Semantic similarity captures the meaning-based relationship between generated and reference text, going beyond surface-level string matching.
  • This metric is particularly valuable when different phrasings can convey the same meaning, making it ideal for tasks like paraphrasing, summarization, or question answering.
  • It provides a more nuanced evaluation than exact matching by considering the conceptual similarity rather than just textual similarity.

Required columns

To compute this metric, your dataset must contain the following columns:
  • Outputs: The generated text from your LLM
  • Ground truths: The reference/expected text to compare against
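A minimal sketch of what such a dataset looks like, as a column-oriented dict. The column names `output` and `ground_truth` are illustrative assumptions; the actual names depend on how you map columns when you upload the dataset to Openlayer.

```python
# Hypothetical dataset layout; column names are assumptions for illustration.
dataset = {
    "output": [  # generated text from the LLM
        "The Eiffel Tower is in Paris.",
        "Water boils at 100 degrees Celsius at sea level.",
    ],
    "ground_truth": [  # reference text to compare against
        "The Eiffel Tower is located in Paris, France.",
        "At sea level, water boils at 100 degrees Celsius.",
    ],
}

# Every generated output needs a reference text, so the columns must align row-for-row.
assert len(dataset["output"]) == len(dataset["ground_truth"])
```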

Test configuration examples

If you are writing a tests.json, here is a valid configuration for the semantic similarity test:
[
  {
    "name": "Mean semantic similarity above 0.8",
    "description": "Ensure that the mean semantic similarity score is above 0.8",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "insightParameters": null,
        "measurement": "meanSemanticSimilarity",
        "operator": ">",
        "value": 0.8
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]
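To make the threshold semantics concrete, here is a small sketch of how a configuration like the one above would be evaluated: per-row scores are averaged and the mean is compared against `value` using `operator`. The per-row scores are made-up illustrative numbers, and this is a simplified stand-in for Openlayer's own evaluation, not its implementation.

```python
import json

# The thresholds portion of the tests.json entry above.
test_config = json.loads("""
{
  "thresholds": [
    {"measurement": "meanSemanticSimilarity", "operator": ">", "value": 0.8}
  ]
}
""")

# Hypothetical per-row semantic similarity scores for a validation dataset.
row_scores = [0.91, 0.84, 0.88, 0.79]
mean_score = sum(row_scores) / len(row_scores)

threshold = test_config["thresholds"][0]
# The ">" operator means the test passes only if the mean strictly exceeds value.
passed = mean_score > threshold["value"]
print(f"mean={mean_score:.3f} passed={passed}")
```

Note that a single low-scoring row (0.79 here) does not fail the test on its own; only the mean across the dataset is compared against the threshold.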