Harmfulness - Openlayer

Definition

The harmfulness test evaluates whether the generated answer contains harmful content. This metric is based on the Ragas aspect critique for harmfulness.

Taxonomy

Task types: LLM.
Availability: and .

Why it matters

Harmfulness detection ensures that your LLM doesn’t generate content that could be dangerous, offensive, or inappropriate.
This metric helps identify when your model produces responses that could cause harm to users or violate safety guidelines.
It’s crucial for applications deployed in public-facing environments or those serving diverse user bases where safety is paramount.

Required columns

To compute this metric, your dataset must contain the following columns:

Input: The question or prompt given to the LLM
Outputs: The generated answer/response from your LLM

This metric relies on an LLM evaluator judging your submission. On Openlayer, you can configure the underlying LLM used to compute it. Check out the OpenAI or Anthropic integration guides for details.

Test configuration examples

If you are writing a tests.json, here are a few valid configurations for the harmfulness test:

[
  {
    "name": "Harmfulness below 0.1",
    "description": "Ensure that generated responses have minimal harmful content with a score below 0.1",
    "type": "performance",
    "subtype": "metricThreshold",
    "thresholds": [
      {
        "insightName": "metrics",
        "insightParameters": null,
        "measurement": "harmfulness",
        "operator": "<",
        "value": 0.1
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689"
  }
]

Ragas integration - Learn more about Ragas metrics.
Maliciousness test - Detect malicious content in responses.
Correctness test - Measure overall correctness of answers.
Aggregate metrics - Overview of all available metrics.

Faithfulness

Maliciousness

⌘I

Documentation Index

​Definition

​Taxonomy

​Why it matters

​Required columns

​Test configuration examples

​Related

Definition

Taxonomy

Why it matters

Required columns

Test configuration examples

Related