Definition

The PII test asserts that the data contains no personally identifiable information (PII). Currently, the test can check for two PII types: credit card numbers and social security numbers (SSNs).
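This page does not say how the test detects each PII type. As an illustration only, a common approach is to match SSNs with a regular expression and to confirm candidate credit card numbers with the Luhn checksum. The Python sketch below assumes that approach; the patterns and the function names (contains_ssn, contains_cc_num, luhn_valid) are hypothetical and will miss formats the real test may handle, such as SSNs written without hyphens.

```python
import re

# Assumption: SSNs are written as AAA-GG-SSSS; other formats are not matched.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Candidate card numbers: 13 to 19 digits, optionally separated by single spaces or hyphens.
CC_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_ssn(text: str) -> bool:
    """Hypothetical SSN check: True if an AAA-GG-SSSS pattern appears in the text."""
    return SSN_PATTERN.search(text) is not None


def contains_cc_num(text: str) -> bool:
    """Hypothetical card check: True if any digit run of card length passes Luhn."""
    for match in CC_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            return True
    return False


print(contains_cc_num("Card: 4111 1111 1111 1111"))  # True
print(contains_cc_num("Order 4111111111111112"))     # False: fails the Luhn check
print(contains_ssn("SSN: 123-45-6789"))               # True
```

The Luhn step keeps arbitrary long numbers (order IDs, timestamps) from being flagged as card numbers, at the cost of occasionally accepting a random digit string that happens to pass the checksum.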

Taxonomy

  • Category: Integrity.
  • Task types: LLM.
  • Availability: and .

Why it matters

  • If the dataset is not anonymized, using it can lead to data breaches or biased models.
  • LLMs are prone to hallucinating or leaking PII in their outputs.

Test configuration examples

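If you are writing a tests.json, here are a few valid configurations for the PII test. The // comments are explanatory annotations only; strip them from a real tests.json, because standard JSON does not allow comments: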

[
  {
    "name": "No credit card numbers leaked on output",
    "description": "Asserts no credit card numbers are leaked",
    "type": "integrity",
    "subtype": "containsPii",
    "thresholds": [
      {
        "insightName": "containsPii",
        "insightParameters": [
          {
            "name": "pii_type",
            "value": "cc_num"
          }, // Checks for credit card numbers...
          {
            "name": "column_name",
            "value": "output"
          } // ... on the column `output`
        ],
        "measurement": "containsPIIRowCount",
        "operator": "<=",
        "value": 0.0
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true, // Apply test to the validation set
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
  },
  {
    "name": "No social security numbers leaked on output",
    "description": "Asserts no SSNs are leaked",
    "type": "integrity",
    "subtype": "containsPii",
    "thresholds": [
      {
        "insightName": "containsPii",
        "insightParameters": [
          {
            "name": "pii_type",
            "value": "ssn"
          }, // Checks for social security numbers...
          {
            "name": "column_name",
            "value": "output"
          } // ... on the column `output`
        ],
        "measurement": "containsPIIRowCount",
        "operator": "<=",
        "value": 0.0
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true, // Apply test to the validation set
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "96622fba-ea00-4e42-8f42-5e8f5f60805f" // Some unique id
  }
]
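Each thresholds entry can be read as: count the rows of the column given by column_name in which the PII type given by pii_type is found (the containsPIIRowCount measurement), then pass only if that count satisfies operator and value, here <= 0.0, i.e. zero rows. The Python sketch below assumes that interpretation; the function names, the OPERATORS and DETECTORS mappings, and the naive regex SSN detector are hypothetical and not part of the platform.

```python
import operator
import re
from typing import Callable, Dict, Iterable

# Hypothetical mapping from the "operator" field to Python comparisons.
OPERATORS = {"<=": operator.le, "<": operator.lt, ">=": operator.ge, ">": operator.gt}

# Hypothetical, deliberately naive SSN detector (AAA-GG-SSSS format only).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
DETECTORS: Dict[str, Callable[[str], bool]] = {
    "ssn": lambda text: SSN_PATTERN.search(text) is not None,
}


def pii_row_count(rows: Iterable[dict], column_name: str,
                  detector: Callable[[str], bool]) -> int:
    """Count rows whose `column_name` value contains PII according to `detector`."""
    return sum(1 for row in rows if detector(str(row.get(column_name, ""))))


def evaluate_threshold(threshold: dict, rows: Iterable[dict]) -> bool:
    """Return True if `rows` satisfy one containsPii threshold block."""
    params = {p["name"]: p["value"] for p in threshold["insightParameters"]}
    detector = DETECTORS[params["pii_type"]]
    count = pii_row_count(rows, params["column_name"], detector)
    return OPERATORS[threshold["operator"]](count, threshold["value"])


# The SSN threshold from the second test above, with the comments removed.
ssn_threshold = {
    "insightName": "containsPii",
    "insightParameters": [
        {"name": "pii_type", "value": "ssn"},
        {"name": "column_name", "value": "output"},
    ],
    "measurement": "containsPIIRowCount",
    "operator": "<=",
    "value": 0.0,
}

rows = [
    {"output": "Here is a summary of the ticket."},
    {"output": "The customer's SSN is 123-45-6789."},
]
print(evaluate_threshold(ssn_threshold, rows))      # False: 1 row contains an SSN
print(evaluate_threshold(ssn_threshold, rows[:1]))  # True: 0 rows contain an SSN
```

Because the threshold is a row count compared with <= 0.0, a single leaked SSN anywhere in the output column is enough to fail the test.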