Definition

The null rows test lets you set a threshold on the number (or percentage) of rows that contain missing values.
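
To make the two measurements concrete, here is a minimal sketch in plain Python. It is not the platform's implementation. It assumes that a row counts as a null row when at least one of its values is missing, that `None` and float NaN are the missing markers, and that the percentage is expressed as a fraction between 0 and 1. The helper names `is_missing` and `null_row_stats` are hypothetical.

```python
import math


def is_missing(value) -> bool:
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def null_row_stats(rows):
    """Return (count, fraction) of rows with at least one missing value."""
    null_count = sum(
        1 for row in rows if any(is_missing(v) for v in row.values())
    )
    # Guard against an empty dataset to avoid dividing by zero.
    fraction = null_count / len(rows) if rows else 0.0
    return null_count, fraction


rows = [
    {"age": 25, "city": "Lima"},
    {"age": None, "city": "Oslo"},          # missing age
    {"age": 40, "city": float("nan")},      # missing city
    {"age": 31, "city": "Kyiv"},
]

null_count, null_fraction = null_row_stats(rows)
print(null_count, null_fraction)  # 2 0.5
```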

Taxonomy

  • Category: Integrity.
  • Task types: LLM, tabular classification, tabular regression, text classification.

Why it matters

  • Missing values can have a direct impact on model performance.
  • Values missing from specific features can indicate issues with the data collection/ingestion process.
  • Measuring and tracking the number of missing values can inform the imputation strategies to be used.
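
As an illustration of the last two points, a per-feature tally shows where the missing values are concentrated, which can hint at an ingestion problem or suggest which features need imputation. The sketch below is plain Python under the assumption that `None` and float NaN mark missing values; `missing_per_feature` is a hypothetical helper, not part of any library.

```python
import math
from collections import Counter


def missing_per_feature(rows):
    """Count missing values (None or NaN) for each feature across all rows."""
    counts = Counter()
    for row in rows:
        for feature, value in row.items():
            if value is None or (isinstance(value, float) and math.isnan(value)):
                counts[feature] += 1
    return dict(counts)


rows = [
    {"age": 25, "income": None},
    {"age": None, "income": 52000.0},
    {"age": 40, "income": float("nan")},
]

# Features that never have a missing value do not appear in the result.
print(missing_per_feature(rows))
```

A feature that dominates the tally is a natural first place to look for a collection issue before choosing an imputation strategy.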

Test configuration examples

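If you are writing a tests.json, here is an example of a valid configuration for the null rows test. The // comments are explanatory only; standard JSON does not allow comments, so remove them if your parser is strict.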

[
  {
    "name": "No rows with null values",
    "description": "Asserts that there are no rows with missing values",
    "type": "integrity",
    "subtype": "nullRowCount",
    "thresholds": [
      {
        "insightName": "nullRowCount",
        "insightParameters": null,
        "measurement": "nullRowPercentage",
        "operator": "<=",
        "value": 0.0
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true, // Apply test to the validation set
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
  }
]
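
To illustrate how a threshold entry like the one above can be read, the following sketch compares a measured value against the `operator` and `value` fields. It is an assumption about the semantics, not the platform's evaluation code. Only `<=` appears in the configuration above; the other operators in the table are included for illustration, and `threshold_passes` is a hypothetical helper.

```python
import operator

# Map operator strings to comparison functions.
# Only "<=" is taken from the example configuration; the rest are assumed.
OPERATORS = {
    "<=": operator.le,
    "<": operator.lt,
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
}


def threshold_passes(threshold: dict, measured: float) -> bool:
    """Return True when `measured <operator> value` holds for the threshold."""
    compare = OPERATORS[threshold["operator"]]
    return compare(measured, threshold["value"])


threshold = {"measurement": "nullRowPercentage", "operator": "<=", "value": 0.0}

print(threshold_passes(threshold, 0.0))   # True: no null rows
print(threshold_passes(threshold, 0.25))  # False: a quarter of the rows have nulls
```

With `"operator": "<="` and `"value": 0.0`, the check passes only when no row has a missing value, which matches the test description "Asserts that there are no rows with missing values".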