Definition

The data type validation test lets you assert that each feature has the data type you expect, such as Numeric or Categorical.

Taxonomy

  • Category: Integrity.
  • Task types: Tabular classification, tabular regression.
  • Availability: and .

Why it matters

  • Detect data quality issues early by ensuring each feature has the expected data type.
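To make the idea concrete, here is a minimal Python sketch of the kind of problem this test is meant to catch. It does not reflect how the platform infers data types: the `infer_dtype` helper and its rule (a column is "Numeric" only when every non-missing value is a number) are assumptions made purely for illustration.

```python
from numbers import Real


def infer_dtype(values):
    """Hypothetical helper: label a column 'Numeric' or 'Categorical'.

    Assumed rule: missing values (None) are ignored, and the column is
    'Numeric' only if at least one value is present and all present
    values are real numbers (booleans excluded).
    """
    present = [v for v in values if v is not None]
    is_numeric = bool(present) and all(
        isinstance(v, Real) and not isinstance(v, bool) for v in present
    )
    return "Numeric" if is_numeric else "Categorical"


clean_age = [42, 35, None, 58]
dirty_age = [42, "35", None, 58]  # one value arrived as a string

clean_result = infer_dtype(clean_age)
dirty_result = infer_dtype(dirty_age)
print(clean_result, dirty_result)
```

A single stray string is enough to change the inferred type of the whole column, which is the sort of silent upstream change a data type assertion surfaces early.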

Test configuration examples

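If you are writing a tests.json, here are a few valid configurations for the data type validation test (the // comments are explanatory only; standard JSON does not allow comments, so remove them before use):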

[
  {
    "name": "Feature 'Age' is numeric",
    "description": "Asserts that the feature 'Age' is numeric",
    "type": "integrity",
    "subtype": "dtypeValidation",
    "thresholds": [
      {
        "insightName": "featureProfile",
        "insightParameters": [{ "name": "name", "value": "Age" }], // Selects the feature
        "measurement": "dtype",
        "operator": "is",
        "value": "Numeric" // Checks that it is Numeric
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true, // Apply test to the validation set
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
  },
  {
    "name": "Feature 'Geography' is categorical",
    "description": "Asserts that the feature 'Geography' is categorical",
    "type": "integrity",
    "subtype": "dtypeValidation",
    "thresholds": [
      {
        "insightName": "featureProfile",
        "insightParameters": [{ "name": "name", "value": "Geography" }], // Selects the feature
        "measurement": "dtype",
        "operator": "is",
        "value": "Categorical" // Checks that it is Categorical
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true, // Apply test to the validation set
    "usesTrainingDataset": false,
    "usesMlModel": false,
    "syncId": "96622fba-ea00-4e42-8f42-5e8f5f60805f" // Some unique id
  }
]
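If you need the same assertion for many features, the entries can be generated rather than written by hand. The following is a sketch under stated assumptions: the `dtype_validation_test` helper is hypothetical, and it simply copies the field names and values from the two examples above, generating a fresh UUID for each `syncId`.

```python
import json
import uuid


def dtype_validation_test(feature, expected_dtype):
    """Hypothetical helper: build one entry shaped like the examples above."""
    label = expected_dtype.lower()
    return {
        "name": f"Feature '{feature}' is {label}",
        "description": f"Asserts that the feature '{feature}' is {label}",
        "type": "integrity",
        "subtype": "dtypeValidation",
        "thresholds": [
            {
                "insightName": "featureProfile",
                # Selects the feature
                "insightParameters": [{"name": "name", "value": feature}],
                "measurement": "dtype",
                "operator": "is",
                # Expected data type, e.g. "Numeric" or "Categorical"
                "value": expected_dtype,
            }
        ],
        "subpopulationFilters": None,
        "mode": "development",
        "usesValidationDataset": True,  # Apply test to the validation set
        "usesTrainingDataset": False,
        "usesMlModel": False,
        "syncId": str(uuid.uuid4()),  # Some unique id
    }


tests = [
    dtype_validation_test("Age", "Numeric"),
    dtype_validation_test("Geography", "Categorical"),
]

# json.dumps emits strict JSON (null/true/false, no comments).
tests_json = json.dumps(tests, indent=2)
print(tests_json)
```

The printed output can be saved as the contents of a tests.json. Because the comments live in the Python source rather than in the JSON, the result parses with any strict JSON reader.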