Definition

The column values match test checks if, for a given column, the values are the same between the current dataset and the reference dataset.

  • In development projects, the training set is used as the reference and the validation set as the current dataset.
  • In monitoring projects, the reference dataset is uploaded by the user and the production data is the current dataset.

Taxonomy

  • Category: Consistency.
  • Task types: LLM, tabular classification, tabular regression, text classification.
  • Availability: .

Why it matters

  • The column values match test can help verifying that key attributes or features in your dataset remain consistent over time or across different datasets.

Test configuration examples

If you are writing a tests.json, here are a few valid configurations for the character length test:

[
  {
    "name": "Values in `output` and `target` match",
    "description": "Make sure that rows in your two datasets have the same values for target_column_name where reference_column_name is also the same",
    "type": "consistency",
    "subtype": "columnValuesMatch",
    "thresholds": [
      {
        "insightName": "columnValuesMatch",
        "insightParameters": [
          { "name": "reference_column_name", "value": "output" }, // Selects the column `output` as the reference column
          { "name": "target_column_name", "value": "target" } // Selects the column `target` as the target column
        ],
        "measurement": "failingRowPercentage", // Must be one of `failingRowPercentage` or `failingRowCount`
        "operator": "<=",
        "value": 0.0
      }
    ],
    "subpopulationFilters": null,
    "mode": "development",
    "usesValidationDataset": true,
    "usesTrainingDataset": true,
    "usesMlModel": false,
    "syncId": "b4dee7dc-4f15-48ca-a282-63e2c04e0689" // Some unique id
  }
]