Definition
The group by column statistic test allows you to measure a statistical property of one column grouped by the unique values of another column, and then set thresholds on how many groups fail to meet your criteria. For each unique value in the grouping column, the test calculates the specified statistic on the target column and checks if it meets your defined condition. The test then counts how many groups fail this condition and compares against your threshold.
Taxonomy
- Task types: LLM, tabular classification, tabular regression.
- Availability: and .
Why it matters
- This test helps ensure statistical consistency across different segments or categories in your data.
- It can detect bias, inconsistencies, or quality issues that affect specific subgroups differently.
- It’s essential for fairness validation, ensuring that model inputs have similar statistical properties across different demographics or categories.
- It helps identify data collection issues that might affect certain groups disproportionately.
How it works
The test follows these steps:
- Group the data by unique values in the specified grouping column
- Calculate the statistic (mean, median, etc.) on the target column for each group
- Apply the condition to each group’s statistic (e.g., mean >= 25)
- Count failing groups that don’t meet the condition
- Compare the count/percentage of failing groups against your threshold
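The steps above can be sketched in plain Python. This is a minimal illustration of the logic only, not Openlayer's implementation: the function `count_failing_groups`, the lookup tables, the sample rows, and the `max_failing_groups` threshold are all hypothetical names chosen for this example.

```python
from collections import defaultdict
import operator
import statistics

# Hypothetical lookup tables for this sketch; not part of any Openlayer API.
# Only a subset of statistics is included to keep the example short.
STATISTICS = {
    "sum": sum,
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "count": len,
}
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def count_failing_groups(rows, group_col, target_col, statistic, op, value):
    """Return (failing_group_keys, total_group_count) for one condition."""
    # Step 1: group the target values by the grouping column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[target_col])

    # Steps 2-4: compute the statistic per group, apply the condition,
    # and collect the groups that do not satisfy it.
    failing = [
        key
        for key, values in groups.items()
        if not OPERATORS[op](STATISTICS[statistic](values), value)
    ]
    return failing, len(groups)


rows = [
    {"geography": "France", "age": 30},
    {"geography": "France", "age": 40},
    {"geography": "Spain", "age": 20},
    {"geography": "Spain", "age": 24},
    {"geography": "Germany", "age": 28},
]

# Condition from the text: mean >= 25, evaluated per geography.
failing, total = count_failing_groups(rows, "geography", "age", "mean", ">=", 25)

# Step 5: compare the failing count against a threshold
# (assumed here: the test passes only if no group fails).
max_failing_groups = 0
test_passes = len(failing) <= max_failing_groups
print(failing, total, test_passes)
```

In this sample, Spain has a mean age of 22, so it is the only failing group out of three, and the assumed threshold of zero failing groups is not met.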
Available statistics
The following statistical measures are supported for the target column:

| Statistic | Description | Example Use Case |
|---|---|---|
| sum | Sum of all values in each group | Total sales by region |
| mean | Average value for each group | Average age by geography |
| median | Median value for each group | Median income by job category |
| min | Minimum value in each group | Minimum score by demographic |
| max | Maximum value in each group | Maximum transaction by customer type |
| count | Number of records in each group | Sample size validation by segment |
| variance | Variance of values in each group | Consistency check by category |
| std | Standard deviation for each group | Variability assessment by group |
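To see what each statistic yields on a small dataset, the sketch below computes all eight per group using only the Python standard library. The mapping `STAT_FUNCS`, the function `group_statistics`, and the sample rows are hypothetical. The sketch also assumes sample (n - 1) variance and standard deviation, since the documentation does not state whether the sample or population form is used.

```python
from collections import defaultdict
import statistics

# Assumed mapping from the statistic names in the table to Python callables.
# statistics.variance / statistics.stdev are the sample (n - 1) forms and
# raise StatisticsError for groups with fewer than two values.
STAT_FUNCS = {
    "sum": sum,
    "mean": statistics.fmean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "count": len,
    "variance": statistics.variance,
    "std": statistics.stdev,
}


def group_statistics(rows, group_col, target_col):
    """Compute every statistic in STAT_FUNCS for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[target_col])
    return {
        key: {name: func(values) for name, func in STAT_FUNCS.items()}
        for key, values in groups.items()
    }


rows = [
    {"region": "north", "sales": 10.0},
    {"region": "north", "sales": 14.0},
    {"region": "south", "sales": 5.0},
    {"region": "south", "sales": 5.0},
    {"region": "south", "sales": 8.0},
]

stats = group_statistics(rows, "region", "sales")
for region, values in stats.items():
    print(region, values)
```

Note that `count` is the only statistic here that does not depend on the target column's values, which is why it suits sample size checks per segment.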
Test configuration examples
If you are writing a tests.json, here are a few valid configurations for the group by column statistic test:

