What is a data quality rule?
A data quality rule is a set of business requirements that helps you detect anomalies in datasets.
It defines the requirements that the values in your data must meet. You can add a condition so that the data quality rule applies only to part of the data.
- You create the data quality rule as a standalone object. When you define the rule, you can use variables and specific values.
Because data quality rules are generic, variables let you adapt the rule to each dataset by associating each variable with a field of that dataset.
Specific values let you use the same value in every dataset to which you apply the rule.
- You apply the data quality rule and adapt it to a field.
You associate the variables of the data quality rule with the fields. You can apply a rule to a field to validate data from other fields.
- The data quality rule validates your data by sorting the values into three categories:
  - The values are valid. They fulfill all rule statements.
  - The values are not applicable. They do not fulfill the condition and no alternative validation expression has been defined.
  - The values are invalid. They fulfill the condition but not the validation expression, or the rule cannot be executed on those values, for example when the rule must compare a string with a number.
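
The three categories can be illustrated with a minimal Python sketch. It is not Talend's implementation or API: the `Rule` class, the `categorize` function, the field names (`cust_country`, `cust_age`) and the example rule are all hypothetical, and the handling of the alternative validation expression (applied when the condition is not met) is an assumption drawn from the description above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Values = Dict[str, Any]
Predicate = Callable[[Values], bool]


@dataclass
class Rule:
    """Hypothetical stand-in for a data quality rule (not a Talend class)."""
    validation: Predicate                    # checked when the condition is met
    condition: Optional[Predicate] = None    # restricts the rule to some data only
    alternative: Optional[Predicate] = None  # assumed: checked when the condition is not met


def categorize(rule: Rule, row: Dict[str, Any], binding: Dict[str, str]) -> str:
    """Return 'valid', 'invalid' or 'not applicable' for one row."""
    # Resolve each rule variable to the value of the dataset field bound to it.
    values = {var: row.get(field) for var, field in binding.items()}
    try:
        if rule.condition is None or rule.condition(values):
            check = rule.validation
        elif rule.alternative is not None:
            check = rule.alternative
        else:
            return "not applicable"
        return "valid" if check(values) else "invalid"
    except TypeError:
        # The rule cannot be executed on these values,
        # for example a string compared with a number.
        return "invalid"


# Generic rule: "if country is US, age must be at least 21".
# "US" and 21 play the role of specific values; country and age are variables.
rule = Rule(
    condition=lambda v: v["country"] == "US",
    validation=lambda v: v["age"] >= 21,
)

# Applying the rule to a dataset: associate each variable with a field.
binding = {"country": "cust_country", "age": "cust_age"}

rows = [
    {"cust_country": "US", "cust_age": 30},        # valid
    {"cust_country": "FR", "cust_age": 17},        # not applicable
    {"cust_country": "US", "cust_age": 18},        # invalid
    {"cust_country": "US", "cust_age": "thirty"},  # invalid: rule cannot run
]

for row in rows:
    print(row, "->", categorize(rule, row, binding))
```

Keeping the variable-to-field association in a separate `binding` mapping mirrors the idea that the same generic rule can be reused on another dataset simply by supplying different field names.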
Data quality rules affect the quality of your dataset and the Talend Trust Score™.