Validating data using data quality rules and dynamic schema
Information noteAttention: From Talend Studio 8.0 R2025-06, the tDataQualityRules Standard component is
replaced by tDQRules. This component lets you use the rules from:
- Qlik Talend Data Integration.
- Talend Cloud Data Inventory.
- Talend Cloud Data Stewardship.
- The hybrid version of Talend Data Stewardship 8.0 R2022-07 and greater.
For Apache Spark Jobs, you can only use tDataQualityRules with the Talend apps.
The Job in this scenario uses the tDataQualityRules component before Talend Studio 8.0 R2025-06 to validate data using a data quality rule from Talend Cloud Data Stewardship.
The Job uses a data quality rule in advanced mode:if (purchase_price >= 2999) {discount_percentage == 0.2}
elseif (purchase_price >= 1999 && purchase_price < 2999) {discount_percentage == 0.15}
elseif (purchase_price >= 999 && purchase_price < 1999) {discount_percentage == 0.10}
elseif (purchase_price >= 599 && purchase_price < 999) {discount_percentage == 0.07}
else {discount_percentage == 0}
Information noteImportant: You can use a dynamic schema in the
Standard component. Apache Spark components do not support it.
Setting up the Job
Procedure
Configuring the tFileInputDelimited component
Procedure
Configuring the tDataQualityRules component
Before you begin
- You have created at least one data quality rule in Talend Cloud Data Stewardship.
- You have a personal access token. To generate one, see https://help.talend.com/r/en-US/Cloud/management-console-user-guide/cloud-access-token.Information noteTip: Talend Cloud Data Stewardship is used in this example. You can also enter the URL of the hybrid version of Talend Data Stewardship 8.0 R2022-07 and greater, and Talend Cloud Data Inventory.
About this task
Procedure
Configuring the tLogRow components
Procedure
Saving and executing the Job
Procedure
Save your Job and press F6 to execute it.
Results
.--+------+-----------------.
| tLogRow_1 |
|=-+------+----------------=|
|id|price1|dynamic_column |
|=-+------+----------------=|
|3 |1999 |0.15 - 4200 - 0.2|
|4 |5000 |0.2 - 3000 - 0.2 |
|5 |300 |0 - 30 - 0 |
'--+------+-----------------'
.--+------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------.
| tLogRow_2 |
|=-+------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------=|
|id|price1|dynamic_column |INVALID_STATUS |
|=-+------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------=|
|1 |100 |0.2 - 2000 - 0.1 |[{"executionResults":[{"ruleName":"Discount percentage","inputColumn":["price1","dynamic_column.discount1"],"status":"Invalid"},{"ruleName":"Discount percentage","inputColumn":["dynamic_column.price2","dynamic_column.discount2"],"status":"Invalid"}]}]|
|2 |1000 |0.05 - 4000 - 0.2|[{"executionResults":[{"ruleName":"Discount percentage","inputColumn":["price1","dynamic_column.discount1"],"status":"Invalid"},{"ruleName":"Discount percentage","inputColumn":["dynamic_column.price2","dynamic_column.discount2"],"status":"Valid"}]}] |
|6 |6000 |0.2 - 3000 - 0.1 |[{"executionResults":[{"ruleName":"Discount percentage","inputColumn":["price1","dynamic_column.discount1"],"status":"Valid"},{"ruleName":"Discount percentage","inputColumn":["dynamic_column.price2","dynamic_column.discount2"],"status":"Invalid"}]}] |
|7 |price | - - |[{"executionResults":[{"ruleName":"Discount percentage","inputColumn":["price1","dynamic_column.discount1"],"status":"NE"},{"ruleName":"Discount percentage","inputColumn":["dynamic_column.price2","dynamic_column.discount2"],"status":"NE"}]}] |
'--+------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------'
The NE status means that the rule cannot be executed on the data. In this scenario, the price value cannot be validated because the rule is set to validate numbers.
The other invalid data mean that the discount applied to price1 and/or price2 is incorrect.
For example, the row #2: The discount 0.05 is incorrect for the price 1000 but the discount 0.2 is correct for the price 4000.
When you save the Job, a JAR file containing the rule definition
is created
in:
studio-path\configuration\.m2\repository\org\talend\libraries\rules-number