Validating data using data quality rules and dynamic schema
The Job in this scenario uses the tDataQualityRules component to validate data using a data quality rule from Talend Cloud Data Stewardship.
The Job uses a data quality rule in advanced mode:if (purchase_price >= 2999) {discount_percentage == 0.2}
elseif (purchase_price >= 1999 && purchase_price < 2999) {discount_percentage == 0.15}
elseif (purchase_price >= 999 && purchase_price < 1999) {discount_percentage == 0.10}
elseif (purchase_price >= 599 && purchase_price < 999) {discount_percentage == 0.07}
else {discount_percentage == 0}
Information noteImportant: You can use a dynamic schema in the Standard
component. Apache Spark components do not support it.
Setting up the Job
Procedure
Configuring the tFileInputDelimited component
Procedure
Configuring the tDataQualityRules component
Before you begin
- You have created at least one data quality rule in Talend Cloud Data Stewardship.
- You have a personal access token. To generate one, see https://help.talend.com/r/en-US/Cloud/management-console-user-guide/cloud-access-token.Information noteTip: Talend Cloud Data Stewardship is used in this example. You can also enter the URL of the hybrid version of Talend Data Stewardship 8.0 R2022-07 and greater, and Talend Cloud Data Inventory.
About this task
Procedure
Configuring the tLogRow components
Procedure
Saving and executing the Job
Procedure
Save your Job and press F6 to execute it.
Results
.--+------+-----------------.
| tLogRow_1 |
|=-+------+----------------=|
|id|price1|dynamic_column |
|=-+------+----------------=|
|3 |1999 |0.15 - 4200 - 0.2|
|4 |5000 |0.2 - 3000 - 0.2 |
|5 |300 |0 - 30 - 0 |
'--+------+-----------------'
.--+------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------.
| tLogRow_2 |
|=-+------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------=|
|id|price1|dynamic_column |INVALID_STATUS |
|=-+------+-----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------=|
|1 |100 |0.2 - 2000 - 0.1 |[{"executionResults":[{"ruleName":"Discount percentage","inputColumn":["price1","dynamic_column.discount1"],"status":"Invalid"},{"ruleName":"Discount percentage","inputColumn":["dynamic_column.price2","dynamic_column.discount2"],"status":"Invalid"}]}]|
|2 |1000 |0.05 - 4000 - 0.2|[{"executionResults":[{"ruleName":"Discount percentage","inputColumn":["price1","dynamic_column.discount1"],"status":"Invalid"},{"ruleName":"Discount percentage","inputColumn":["dynamic_column.price2","dynamic_column.discount2"],"status":"Valid"}]}] |
|6 |6000 |0.2 - 3000 - 0.1 |[{"executionResults":[{"ruleName":"Discount percentage","inputColumn":["price1","dynamic_column.discount1"],"status":"Valid"},{"ruleName":"Discount percentage","inputColumn":["dynamic_column.price2","dynamic_column.discount2"],"status":"Invalid"}]}] |
|7 |price | - - |[{"executionResults":[{"ruleName":"Discount percentage","inputColumn":["price1","dynamic_column.discount1"],"status":"NE"},{"ruleName":"Discount percentage","inputColumn":["dynamic_column.price2","dynamic_column.discount2"],"status":"NE"}]}] |
'--+------+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------'
The NE status means that the rule cannot be executed on the data. In this scenario, the price value cannot be validated because the rule is set to validate numbers.
The other invalid data mean that the discount applied to price1 and/or price2 is incorrect.
For example, the row #2: The discount 0.05 is incorrect for the price 1000 but the discount 0.2 is correct for the price 4000.
When you save the Job, a JAR file containing the rule definition
is created
in:
studio-path\configuration\.m2\repository\org\talend\libraries\rules-number