Validating and correcting your data with data stewardship
With Data stewardship in Qlik Talend Cloud, you can draw on subject matter experts to validate and correct your data. Use your existing semantic types and validation rules to make sure that the data is consistently formed. This extends automated pipelines with human-in-the-loop remediation that draws on domain expertise. When the data is validated, you can re-inject it into the original data source or send it to any downstream system.
You create a sprint, which is the main body of work for the validation and remediation. The sprint contains information about:
- The source data
- The data schema to use for validation
- The owners of the sprint
- The data stewards that are defined
- The data storage used for sprint data
- Workflow settings
During the sprint, all sprint data is stored in your own cloud data warehouse, and not in Qlik Talend Cloud. Currently, Snowflake is the only supported cloud data warehouse.
You can define the following user roles:
- Sprint owner
  Sprint owners can validate records that are resolved by data stewards. They can also access records that are resolved and export data.
- Data steward
  A data steward is assigned records to resolve quality issues.
You create sprints in Data stewardship in the Qlik Talend Data Integration activity center. You can create Resolution sprints that correct and curate data in one or more fields in the dataset that requires validation. This is the workflow:
- Create a sprint and define the data to validate. You can either populate the sprint with a Talend Studio Job, or import a CSV file with data.
  Data stewards are defined to perform the validation. Records can be assigned either manually or automatically.
- Working in a resolution sprint
  Data stewards validate the data in the assigned records.
- Retrieve the validated data. How you do this depends on how you populated the sprint:
  - If you populated the sprint with a Talend Studio Job, you create a Talend Studio Job to retrieve the validated records and return them to the original data source, or to any other required destination.
  - If you populated the sprint with a CSV file, you conclude the sprint by exporting the validated data to a CSV file. You can then update the data source with the validated data by importing the exported CSV file.
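As a rough illustration of the CSV path, the following Python sketch shows one way to apply an exported file of validated records back to source data. The column names (`id`, `email`, `country`), the key-based matching, and the helper names `apply_validated_rows` and `read_csv` are assumptions made for this example. The actual layout of the exported CSV file is not described here, so adapt the key and columns to your own data.

```python
import csv
import io


def read_csv(text):
    """Parse CSV text into a list of dict rows (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(text)))


def apply_validated_rows(source_rows, validated_rows, key="id"):
    """Return source rows with values from matching validated rows applied.

    Assumes both files share a unique key column; "id" is a placeholder.
    """
    validated_by_key = {row[key]: row for row in validated_rows}
    merged = []
    for row in source_rows:
        fix = validated_by_key.get(row[key])
        # Overlay corrected values when a validated record exists;
        # otherwise keep the original row unchanged.
        merged.append({**row, **fix} if fix else dict(row))
    return merged


# Sample data invented for the example: record 1 has a malformed email.
source = read_csv("id,email,country\n1,ann@example,FR\n2,bob@example.com,DE\n")
validated = read_csv("id,email,country\n1,ann@example.com,FR\n")

merged = apply_validated_rows(source, validated)
for row in merged:
    print(row)
```

Matching on a key column keeps records that were never part of the sprint untouched, which is usually what you want when only a subset of the source data required validation.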