Surviving master records
Merging records using tRuleSurvivorship
Once you have identified duplicates and possible duplicates and grouped them together, you can use the tRuleSurvivorship component to create a single representation of each group of duplicates from the best-of-breed data. This representation is called a survivor.
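To make the idea of building a survivor from best-of-breed data more concrete, here is a minimal conceptual sketch in Python. It does not use or reproduce the tRuleSurvivorship API. It assumes each record already carries a hypothetical `group_id` field from an earlier matching step, and it applies one illustrative rule per field (`longest` and `most_common` are hypothetical helpers, not Talend function names) to pick the surviving value within each group.

```python
from collections import Counter, defaultdict
from collections.abc import Callable

# Hypothetical types for this sketch: a record maps field names to string values,
# and a rule picks one surviving value from the candidate values of a group.
Record = dict[str, str]
Rule = Callable[[list[str]], str]


def most_common(values: list[str]) -> str:
    # Keep the value that occurs most often; Counter breaks ties by first occurrence.
    return Counter(values).most_common(1)[0][0]


def longest(values: list[str]) -> str:
    # Keep the longest value (assumed to be the most complete); max() keeps the first on ties.
    return max(values, key=len)


def build_survivors(records: list[Record], group_key: str,
                    rules: dict[str, Rule]) -> dict[str, Record]:
    """Collapse each group of duplicates into one survivor record."""
    groups: dict[str, list[Record]] = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)

    survivors: dict[str, Record] = {}
    for gid, members in groups.items():
        survivor: Record = {}
        for field, rule in rules.items():
            # Ignore missing or empty values before applying the rule.
            values = [m[field] for m in members if m.get(field)]
            survivor[field] = rule(values) if values else ""
        survivors[gid] = survivor
    return survivors


# Hypothetical sample data: three duplicates in group g1, one unique row in g2.
records = [
    {"group_id": "g1", "name": "J. Smith", "city": "Paris"},
    {"group_id": "g1", "name": "John Smith", "city": "Paris"},
    {"group_id": "g1", "name": "Jon Smith", "city": "Lyon"},
    {"group_id": "g2", "name": "Ann Lee", "city": "Nantes"},
]
rules: dict[str, Rule] = {"name": longest, "city": most_common}

survivors = build_survivors(records, "group_id", rules)
for gid, survivor in survivors.items():
    print(gid, survivor)
```

Here group g1 yields a survivor whose name is "John Smith" (the longest candidate) and whose city is "Paris" (the most frequent candidate), so the survivor can combine values taken from different source rows. Choosing one rule per field, rather than keeping one "best" row whole, is what lets the survivor be best-of-breed.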
For an example of how to create a clean data set from the suspect pairs labeled by tMatchPredict and the unique rows computed by tMatchPairing, see https://help.talend.com/en-US/data-matching/8.0/tmatchmodel.
Using Talend Data Stewardship for clerical review and merging records
You can add merging campaigns in Talend Data Stewardship to review and modify survivorship rules, create master records and merge data.
For further information on merging campaigns in Talend Data Stewardship, see Talend Data Stewardship Examples.
In Talend Data Stewardship, data stewards are business users in charge of resolving data stewardship tasks:
- Classifying data by assigning a label chosen from a predefined list of arbitration choices.
- Merging several potential duplicate records into one single record.
Merging tasks allow authorized data stewards to merge several potential duplicate source records into one single record (golden record). The outcome of a merging task is the golden record produced by data stewards.
For further information on merging tasks in Talend Data Stewardship, see Talend Data Stewardship Examples.
Source records can come from the same source (database deduplication) or from different sources (database reconciliation).