Deduplication scenarios
- Converting the Standard Job to a Spark Batch Job
- Creating a clean data set from the suspect pairs labeled by tMatchPredict and the unique rows computed by tMatchPairing
- Deduplicating entries
- Deduplicating entries based on dynamic schema
- Merging the content of several rows using different columns as rank values
- Modifying the rule file manually to code the conditions you want to use to create a survivor
- Selecting the best-of-breed data from a group of duplicates to create a survivor
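The last three scenarios all revolve around survivorship: collapsing a group of duplicate rows into a single "best-of-breed" record by ranking the candidate values. Talend implements this through rule files configured in the Studio, but the underlying idea can be sketched in plain Python. Everything below — the `match_key`, `source_rank`, and sample field names — is hypothetical illustration, not part of any Talend schema.

```python
from collections import defaultdict

# Hypothetical duplicate rows, already grouped by a match key.
# "source_rank" stands in for whatever rank value a survivorship
# rule would use (source trust, recency, completeness, ...).
rows = [
    {"match_key": "A", "name": "Ann Lee", "email": "", "source_rank": 2},
    {"match_key": "A", "name": "", "email": "ann@example.com", "source_rank": 1},
    {"match_key": "B", "name": "Bob Roy", "email": "bob@example.com", "source_rank": 1},
]

def survivors(rows, columns):
    """Build one survivor per duplicate group: for each column, keep the
    non-empty value from the highest-ranked row that provides one."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["match_key"]].append(row)
    result = {}
    for key, dupes in groups.items():
        # Highest rank first, so the most trusted source wins per column.
        dupes.sort(key=lambda r: r["source_rank"], reverse=True)
        survivor = {"match_key": key}
        for col in columns:
            survivor[col] = next((r[col] for r in dupes if r[col]), "")
        result[key] = survivor
    return result

best = survivors(rows, ["name", "email"])
# Group "A" merges two partial rows: the name comes from the rank-2 row,
# the email from the rank-1 row.
```

Ranking per column, rather than simply keeping the single highest-ranked row, is what lets the survivor combine fields from several duplicates, as in the "Merging the content of several rows using different columns as rank values" scenario above.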