Working principle
This component implements the MapReduce model, based on the blocking keys defined in the Blocking definition table of the Basic settings view.
This implementation proceeds as follows:
- Splits the input rows into groups of a given size.
- Implements a Map class that maps each key to a list of records.
- Shuffles the records so that those with the same key are grouped together.
- Applies, to each key, the algorithm defined in the Key definition table of the Basic settings view. The component then reads the records, compares them with the master records, groups the similar ones, and classifies each remaining record as a master record.
- Outputs the groups of similar records with their group IDs, group sizes, matching distances, and scores.
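
The steps above can be illustrated with a minimal, single-process sketch in Python. It is not the component's actual code and does not run on a MapReduce cluster. All function names, the `threshold` parameter, and the output field names (`group_id`, `group_size`, `score`) are hypothetical. The `similarity` function uses `difflib.SequenceMatcher` only as a stand-in for whatever algorithm is configured in the Key definition table, and a single similarity score stands in for the matching distances and scores.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import islice


def chunks(rows, size):
    """Split the input rows into groups of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def map_phase(batch, blocking_key):
    """Map step: emit one (key, record) pair per record in the batch."""
    return [(blocking_key(rec), rec) for rec in batch]


def shuffle(pairs):
    """Shuffle step: bring records that share a blocking key together."""
    blocks = defaultdict(list)
    for key, rec in pairs:
        blocks[key].append(rec)
    return blocks


def similarity(a, b):
    """Stand-in matching function (assumption); returns a score in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reduce_phase(records, match_field, threshold):
    """Compare each record with the masters found so far in this block."""
    groups = []  # each group: list of (record, score); first entry is the master
    for rec in records:
        best_group, best_score = None, 0.0
        for group in groups:
            master = group[0][0]
            score = similarity(rec[match_field], master[match_field])
            if score > best_score:
                best_group, best_score = group, score
        if best_group is not None and best_score >= threshold:
            best_group.append((rec, best_score))  # similar to an existing master
        else:
            groups.append([(rec, 1.0)])  # no similar master: becomes a master
    return groups


def match_groups(rows, blocking_key, match_field, batch_size=2, threshold=0.8):
    """Run split, map, shuffle and per-key matching, then label the output."""
    pairs = []
    for batch in chunks(rows, batch_size):
        pairs.extend(map_phase(batch, blocking_key))
    output = []
    for key, records in sorted(shuffle(pairs).items()):
        groups = reduce_phase(records, match_field, threshold)
        for i, group in enumerate(groups):
            for rec, score in group:
                output.append({
                    **rec,
                    "group_id": f"{key}-{i}",
                    "group_size": len(group),
                    "score": round(score, 3),
                })
    return output
```

A call such as `match_groups(rows, lambda r: r["zip"], "name")` would block the rows on a hypothetical `zip` column and compare the `name` column within each block. Records are only ever compared with masters that share their blocking key, which is what keeps the number of comparisons manageable.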