Computing suspect pairs and unique rows
Procedure
Results
tMatchIndexPredict groups records from the input data with the matching records from the reference data set indexed in Elasticsearch, and labels these suspect pairs. The two records of each suspect pair appear in the same row.
tMatchIndexPredict excludes the unique records from the suspect pairs and writes them to a separate file.
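To make the two outputs concrete, the following is a minimal Python sketch of the routing behavior described in the results. It is not Talend code and does not query Elasticsearch. The reference data set is an in-memory list, `is_match` is a hypothetical stand-in for the matching model, and the function name, the `input_`/`ref_` column prefixes and the `SUSPECT` label are illustrative assumptions only.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def split_suspects_and_uniques(
    inputs: List[Record],
    reference: List[Record],
    is_match: Callable[[Record, Record], bool],
) -> Tuple[List[Record], List[Record]]:
    """Hypothetical sketch: put each input record either in a suspect-pair row
    (input record and matching reference record side by side) or in the unique rows."""
    suspect_rows: List[Record] = []
    unique_rows: List[Record] = []
    for rec in inputs:
        matches = [ref for ref in reference if is_match(rec, ref)]
        if not matches:
            # No match in the reference data: the record is unique.
            unique_rows.append(rec)
            continue
        for ref in matches:
            # Both records of the suspect pair go into the same output row.
            row = {f"input_{k}": v for k, v in rec.items()}
            row.update({f"ref_{k}": v for k, v in ref.items()})
            row["label"] = "SUSPECT"  # illustrative label, not Talend's
            suspect_rows.append(row)
    return suspect_rows, unique_rows


def same_name(a: Record, b: Record) -> bool:
    # Toy matching rule (assumption): case- and whitespace-insensitive name equality.
    return a["name"].strip().lower() == b["name"].strip().lower()


inputs = [{"id": "1", "name": "Ada Lovelace"}, {"id": "2", "name": "Alan Turing"}]
reference = [{"id": "r1", "name": "ada lovelace "}]
suspects, uniques = split_suspects_and_uniques(inputs, reference, same_name)
```

Here `suspects` holds one row pairing input record `1` with reference record `r1`, and `uniques` holds only the `Alan Turing` record, which corresponds to the rows written to the separate file.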
You can now clean and deduplicate the unique rows and use tMatchIndex to add them to the reference data set stored in Elasticsearch.