Indexing clean and deduplicated data in Elasticsearch
Before you begin
- Make sure that the Elasticsearch cluster and Elasticsearch-head are running before you execute the Job.
For more information about Elasticsearch-head, which is a plugin for browsing an Elasticsearch cluster, see https://mobz.github.io/elasticsearch-head/.
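If you prefer to confirm from a script that the cluster is up before running the Job, the following minimal sketch queries the standard Elasticsearch `_cluster/health` REST endpoint. The address `http://localhost:9200` and the helper names `health_url` and `cluster_is_up` are assumptions made for illustration; adapt them to your own installation.

```python
import json
import urllib.error
import urllib.request

# Assumption: the cluster listens on the default local HTTP address.
ES_URL = "http://localhost:9200"


def health_url(base_url: str) -> str:
    """Build the URL of the cluster health endpoint."""
    return base_url.rstrip("/") + "/_cluster/health"


def cluster_is_up(base_url: str = ES_URL, timeout: float = 5.0) -> bool:
    """Return True if the cluster answers with a green or yellow status."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            status = json.load(resp).get("status")
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, refused connection, timeout, or invalid JSON.
        return False
    return status in ("green", "yellow")
```

Calling `cluster_is_up()` returns `False` when nothing is reachable at the assumed address, which is a reasonable signal to start the cluster before executing the Job.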
Procedure
Results
tMatchIndex created the education-agencies-chicago index in Elasticsearch, populated it with the clean data and computed the best suffixes based on the blocking key values.
You can browse the index created by tMatchIndex using the Elasticsearch-head plugin.
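As an alternative to the plugin, the index can also be inspected through the Elasticsearch REST API. The sketch below uses the standard `_count` and `_search` endpoints to report how many documents the education-agencies-chicago index holds and to fetch a few of them. The address `http://localhost:9200` and the helper names are assumptions for illustration, and the fields returned depend on your own Job.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the cluster listens on the default local HTTP address.
ES_URL = "http://localhost:9200"
# Index name taken from the Results of this scenario.
INDEX = "education-agencies-chicago"


def index_url(base_url, index, endpoint, **params):
    """Build a URL such as <base>/<index>/_search?size=3."""
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(index, safe='')}/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(url, timeout=5.0):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def document_count(base_url=ES_URL, index=INDEX):
    """Return the number of documents stored in the index."""
    return get_json(index_url(base_url, index, "_count"))["count"]


def sample_documents(base_url=ES_URL, index=INDEX, size=3):
    """Return the source of the first few documents in the index."""
    body = get_json(index_url(base_url, index, "_search", size=size))
    return [hit["_source"] for hit in body["hits"]["hits"]]
```

With the cluster running, `document_count()` gives the number of indexed records and `sample_documents()` returns a handful of them for a quick visual check.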
You can now use the indexed data as a reference data set for the tMatchIndexPredict component.
For an example of how to do continuous matching, see Doing continuous matching using tMatchIndexPredict.