Blocking by partitions
Record linkage is a demanding task because each record must be compared with every other record in the data set. For n records, this means n(n-1)/2 comparisons, so the cost grows quadratically with the size of the data set. To make this task efficient, blocking is a required step in most cases.
Blocking consists of sorting records into partitions of similar size, where all the records in a partition share the same value for a given attribute or key. The objective is to restrict comparisons to the records grouped within the same partition.
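The following minimal Python sketch illustrates the principle, assuming records are plain dictionaries and that the last name serves as the blocking key. The function names (`block_records`, `candidate_pairs`, `count_all_pairs`) and the sample records are hypothetical and do not reflect how any Talend component works internally. Records are grouped by key, and candidate pairs are generated only within each group.

```python
from collections import defaultdict
from itertools import combinations


def block_records(records, key_func):
    """Group records into partitions (blocks) that share the same key value."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key_func(record)].append(record)
    return dict(blocks)


def candidate_pairs(blocks):
    """Yield pairs of records, restricted to records within the same block."""
    for members in blocks.values():
        yield from combinations(members, 2)


def count_all_pairs(n):
    """Number of comparisons without blocking: every record against every other."""
    return n * (n - 1) // 2


# Hypothetical sample data; the last name is used as the blocking key.
records = [
    {"id": 1, "last_name": "Smith", "first_name": "Anna"},
    {"id": 2, "last_name": "Smith", "first_name": "Ana"},
    {"id": 3, "last_name": "Jones", "first_name": "Paul"},
    {"id": 4, "last_name": "Jones", "first_name": "Pol"},
    {"id": 5, "last_name": "Brown", "first_name": "Lea"},
]

blocks = block_records(records, lambda r: r["last_name"])
pairs = list(candidate_pairs(blocks))

print(count_all_pairs(len(records)))  # 10 comparisons without blocking
print(len(pairs))                     # 2 comparisons with blocking
```

Even on five records, blocking cuts the comparisons from 10 to 2. The saving grows with the data set, since each block contributes only the pairs among its own members.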
To create efficient partitions, you need to find attributes that are unlikely to change, such as a person's first name or last name. Records that refer to the same entity then end up in the same partition. By doing this, you improve the reliability of the blocking step and the computation speed of the task.
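As an illustration only, the sketch below builds a blocking key from the first and last names. It normalizes case, surrounding spaces, and accents so that minor formatting differences do not send the same person to different partitions. The key layout (the first three letters of the last name followed by the first-name initial) and the names `normalize` and `make_blocking_key` are assumptions made for this example. They are not the algorithm used by tGenKey.

```python
import unicodedata


def normalize(value):
    """Trim, uppercase, and strip accents (e.g. 'Ü' becomes 'U')."""
    decomposed = unicodedata.normalize("NFKD", value.strip().upper())
    # Drop the combining marks left over after decomposition.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def make_blocking_key(record, prefix_len=3):
    """Hypothetical key: last-name prefix followed by the first-name initial."""
    last = normalize(record["last_name"])[:prefix_len]
    first = normalize(record["first_name"])[:1]
    return f"{last}{first}"


print(make_blocking_key({"last_name": " Müller ", "first_name": "anna"}))  # MULA
print(make_blocking_key({"last_name": "MULLER", "first_name": "Anne"}))    # MULA
```

Both spellings produce the same key, so the two records fall into the same partition and are compared. A longer prefix gives smaller, more numerous blocks, at the risk of separating records that should match.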
It is recommended to use the tGenKey component to generate blocking keys and to view the distribution of the blocks.
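To show why the block distribution matters, the following Python sketch counts the records per key and the comparisons each block generates. It is a generic stand-in and does not reproduce the tGenKey output. The key values and the function names `block_size_distribution` and `pairs_per_block` are hypothetical.

```python
from collections import Counter


def block_size_distribution(keys):
    """Return (key, record count) pairs, largest blocks first."""
    return Counter(keys).most_common()


def pairs_per_block(distribution):
    """Comparisons generated inside each block: n * (n - 1) / 2."""
    return {key: n * (n - 1) // 2 for key, n in distribution}


# Hypothetical blocking keys, one per record.
keys = ["SMIA", "SMIA", "SMIA", "JONP", "JONP", "BROL"]

dist = block_size_distribution(keys)
print(dist)                   # [('SMIA', 3), ('JONP', 2), ('BROL', 1)]
print(pairs_per_block(dist))  # {'SMIA': 3, 'JONP': 1, 'BROL': 0}
```

Because comparisons grow quadratically with block size, a single oversized block can dominate the total cost. Such a block is a sign that the key should be refined so that the partitions are of similar size.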
For more information on generating blocking keys, see Identification.