Configuring the grouping of the output data
Procedure
- Click the tMatchGroup component, and then in its basic settings click the Edit schema button to view the input and output columns and make any modifications to the output schema, if needed.
The output schema of this component contains standard output columns that are read-only. For more information, see tMatchGroup Standard properties.
- Click OK to close the dialog box.
- Double-click the tMatchGroup component to display its Configuration Wizard and define the component properties.
If you want to add a fixed output column, MATCHING_DISTANCES, which gives the details of the distance computed for each column used for matching, click the Advanced settings tab and select the Output distance details check box. For more information, see tMatchGroup Standard properties.
- In the Key definition table, click the plus button to add the columns on which you want to perform the matching operation, FirstName and LastName in this scenario.
- Click in the first and second cells of the Matching Function column and select from the list the algorithm(s) to be used for the matching operation, Jaro-Winkler in this example.
- Click in the first and second cells of the Weight column and set the numerical weights for each of the columns used as key attributes.
- In the Match threshold field, enter the match probability threshold. Two data records match when their match probability is above this value.
- Click the plus button below the Blocking Selection table to add a line in the table, then click in the line and select from the list the column you want to use as a blocking value, T_GEN_KEY in this example.
Using a blocking value reduces the number of record pairs that need to be examined. The input data is partitioned into exhaustive blocks based on the functional key, and comparison is restricted to record pairs within each block.
- Click the Chart button in the top right corner of the wizard to execute the Job in the defined configuration and display the matching results directly in the wizard.
The matching chart gives a global picture of the duplicates in the analyzed data. The matching table shows the details of the items in each group and colors the groups to match their colors in the matching chart.
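The Key definition settings (matching function and weights) and the Match threshold can be illustrated with a minimal Python sketch. It assumes that the record score is the weighted average of the per-column Jaro-Winkler similarities, and it uses the textbook Jaro-Winkler definition (prefix scale 0.1, common prefix of up to four characters); the internal implementation of tMatchGroup may differ in details. The function names, `KEYS`, `THRESHOLD`, and the weight and threshold values are hypothetical.

```python
# Minimal sketch: weighted Jaro-Winkler matching of two records.
# Assumption: the record score is the weighted average of per-column
# similarities; weights and threshold below are hypothetical values.


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    s1_flags = [False] * len1
    s2_flags = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not s2_flags[j] and s2[j] == ch:
                s1_flags[i] = s2_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    transpositions = 0
    k = 0
    for i, ch in enumerate(s1):
        if s1_flags[i]:
            while not s2_flags[k]:
                k += 1
            if ch != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


# Hypothetical Key definition: (column, weight) pairs.
KEYS = [("FirstName", 1.0), ("LastName", 1.0)]
THRESHOLD = 0.9  # hypothetical Match threshold


def record_score(r1: dict, r2: dict, keys) -> float:
    """Weighted average of the per-column Jaro-Winkler similarities."""
    total = sum(w for _, w in keys)
    return sum(w * jaro_winkler(r1[c], r2[c]) for c, w in keys) / total


def is_match(r1: dict, r2: dict, keys, threshold: float) -> bool:
    # Two records match when their score is above the threshold.
    return record_score(r1, r2, keys) > threshold


r1 = {"FirstName": "Martha", "LastName": "Smith"}
r2 = {"FirstName": "Marhta", "LastName": "Smith"}
r3 = {"FirstName": "John", "LastName": "Doe"}
print(round(record_score(r1, r2, KEYS), 4))  # 0.9806
print(is_match(r1, r2, KEYS, THRESHOLD))     # True
print(is_match(r1, r3, KEYS, THRESHOLD))     # False
```

Raising the weight of a column makes its similarity count more in the record score, and raising the threshold makes matching stricter.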
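The effect of a blocking value on the number of comparisons can also be sketched in Python. The records and their T_GEN_KEY values below are made up for illustration, and `candidate_pairs` is a hypothetical helper that only lists the record pairs a matcher would have to compare; it does not perform the matching itself.

```python
# Minimal sketch: how a blocking value limits the pairs to compare.
# Records and T_GEN_KEY values are made up for illustration.
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, block_key=None):
    """Return the record pairs a matcher would compare.

    Without a blocking key every record is paired with every other one;
    with a key, only records sharing the same key value are paired.
    """
    if block_key is None:
        return list(combinations(records, 2))
    blocks = defaultdict(list)
    for record in records:
        blocks[record[block_key]].append(record)
    pairs = []
    for block in blocks.values():
        pairs.extend(combinations(block, 2))
    return pairs


records = [
    {"FirstName": "Martha", "LastName": "Smith", "T_GEN_KEY": "MSMI"},
    {"FirstName": "Marhta", "LastName": "Smith", "T_GEN_KEY": "MSMI"},
    {"FirstName": "Marta", "LastName": "Smith", "T_GEN_KEY": "MSMI"},
    {"FirstName": "John", "LastName": "Doe", "T_GEN_KEY": "JDOE"},
    {"FirstName": "Jon", "LastName": "Doe", "T_GEN_KEY": "JDOE"},
    {"FirstName": "Anna", "LastName": "Lee", "T_GEN_KEY": "ALEE"},
]

print(len(candidate_pairs(records)))               # 15
print(len(candidate_pairs(records, "T_GEN_KEY")))  # 4
```

Without blocking, the six records produce 15 candidate pairs; with T_GEN_KEY as the blocking value, only the 4 pairs that share a key value are compared. The trade-off is that records with different key values are never compared, so a key that is too strict can miss true duplicates.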