Double-click tBlockedFuzzyJoin to display its
Basic settings view
and define its properties.
Click the Edit schema button
to open a dialog box. Here you can define the data you want
to pass to the output components.
In this example, we want to pass the four input columns to the output
components, along with the new column ref_firstname.
Click OK to close the dialog
box and proceed to the next step.
In the Key definition area of
the Basic settings view of
tBlockedFuzzyJoin,
click the plus button to add two columns to the list.
From the Input key attribute and Lookup
key attribute lists, select the input columns and the
lookup columns, respectively, on which you want to do the fuzzy
matching:
grp and
firstname in this
example.
Click in the first cell of the Matching
type column and select from the list the
method to be used to check the incoming data against the
reference data, Exact match in this
example. There is no minimum or maximum distance to
set for this matching type.
Set the matching type for the second column,
Levenshtein in this
example.
Then set the minimum and maximum distances. In this method,
the distance is the number of character changes (insertions,
deletions or substitutions) that need to be carried out in
order for the entry to fully match the reference. In this
example, we want the minimum distance to be 0 and the maximum
distance to be 2. This will output all entries in the
firstname column that either match the reference exactly
or differ from it by at most two character changes.
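
To make the two matching types more concrete, the following is a minimal Python sketch of the logic described above. It is not the code that tBlockedFuzzyJoin generates. The function names levenshtein and blocked_fuzzy_join and the sample rows are hypothetical, and the sketch assumes case-sensitive comparison and one output row per matching pair, which may differ from the component's actual behavior.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the insertions, deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def blocked_fuzzy_join(main_rows, lookup_rows, min_dist=0, max_dist=2):
    """Hypothetical illustration: exact match on grp, Levenshtein on firstname."""
    matches = []
    for row in main_rows:
        for ref in lookup_rows:
            # Exact match: no distance applies, the values must be identical.
            if row["grp"] != ref["grp"]:
                continue
            # Levenshtein: keep the pair when the distance is within the bounds.
            distance = levenshtein(row["firstname"], ref["firstname"])
            if min_dist <= distance <= max_dist:
                matches.append({**row, "ref_firstname": ref["firstname"]})
    return matches


# Hypothetical sample data; only the two key columns are shown.
main_rows = [
    {"grp": "A", "firstname": "Jon"},
    {"grp": "A", "firstname": "Alexander"},
    {"grp": "B", "firstname": "John"},
]
lookup_rows = [
    {"grp": "A", "firstname": "John"},
    {"grp": "B", "firstname": "Joan"},
]

matches = blocked_fuzzy_join(main_rows, lookup_rows)
for match in matches:
    print(match)
```

In this sketch, the exact match on grp acts as a filter that restricts which lookup rows are compared at all, and the Levenshtein bounds of 0 and 2 then decide which of the remaining firstname pairs are kept.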