Configuring the components
Procedure
- Double-click tFixedFlowInput to open its Basic settings view.
- Next to Edit schema, click the [...] button to open the Schema dialog box, and add a second column, LASTNAME, next to the FIRSTNAME column you defined in the previous scenario.
  When done, click OK to validate this change and close the dialog box.
- In the Content field of the Mode area, add more first name and last name data so that the input data reads as follows:
  Kristof;Toum
  Chris;Toom
  Tony;Walker
  Anton;Correia
  Jim;Correia
  Jim;Walker
- Double-click tSynonymSearch to open its Basic settings view.
- Click Sync columns to synchronize the columns of this component with the preceding one and click Yes to propagate the changes to the next component when prompted.
- Click the [...] button next to Edit schema to open the Schema dialog box, and add two columns to the output schema: matched_fname and matched_lname.
  These columns will hold the matched reference entries in the output flow. When done, click OK to validate the setting, and accept propagating the changes when prompted.
- In the Limit of each group field, type in 10 to replace the value you defined in the previous scenario.
- Under the Columns to search table, click the [+] button to add a second row and define its parameters as follows:
  - In the Input column column, select LASTNAME from the drop-down list.
  - In the Reference output column column, select matched_lname from the drop-down list.
  - In the Index path column, type in, between quotation marks, the path to the synonym index holding the last name entries.
    When using Spark Local mode, use a path to a local folder:
    - Apache Spark 3.1 and earlier: prefix://file path or file:///file path.
    - Apache Spark 3.2 and later: file:///file path.
  - In the Search mode column, select Match exact for both input columns (FIRSTNAME and LASTNAME). This mode matches the exact input word against an exact index word.
  - In the Score threshold column, enter 0.9 to filter the results and list only terms with higher similarity.
  - Leave the Min similarity and Word distance columns as they are: they apply only to the fuzzy modes and to the Match partial mode, respectively.
  - In the Limit column of this row, leave the default value, 5.
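For the Index path value in Spark Local mode, the file:///file path form is listed as valid both for Apache Spark 3.1 and earlier and for Apache Spark 3.2 and later. The following minimal Python sketch builds that form from an absolute local folder and wraps it in the quotation marks the Index path cell expects. The helper names and the example folder are hypothetical, and the sketch assumes a POSIX path with no characters that need URI escaping.

```python
import posixpath


def local_index_uri(folder: str) -> str:
    """Hypothetical helper: turn an absolute POSIX folder into file:///<path>."""
    if not posixpath.isabs(folder):
        # A relative folder cannot produce the three-slash file:/// form.
        raise ValueError(f"index folder must be an absolute path: {folder!r}")
    # "file://" + "/home/..." yields "file:///home/...".
    # Assumption: the folder contains no spaces or other characters that need escaping.
    return "file://" + folder


def as_index_path_value(uri: str) -> str:
    """Wrap the URI in double quotation marks, as typed into the Index path cell."""
    return f'"{uri}"'


if __name__ == "__main__":
    # Hypothetical location of the last name synonym index.
    uri = local_index_uri("/home/talend/indexes/lastname_index")
    print(as_index_path_value(uri))
```

Using the file:/// form keeps the same value valid whichever of the two listed Spark version ranges the job runs on.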
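The following Python sketch models two points from the procedure. It parses the inline content entered in tFixedFlowInput into rows that follow the four-column output schema. It also filters candidate matches with a score threshold and a per-row limit. This is a simplified model, not the component's implementation. The ";" field separator and newline row separator match the content shown above. The candidate scores in the usage example are made up. Treating the threshold as inclusive (score >= 0.9) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Inline content as entered in the Content field of tFixedFlowInput.
CONTENT = """Kristof;Toum
Chris;Toom
Tony;Walker
Anton;Correia
Jim;Correia
Jim;Walker"""


@dataclass
class Row:
    # Field names mirror the schema columns used in this scenario.
    FIRSTNAME: str
    LASTNAME: str
    matched_fname: Optional[str] = None
    matched_lname: Optional[str] = None


def parse_content(content: str, row_sep: str = "\n", field_sep: str = ";") -> List[Row]:
    """Split the inline content into FIRSTNAME/LASTNAME rows, skipping blank lines."""
    rows = []
    for line in content.split(row_sep):
        if not line.strip():
            continue
        first, last = line.split(field_sep)  # exactly two fields per row expected
        rows.append(Row(first, last))
    return rows


def filter_candidates(
    candidates: List[Tuple[str, float]], threshold: float = 0.9, limit: int = 5
) -> List[Tuple[str, float]]:
    """Keep (term, score) pairs at or above the threshold, best first, capped at limit."""
    kept = [c for c in candidates if c[1] >= threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:limit]


if __name__ == "__main__":
    for row in parse_content(CONTENT):
        print(row.FIRSTNAME, row.LASTNAME)
    # Made-up candidate scores for one LASTNAME value.
    print(filter_candidates([("Walker", 1.0), ("Walke", 0.85), ("Walkers", 0.92)]))
```

With these made-up scores, only Walker and Walkers pass the 0.9 threshold. The Limit value of 5 would cap the list if more candidates qualified.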