Configuring the components
Procedure
-
Double-click tFixedFlowInput to open its
Basic settings view.
-
Next to the Schema field, click the
Edit schema button to open the
Schema dialog box, add one column and
name it FIRSTNAME. When done, click OK to validate these changes and close the dialog
box.
-
In the Mode area, select the Use Inline Content (delimited file) option, and
supply the following names in the Content
field:
Kristof Chris Tony Anton
-
Double-click tSynonymSearch to open its
Basic settings view.
-
Click Sync columns to add the schema
columns of its preceding component to the default schema columns of
tSynonymSearch.
When prompted, click Yes to propagate the changes to the next component.
-
Click the [...] button next to Edit schema to open the Schema dialog box, and add one column to the output
schema: matched_fname.
This column will hold the matched reference entries in the output flow. When done, click OK to validate the setting and accept propagating the changes when prompted.
- In the Limit of each group field, type in 5 to replace the default value.
-
Under the Columns to search table, click
the [+] button to add one row and define
the parameters as follows:
-
In the Input column column, select FIRSTNAME from the list of the input columns.
-
In the Reference output column column, select matched_fname from the list of the output columns.
-
In the Index path column, type in the path to the synonym index to be used, between double quotation marks.
When using Spark Local mode, use a path to a local folder:
- Apache Spark 3.1 and earlier: prefix://file path or file:///file path.
- Apache Spark 3.2 and later: file:///file path.
-
In the Search mode column, select Match all fuzzy. This mode matches each word of the input string against similar words of the index string.
-
In the Score threshold column, enter 0.9 to filter results and list only terms with higher similarity.
-
In the Max edits column, select 1 as the allowed edit distance.
With max edit distance 1, you can have only one insertion, deletion, or substitution. Any terms within that edit distance from the input data are matched.
-
Leave the Word distance column unchanged, as it applies only to the Match partial mode.
-
In the Limit column, leave the default value 5.
- In the Basic settings view of the tLogRow component, select the Table option to display the Job execution result in a more readable format.
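
To make the Max edits setting more concrete, the following minimal Python sketch computes a plain edit distance (insertions, deletions, and substitutions) and keeps only the candidates within one edit of each input name. It is an illustration only and is not how tSynonymSearch is implemented; the function names and the index terms are hypothetical, since the content of the synonym index is not given here, and the component's actual matching and scoring may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def within_max_edits(term, candidates, max_edits=1):
    """Keep the candidates whose case-insensitive edit distance
    to term is at most max_edits."""
    return [
        candidate for candidate in candidates
        if edit_distance(term.lower(), candidate.lower()) <= max_edits
    ]


# Hypothetical index terms, assumed for illustration only.
index_terms = ["Kristoff", "Christopher", "Toni", "Anton"]

# Input names taken from the Content field of tFixedFlowInput.
for name in ["Kristof", "Chris", "Tony", "Anton"]:
    print(name, within_max_edits(name, index_terms))
```

With these assumed index terms, Kristof and Tony each have a candidate one edit away, Anton matches exactly, and Chris has no candidate within one edit, because Christopher is six insertions away.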