
Configuring the components

Procedure

  1. Double-click tFixedFlowInput to open its Basic settings view.
  2. Next to the Schema field, click the Edit schema button to open the Schema dialog box, add one column and name it FIRSTNAME. When done, click OK to validate these changes and close the dialog box.
  3. In the Mode area, select the Use Inline Content (delimited file) option, and supply the following names in the Content field:
    Kristof
    Chris
    Tony
    Anton
  4. Double-click tSynonymSearch to open its Basic settings view.
  5. Click Sync columns to add the schema columns of its preceding component to the default schema columns of tSynonymSearch.
    When prompted, click Yes to propagate the changes to the next component.
  6. Click the [...] button next to Edit schema to open the Schema dialog box, and add one column to the output schema: matched_fname.
    This column will hold the matched reference entries in the output flow.
    When done, click OK to validate the setting and accept propagating the changes when prompted.
  7. In the Limit of each group field, type in 5 to replace the default value.
  8. Under the Columns to search table, click the [+] button to add one row and define the parameters as follows:
    • In the Input column column, select FIRSTNAME from the list of the input columns.

    • In the Reference output column column, select matched_fname from the list of the output columns.

    • In the Index path column, type in the path to the synonym index to be used, between double quotation marks.

      When using Spark Local mode, use a path to a local folder:
      • Apache Spark 3.1 and earlier: prefix://file path or file:///file path.
      • Apache Spark 3.2 and later: file:///file path.
    • In the Search mode column, select Match all fuzzy. This matches each word of the input string against similar words of the index string.

    • In the Score threshold column, enter 0.9 to filter the results and list only the terms with a similarity score above this threshold.

    • In the Max edits column, select 1 as the allowed edit distance.

      With a maximum edit distance of 1, a term can differ from the input by at most one insertion, deletion, or substitution. Any terms within that edit distance from the input data are matched.

    • Leave the Word distance column unchanged, as it applies only to the Match partial mode.

    • In the Limit column, leave the default value 5.

  9. In the Basic settings view of the tLogRow component, select the Table option for a more readable display of the Job execution result.
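
To illustrate what a Max edits value of 1 means in practice, the following is a minimal Java sketch of a plain Levenshtein distance check, applied to the input names used in this procedure. It is not the internal implementation of tSynonymSearch. The class name EditDistanceDemo, the helper withinMaxEdits, and the candidate index terms (Kristoff, Christof, Chrys, Toni, Antony) are assumptions made up for the example; the real matches depend on the content of your synonym index and on the score threshold.

```java
public class EditDistanceDemo {

    // Classic Levenshtein distance: the minimum number of single-character
    // insertions, deletions, or substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j characters of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i characters of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev; // reuse the two row buffers
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: true when term is within maxEdits of input.
    static boolean withinMaxEdits(String input, String term, int maxEdits) {
        return levenshtein(input, term) <= maxEdits;
    }

    public static void main(String[] args) {
        // Input names come from the procedure; index terms are invented for illustration.
        String[][] pairs = {
            {"Kristof", "Kristoff"},
            {"Kristof", "Christof"},
            {"Chris", "Chrys"},
            {"Tony", "Toni"},
            {"Anton", "Antony"}
        };
        for (String[] p : pairs) {
            System.out.println(p[0] + " / " + p[1]
                    + ": distance=" + levenshtein(p[0], p[1])
                    + ", within 1 edit=" + withinMaxEdits(p[0], p[1], 1));
        }
    }
}
```

In this sketch, Kristof and the invented term Christof are two edits apart (one substitution and one insertion), so that pair would fall outside a maximum edit distance of 1, while the other pairs differ by a single edit.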
