Setting up the Job
Procedure
- Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tMatchPredict and tFileOutputDelimited.
- Connect tFileInputDelimited to tMatchPredict using the Main link.
- Connect tMatchPredict to tFileOutputDelimited using the Suspect duplicates link.
- Check that you have defined the connection to the Spark cluster and activated checkpointing in the Run > Spark Configuration view, as described in Computing suspect pairs and suspect sample from source data. For more information about selecting the Spark mode, see the documentation on Talend Help Center (https://help.talend.com).
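Talend Studio builds the Job from these components, so you do not write any code yourself. The following is only a minimal, Spark-free Java sketch of the data flow that the steps above set up. Delimited rows are read, a classification step decides which rows are suspect duplicates, and only those rows reach the output through the Suspect duplicates link. The names `readDelimited` and `routeSuspects` are hypothetical. The predicate is an assumed stand-in for the model that tMatchPredict applies, and none of this is Talend's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of the Job's routing:
 * delimited input -> classification -> only suspect rows are written.
 * Not Talend code; the predicate stands in for the trained matching model.
 */
public class SuspectRouting {

    // Rough analog of tFileInputDelimited: split each line on a literal field separator.
    static List<String[]> readDelimited(List<String> lines, String separator) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            // Pattern.quote treats separators such as "|" literally; -1 keeps trailing empty fields.
            rows.add(line.split(Pattern.quote(separator), -1));
        }
        return rows;
    }

    // Rough analog of tMatchPredict feeding the Suspect duplicates link:
    // only rows the (hypothetical) classifier flags are passed on to the output.
    static List<String> routeSuspects(List<String[]> rows, Predicate<String[]> isSuspect, String separator) {
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            if (isSuspect.test(row)) {
                out.add(String.join(separator, row));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sep = ";";
        List<String[]> rows = readDelimited(List.of(
                "1;Jon Smith;jsmith@example.com",
                "2;John Smith;jsmith@example.com",
                "3;Ann Lee;alee@example.com"), sep);

        // Hypothetical stand-in for the model: flag rows whose e-mail (column 2) appears more than once.
        Map<String, Integer> emailCounts = new HashMap<>();
        for (String[] r : rows) {
            emailCounts.merge(r[2], 1, Integer::sum);
        }

        // Analog of tFileOutputDelimited: print the suspect rows in delimited form.
        List<String> suspects = routeSuspects(rows, r -> emailCounts.get(r[2]) > 1, sep);
        suspects.forEach(System.out::println);
    }
}
```

Running `main` prints rows 1 and 2, which share an e-mail address. In the real Job, the decision comes from the model configured in tMatchPredict, not from a hand-written rule like this one.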