Finding duplicate files between two folders
This scenario describes a Job that iterates over the files in two folders, converts the iteration results into data flows to build a list of filenames, and then picks out the duplicates from that list and displays them on the Run console. This can serve, for example, as a preparation step before merging the two folders.
Dropping and linking the components
Procedure
- From the Palette, drop two tFileList components, two tIterateToFlow components, two tFileOutputDelimited components, a tFileInputDelimited component, a tUniqRow component, and a tLogRow component onto the design workspace.
- Link the first tFileList component to the first tIterateToFlow component using a Row > Iterate connection, and then connect the first tIterateToFlow component to the first tFileOutputDelimited component using a Row > Main connection to form the first subJob.
- Link the second tFileList component to the second tIterateToFlow component using a Row > Iterate connection, and then connect the second tIterateToFlow component to the second tFileOutputDelimited component using a Row > Main connection to form the second subJob.
- Link the tFileInputDelimited component to the tUniqRow component using a Row > Main connection, and then connect the tUniqRow component to the tLogRow component using a Row > Duplicates connection to form the third subJob.
- Link the three subJobs using Trigger > On Subjob Ok connections so that they will be triggered one after another, and label the components to better identify their roles in the Job.
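For readers who want to see the logic of the three subJobs outside the Studio, the following is a minimal Python sketch, not code generated by the Job. It assumes that both tFileOutputDelimited components write to the same delimited file (the second one in append mode), that the file uses a semicolon separator, and that the uniqueness key is the filename alone. All function names and the `filenames.csv` file name are hypothetical.

```python
import csv
import os
import tempfile


def list_filenames(folder):
    """Rough equivalent of tFileList + tIterateToFlow: names of the files in a folder."""
    return sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )


def write_filenames(names, list_path, append=False):
    """Rough equivalent of tFileOutputDelimited: one filename per row."""
    mode = "a" if append else "w"
    with open(list_path, mode, newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter=";")
        for name in names:
            writer.writerow([name])


def find_duplicates(list_path):
    """Rough equivalent of tFileInputDelimited + tUniqRow (Duplicates output).

    The first occurrence of a name counts as unique; every later
    occurrence is reported as a duplicate.
    """
    seen = set()
    duplicates = []
    with open(list_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter=";"):
            if not row:
                continue  # skip blank lines
            name = row[0]
            if name in seen:
                duplicates.append(name)
            else:
                seen.add(name)
    return duplicates


def duplicate_filenames(folder_a, folder_b):
    """Run the three steps one after another, as the On Subjob Ok triggers do."""
    with tempfile.TemporaryDirectory() as work_dir:
        # Assumption: both folders are listed into one shared file.
        list_path = os.path.join(work_dir, "filenames.csv")
        write_filenames(list_filenames(folder_a), list_path)               # first subJob
        write_filenames(list_filenames(folder_b), list_path, append=True)  # second subJob
        return find_duplicates(list_path)                                  # third subJob
```

Calling `duplicate_filenames` with the paths of the two folders returns the names that appear in both, which is the list the tLogRow component would display on the Run console under the assumptions above. The sketch only looks at files directly inside each folder and does not descend into subfolders.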