Linking the components to design the flow of Delta Lake data
Drop and link the components to be used to read and process your Delta Lake data.
Procedure
- In the Integration perspective of Talend Studio, create an empty Spark Batch Job from the Job Designs node in the Repository tree view.
- In the workspace, type the name of each component to be used and select it from the list that appears. In this scenario, the components are one tS3Configuration (labeled s3_flights), two tDeltaLakeInput components (labeled flights_latest_version and flights_first_version, respectively), two tAggregateRow components (both labeled count_per_flights), two tPartition components (both labeled repart), one tMap, and one tFileOutputDelimited.
- Connect these components with links as shown in the image above.
- Leave the tS3Configuration component unconnected, without a link to any other component.
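
To make the shape of the linked flow easier to picture, the following is a minimal plain-Python sketch of what the connected components might compute. It is not code generated by Talend Studio. The sample rows, the flight column name, the join key, and the output column names are all hypothetical, and the roles assigned to each component are assumptions inferred from the labels used in this scenario. The tPartition components are omitted because repartitioning changes only how data is distributed across Spark, not the result. The tS3Configuration component is also absent because it carries no data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for two versions of the same Delta table
# (the two tDeltaLakeInput components in this scenario).
flights_first_version = [
    {"flight": "AA10"},
    {"flight": "AA10"},
    {"flight": "BA20"},
]
flights_latest_version = flights_first_version + [
    {"flight": "BA20"},
    {"flight": "CX30"},
]


def count_per_flights(rows):
    """Assumed role of tAggregateRow: count the rows for each flight."""
    return Counter(row["flight"] for row in rows)


def join_counts(latest, first):
    """Assumed role of tMap: join the two aggregates on the flight key."""
    return [
        {"flight": f, "latest_count": latest[f], "first_count": first.get(f, 0)}
        for f in sorted(latest)
    ]


def write_delimited(rows, delimiter=";"):
    """Assumed role of tFileOutputDelimited: serialize rows as delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["flight", "latest_count", "first_count"],
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


joined = join_counts(
    count_per_flights(flights_latest_version),
    count_per_flights(flights_first_version),
)
output = write_delimited(joined)
print(output)
```

Each function corresponds to one box in the Job, and each function call corresponds to one link between boxes, which is why the two input branches stay separate until the join step.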