Designing the flow of the data to write and encrypt onto EMR
Link the components to construct the data flow.
Procedure
In the Integration perspective of Talend Studio, create an empty Spark Batch Job from the Job Designs node in the Repository tree view.
In the workspace, enter the name of the component to be used and select this
component from the list that appears. In this scenario, the components are
tHDFSConfiguration (labeled emr_hdfs), tS3Configuration, tFixedFlowInput, tAggregateRow and tFileOutputParquet.
The tFixedFlowInput component is used to load the sample data into the data flow. In real-world practice, replace tFixedFlowInput with the input component specific to the data format or source system you are using.
Connect tFixedFlowInput, tAggregateRow and
tFileOutputParquet using the Row > Main link.
Leave the tHDFSConfiguration and tS3Configuration components unconnected; do not link them to any other component.
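The layout described in the steps above can be summarized in a short Python sketch. This is not Talend code and does not call any Talend API: the names COMPONENTS, MAIN_LINKS, standalone_components and flow_order are hypothetical and exist only to illustrate which components are chained by Row > Main links and which are left unconnected.

```python
# Illustrative model of the Job layout; hypothetical names, not a Talend API.
COMPONENTS = [
    "tHDFSConfiguration",  # labeled emr_hdfs in this scenario
    "tS3Configuration",
    "tFixedFlowInput",
    "tAggregateRow",
    "tFileOutputParquet",
]

# Row > Main links, written as (source, target) pairs.
MAIN_LINKS = [
    ("tFixedFlowInput", "tAggregateRow"),
    ("tAggregateRow", "tFileOutputParquet"),
]


def standalone_components(components, links):
    """Return the components that take part in no link, in their listed order."""
    linked = {name for link in links for name in link}
    return [name for name in components if name not in linked]


def flow_order(links):
    """Follow the links from the single starting component to the last one."""
    targets = {target for _, target in links}
    next_of = dict(links)
    # The flow starts at the one source that is never the target of a link.
    (current,) = [source for source, _ in links if source not in targets]
    order = [current]
    while current in next_of:
        current = next_of[current]
        order.append(current)
    return order


if __name__ == "__main__":
    print("Data flow:", " -> ".join(flow_order(MAIN_LINKS)))
    print("Unconnected:", ", ".join(standalone_components(COMPONENTS, MAIN_LINKS)))
```

In this model, the data flow is a single chain of three components, while the two configuration components appear in no link at all, which mirrors how they sit in the workspace.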