tHConvertFile properties for Apache Spark Batch
These properties are used to configure tHConvertFile running in the Spark Batch Job framework.
The Spark Batch tHConvertFile component belongs to the Processing family.
This component is available in Talend Platform products with Big Data and in Talend Data Fabric.
Basic settings
Storage |
To connect to an HDFS installation, select the Define a storage configuration component check box and then select the name of the component to use from those available in the drop-down list. This option requires you to have previously configured the connection to the HDFS installation to be used, as described in the documentation for the tHDFSConfiguration component. If you leave the Define a storage configuration component check box unselected, you can only convert files locally (see the first sketch after this table). |
Configure Component |
To configure the component, click the [...] button and complete the configuration in the Component Configuration window.
|
Input |
Click the [...] button to define the path to where the input file is stored. You can also enter the path manually, between quotes. |
Output |
Click the [...] button to define the path to where the output file will be stored. You can also enter the path manually, between quotes. |
Action |
From the drop-down list, select the action to perform.
|
Open Structure Editor |
Click the [...] button to open the structure for editing in the Structure Editor of Talend Data Mapper. For more information, see the Talend Data Mapper User Guide. |
Merge result to single file |
By default, tHConvertFile creates several part files. Select this check box to merge these files into a single file.
Note: This option is available only if you have installed the R2020-07 Studio Monthly update or a later one delivered by Talend. For more information, check with your administrator.
Warning: Using this option with an Avro output creates an invalid Avro file. Since each part starts with an Avro schema header, the merged file would contain more than one Avro schema, which is invalid (see the second sketch after this table).
|
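The following sketch is not part of the component configuration; it is only a rough illustration, using the Hadoop FileSystem API, of the difference between resolving paths on HDFS (when a storage configuration component is defined) and on the local file system (when it is not). Host names and paths are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StorageResolutionSketch {
    public static void main(String[] args) throws Exception {
        // With a storage configuration component, paths resolve against HDFS.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder host and port
        FileSystem hdfs = FileSystem.get(conf);
        System.out.println(hdfs.exists(new Path("/user/talend/hconvert/in/orders.xml")));

        // Without one, only the local file system is available.
        FileSystem local = FileSystem.getLocal(new Configuration());
        System.out.println(local.exists(new Path("/tmp/hconvert/in/orders.xml")));
    }
}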
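The following sketch, which is not the component's implementation, illustrates why concatenating Avro part files is invalid: each part carries its own schema header, so a correct merge has to rewrite all records under a single header, as the Avro Java API does below. File names are placeholders.

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class AvroMergeSketch {
    public static void main(String[] args) throws Exception {
        File[] parts = { new File("part-00000.avro"), new File("part-00001.avro") }; // placeholders

        Schema schema;
        try (DataFileReader<GenericRecord> first =
                new DataFileReader<>(parts[0], new GenericDatumReader<>())) {
            schema = first.getSchema(); // all parts share the same schema
        }

        try (DataFileWriter<GenericRecord> writer =
                new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, new File("merged.avro")); // one schema header only
            for (File part : parts) {
                try (DataFileReader<GenericRecord> reader =
                        new DataFileReader<>(part, new GenericDatumReader<>())) {
                    for (GenericRecord record : reader) {
                        writer.append(record); // records only, no extra headers
                    }
                }
            }
        }
    }
}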
Advanced settings
Die on error |
Select this check box to stop the execution of the Job when an error occurs. Clear it to skip any error and continue the Job execution process (see the sketch after this table). |
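The following sketch is only a rough illustration of the two behaviors this check box toggles; the convert method and the sample records are hypothetical and do not reflect the component's internals.

import java.util.List;

public class DieOnErrorSketch {
    // Hypothetical conversion step; an empty record stands in for a faulty one.
    static String convert(String record) {
        if (record.isEmpty()) throw new IllegalArgumentException("empty record");
        return record.toUpperCase();
    }

    static void run(List<String> records, boolean dieOnError) {
        for (String record : records) {
            try {
                System.out.println(convert(record));
            } catch (RuntimeException e) {
                if (dieOnError) throw e;                              // stop at the first error
                System.err.println("skipped: " + e.getMessage());     // skip and keep processing
            }
        }
    }

    public static void main(String[] args) {
        run(List.of("a", "", "b"), false); // with the check box cleared, the faulty record is skipped
    }
}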
Usage
Usage rule | This component is used with a tHDFSConfiguration component which defines the connection to the HDFS storage, or as a standalone component for converting local files only. |
Usage with Talend Runtime | If you want to deploy a Job or Route containing a data mapping component with Talend Runtime, you first need to install the Talend Data Mapper feature. For more information, see Using Talend Data Mapper with Talend Runtime. |