tHMapFile properties for Apache Spark Batch
These properties are used to configure tHMapFile running in the Spark Batch Job framework.
The Spark Batch tHMapFile component belongs to the Processing family.
This component is available in Talend Platform products with Big Data and in Talend Data Fabric.
Basic settings
Storage |
To connect to an HDFS installation, select the Define a storage configuration component check box and then select the name of the component to use from those available in the drop-down list. This option requires you to have previously configured the connection to the HDFS installation to be used, as described in the documentation for the tHDFSConfiguration component. If you leave the Define a storage configuration component check box unselected, you can only convert files locally. |
Configure Component |
To configure the component, click the [...] button and, in the Component Configuration window, perform the following actions.
|
Input |
Click the [...] button to define the path where the input file is stored. |
Output |
Click the [...] button to define the path where the output files will be stored. |
Action |
From the drop-down list, select:
|
Open Map Editor |
Click the [...] button to open the Structure Generate/Select wizard. You can first select the type of map to create:
Note: This option is available only if you have installed the R2023-10 Studio monthly update or a later one delivered by Talend. For more information, check with your administrator.
Then you can either have the hierarchical mapper structure generated automatically based on the schema, or select an existing hierarchical mapper structure. You must do this for both the input and output sides of your Map. The following lists the options for the output structure:
If Talend Studio detects multiple output connections available, the window displays both output structure options without the support for multiple output connections check boxes. If neither input nor output connection exists, the Structure Selection page is displayed. |
Die on error |
This check box is selected by default. Clear the check box to skip any rows on error and complete the process for error-free rows. If you clear the check box, you can select any of these options:
Note: Any errors that occur while trying to store the reject are logged, and processing continues.
|
Merge result to single file |
By default, tHMapFile creates several part files. Select this check box to merge these files into a single file. The following options are used to manage the source and the target files:
Warning: Using this option with an Avro output creates an invalid Avro file. Since each part starts with an Avro Schema header, the merged file would have more than one Avro Schema, which is invalid.
|
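The Avro warning for Merge result to single file can be illustrated with a small Python sketch. It is a toy model only: the byte strings below imitate the layout of part files (a header followed by records) and are not real Avro container files. The helper names make_toy_part, naive_merge, and count_headers are hypothetical, and the assumption that a merge amounts to byte-for-byte concatenation is made for illustration; it is not a description of how tHMapFile merges files internally.

```python
# Toy illustration only: these byte strings mimic the layout of part files
# (a header followed by records). They are NOT valid Avro container files.

# The four magic bytes that open an Avro object container file.
AVRO_MAGIC = b"Obj\x01"


def make_toy_part(records):
    """Build a hypothetical part file: one header, then the records."""
    header = AVRO_MAGIC + b'{"type":"record","name":"Row"}'
    return header + b"".join(records)


def naive_merge(parts):
    """Concatenate the parts byte for byte (assumed merge behavior)."""
    return b"".join(parts)


def count_headers(data):
    """Count how many times the magic bytes occur in the data."""
    return data.count(AVRO_MAGIC)


parts = [make_toy_part([b"row1", b"row2"]), make_toy_part([b"row3"])]
merged = naive_merge(parts)

print(count_headers(parts[0]))  # each part carries exactly one header
print(count_headers(merged))    # the merged bytes carry one header per part
```

In this model, every part is well formed on its own, but the merged result contains a second header in the middle of the data, which is the situation the warning describes. For text-based outputs that have no per-file header, the same concatenation does not raise this problem.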
Advanced settings
Use old Eclipse runtime |
Select this check box to include the old Eclipse runtime in your Job.
Note: This option is available only if you have installed the R2024-03 Talend Studio monthly update or a later one delivered by Talend. For more information, check with your administrator.
|
Usage
Usage rule | This component is used either with a tHDFSConfiguration component, which defines the connection to the HDFS storage, or as a standalone component for mapping local files only. |
Usage with Talend Runtime | If you want to deploy a Job or Route containing a data mapping component with Talend Runtime, you first need to install the Talend Data Mapper feature. For more information, see Using Talend Data Mapper with Talend Runtime. |