tManagePartitions properties for Apache Spark Batch
These properties are used to configure tManagePartitions running in the Spark Batch Job framework.
The Spark Batch tManagePartitions component belongs to the Processing family.
The component in this framework is available in all Talend products with Big Data and in Talend Data Fabric.
Basic settings
Number of partitions | Enter the number of partitions into which you want to split the input dataset. |
Partitioning strategy | Select the partitioning strategy you want to apply to the dataset from the drop-down list: |
Partitioning with range | Select this check box to partition the dataset based on ranges of the column values. You need to specify at least one column to use this parameter. |
Use custom partitioner | Select this check box to use a Spark partitioner that you need to import from outside Talend Studio, for example, a partitioner you have developed yourself. In this situation, you need to provide the following information: This parameter is only available when you select Repartition from the Partitioning strategy drop-down list. |
Use column(s) as key(s) for partitioning | Select the column or columns you want to use as keys for partitioning. This parameter is only available when you select Repartition from the Partitioning strategy drop-down list. It is not available when you select the Use custom partitioner check box. |
Sort within partitions | Select this check box to sort the records in each partition: This feature is useful when a partition contains several distinct key values. |
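The settings above can be illustrated with a minimal plain Python sketch. It is not the code that Talend Studio generates and it does not reproduce Spark's own partitioners; it only shows the general idea behind partitioning by key, partitioning by range, plugging in a custom partitioner, and sorting within partitions. All function names, the sample records, and the choice of crc32 as the hash are assumptions made for illustration.

```python
from bisect import bisect_left
from zlib import crc32


def default_partitioner(key_value, num_partitions):
    """Hypothetical default: map a key to a partition index with a stable hash.

    crc32 is used only because it is deterministic across runs;
    it is not the hash function Spark uses.
    """
    return crc32(str(key_value).encode("utf-8")) % num_partitions


def partition_by_key(records, key, num_partitions, partitioner=default_partitioner):
    """Distribute records so that equal key values share a partition.

    Passing another callable as `partitioner` plays the role of a
    custom partitioner in this sketch.
    """
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[partitioner(record[key], num_partitions)].append(record)
    return partitions


def partition_by_range(records, key, boundaries):
    """Distribute records by comparing the key to sorted boundary values.

    A key equal to a boundary goes to the lower partition (an assumption
    of this sketch), so len(boundaries) + 1 partitions are produced.
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for record in records:
        partitions[bisect_left(boundaries, record[key])].append(record)
    return partitions


def sort_within_partitions(partitions, key):
    """Sort each partition independently; records never change partition."""
    return [sorted(part, key=lambda r: r[key]) for part in partitions]


# Sample input, invented for this example.
records = [
    {"id": 7, "city": "Paris"},
    {"id": 2, "city": "Lyon"},
    {"id": 9, "city": "Paris"},
    {"id": 4, "city": "Nice"},
]

by_city = partition_by_key(records, "city", 3)
by_id_range = sort_within_partitions(partition_by_range(records, "id", [3, 8]), "id")
print([[r["id"] for r in part] for part in by_id_range])  # [[2], [4, 7], [9]]
```

In this sketch, sorting within partitions matters for the middle range partition, which holds two distinct key values (4 and 7); a partition with a single key value is unaffected by the sort.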
Global Variables
ERROR_MESSAGE | The error message generated by the component when an error occurs. This is an After variable and it returns a string. |
Usage
Usage rule | This component is used as an intermediate step. This component, along with the Spark Batch component Palette it belongs to, appears only when you are creating a Spark Batch Job. Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say, traditional Talend data integration Jobs. |