
tManagePartitions properties for Apache Spark Batch

These properties are used to configure tManagePartitions running in the Spark Batch Job framework.

The Spark Batch tManagePartitions component belongs to the Processing family.

The component in this framework is available in all Talend products with Big Data and in Talend Data Fabric.

Basic settings

Number of partitions Enter the number of partitions you want to split the input dataset into.
Partitioning strategy Select the partitioning strategy you want to apply to the dataset from the drop-down list:
  • Coalesce: reduces the number of partitions.
  • Repartition: increases or decreases the number of partitions.
  • Auto: automatically determines the most suitable strategy to apply.
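These strategies correspond to standard Spark operations. The following is a minimal sketch in plain Spark (Scala), not Talend-generated code, of what Coalesce and Repartition do; the dataset and partition counts are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

object PartitionStrategySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("partition-strategy").master("local[*]").getOrCreate()
    val df = spark.range(0, 1000000).toDF("id")

    // Coalesce: only reduces the number of partitions and avoids a full shuffle.
    val coalesced = df.coalesce(4)

    // Repartition: can increase or decrease the number of partitions; triggers a shuffle.
    val repartitioned = df.repartition(16)

    println(s"coalesce:    ${coalesced.rdd.getNumPartitions} partitions")
    println(s"repartition: ${repartitioned.rdd.getNumPartitions} partitions")
    spark.stop()
  }
}
```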
Partitioning with range Select this check box to partition the dataset based on ranges of column values. You need to specify at least one column to use this parameter.
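In plain Spark, range-based partitioning corresponds to repartitionByRange. The sketch below assumes a hypothetical amount column and is for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RangePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("range-partition").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, 10.0), (2, 250.0), (3, 75.0), (4, 900.0)).toDF("id", "amount")

    // Rows are assigned to partitions so that each partition covers a
    // contiguous range of the "amount" values (sampling determines the bounds).
    val ranged = df.repartitionByRange(2, col("amount"))
    println(s"${ranged.rdd.getNumPartitions} partitions")
    spark.stop()
  }
}
```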
Use custom partitioner Select this check box to use a Spark partitioner imported from outside Talend Studio, for example, a partitioner you have developed yourself. In this case, provide the following information:
  • Fully qualified class name: enter the fully qualified class name of the partitioner to be imported.

  • JAR name: click the [+] button as many times as needed to add the same number of rows. In each row, click the [...] button to import the JAR file containing this partitioner class and its dependent JAR files.

This parameter is only available when you select Repartition from the Partitioning strategy drop-down list.
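As an illustration, a custom partitioner is typically a class extending org.apache.spark.Partitioner, compiled and packaged into a JAR. The class name used below (ModuloPartitioner) is hypothetical; the fully qualified name of your own class is what you would enter in the Fully qualified class name field.

```scala
import org.apache.spark.Partitioner

// Hypothetical custom partitioner: routes each key to a partition based on
// its hash code modulo the partition count.
class ModuloPartitioner(parts: Int) extends Partitioner {
  override def numPartitions: Int = parts

  override def getPartition(key: Any): Int = {
    val h = if (key == null) 0 else key.hashCode()
    // Keep the result non-negative so it is a valid partition index.
    ((h % parts) + parts) % parts
  }
}
```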

Use column(s) as key(s) for partitioning Select the column(s) you want to use as key(s) for partitioning.

This parameter is only available when you select Repartition from the Partitioning strategy drop-down list. This parameter is not available when you select the Use custom partitioner check box.
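In plain Spark, partitioning keyed on one or more columns corresponds to repartition with column expressions, so that rows sharing the same key values land in the same partition. The column name below (customer_id) is an assumption for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object KeyedRepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("keyed-repartition").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b"), (1, "c"), (2, "d")).toDF("customer_id", "item")

    // Hash-partition on the key column: all rows with the same customer_id
    // end up in the same partition.
    val keyed = df.repartition(4, col("customer_id"))
    println(s"${keyed.rdd.getNumPartitions} partitions")
    spark.stop()
  }
}
```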

Sort within partitions Select this check box to sort the records in each partition:
  • Natural order: keys are sorted in their natural order, for example, in alphabetical order.

  • Custom comparator: this allows you to use a custom program to sort the keys.

    You need to enter the fully qualified class name of the comparator to be imported in the Fully qualified class name field and add the JAR files to be loaded in the JAR name table.

This feature is useful when a partition contains several distinct key values.
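In plain Spark terms, natural-order sorting within partitions corresponds to sortWithinPartitions on the key column, while a custom comparator corresponds to supplying an Ordering when sorting a keyed RDD. The sketch below is illustrative only; the column names and the reverse ordering are assumptions.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object SortWithinPartitionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sort-within-partitions").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("b", 2), ("a", 1), ("c", 3)).toDF("key", "value")

    // Natural order: each partition is sorted on the key column independently.
    df.repartition(2, $"key").sortWithinPartitions("key").show()

    // Custom comparator (RDD API): a local implicit Ordering drives the sort
    // used by repartitionAndSortWithinPartitions.
    implicit val reverseKeys: Ordering[String] = Ordering[String].reverse
    val pairs = spark.sparkContext.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
    pairs
      .repartitionAndSortWithinPartitions(new HashPartitioner(2))
      .glom()
      .collect()
      .foreach(partition => println(partition.mkString(", ")))

    spark.stop()
  }
}
```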

Global Variables

ERROR_MESSAGE

The error message generated by the component when an error occurs. This is an After variable and it returns a string.

Usage

Usage rule

This component is used as an intermediate step.

This component, along with the Spark Batch component Palette it belongs to, appears only when you are creating a Spark Batch Job.

Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs.
