Skip to main content Skip to complementary content

Parallel Load

In Full Load replication mode, you can accelerate the replication of large tables by splitting the table into segments and loading the segments in parallel. Tables can be segmented by data ranges, by partitions, or by sub-partitions.

Supported endpoints

The task must be defined with a combination of the following source and target endpoints:

  • Amazon RDS for Microsoft SQL Server
  • IBM DB2 for LUW
  • IBM DB2 for z/OS

    Information note

    Table segmentation by partitions or sub-partitions is not supported with the IBM DB2 for z/OS source endpoint.

  • Microsoft SQL Server
  • MySQL
  • Oracle
  • PostgreSQL

    Information note

    Table segmentation by partitions or sub-partitions is not supported with the PostgreSQL source endpoint.

  • SAP Sybase ASE
  • SAP Application
  • SAP Application (DB)
    Information note

    Tables are by default client dependent with the SAP Application (DB) source endpoint. The MANDT column is automatically taken directly from the endpoint.

  • SAP HANA
  • Teradata
  • Amazon EMR
  • Amazon MSK

  • Amazon Redshift
  • Amazon S3
  • Cloudera Data Platform (CDP) Private Cloud

  • Confluent Cloud

  • Databricks (Cloud Storage)
  • File
  • Google Cloud BigQuery
  • Google Cloud SQL for MySQL
  • Google Cloud SQL for PostgreSQL
  • Google Cloud Storage
  • Google Dataproc
  • Hadoop (Hortonworks, Cloudera, and MapR)
  • Hortonworks Data Platform (HDP)
  • Kafka
  • Microsoft Azure ADLS
  • Microsoft Azure Database for MySQL
  • Microsoft Azure Database for PostgreSQL
  • Microsoft Azure Data Warehouse
  • Microsoft Azure HDInsight
  • Microsoft Azure SQL Database
  • Microsoft SQL Server
  • MySQL
  • Oracle
  • PostgreSQL
  • Snowflake on Google
  • Snowflake on AWS
  • Snowflake on Azure
  • Sybase ASE
  • Vertica

Setting up Parallel Load

  1. In the Parallel Load tab's Select Parallel Load Method section, select Use Data Ranges.
  2. In the Select Details section, click Select Segment Columns.

    The Columns window opens

  3. For all endpoints, the Unique Index column is automatically selected. Select which additional columns whose data you wish to use to delineate the ranges and then click OK.

    Information note
    • Selecting indexed columns will significantly improve performance

    • You can select up to ten columns (multi-selection is supported)

    • Records with null values will not be replicated

    • The following data types cannot be used to define segments by ranges: DOUBLE, FLOAT, and LOB (BLOB, CLOB, NCLOB)

  4. In the Define Segment Boundaries section:
    1. Click Add Segment to add a segment.

      The columns that you selected will appear as table headings.

    2. Enter the upper data range for the segment in the selected columns.

      Information note

      Values in DATE columns must be entered in the format supported by the source. For example, for an Oracle source, the correct format would be:

      • ALTER SESSION SET NLS_DATE_FORMAT:

        'YYYY-MM-DD HH24:MI:SS' (specifying YYYY-MM-DD only is also valid)

      • ALTER SESSION SET NLS_TIMESTAMP_FORMAT:

        'YYYY-MM-DD HH24:MI:SS.FF9'

      • ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT:

        'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM'

    3. Add additional segments as required.
    4. Click Validate to validate that the specified data corresponds to the source column data type and that all of the defined segments contain values.
    5. To delete a segment, select the desired segment and then click Delete.
  5. Click OK to save your settings.
Information note

When Use Data Ranges is selected, all of the table data will be replicated, even if data ranges are not defined for all of the columns.

Usage example

Let's assume that the following segments are defined in the Define Segment Boundaries table:

Example table data
Column_1 Column_2 Column_3

10

30

105

20

20

120

100

12

99

In this case, the following "WHERE" clauses will be created for each load segment:

  • Segment 1: ((COL1 < 10) OR ((COL1 = 10) AND (COL2 < 30)) OR ((COL1 = 10) AND (COL2 = 30) AND (COL3 < 105)))
  • Segment 2: NOT ((COL1 < 10) OR ((COL1 = 10) AND (COL2 < 30)) OR ((COL1 = 10) AND (COL2 = 30) AND (COL3 < 105))) AND ((COL1 < 20) OR ((COL1 = 20) AND (COL2 < 20)) OR ((COL1 = 20) AND (COL2 = 20) AND (COL3 < 120)))
  • Segment 3: NOT ((COL1 < 20) OR ((COL1 = 20) AND (COL2 < 20)) OR ((COL1 = 30) AND (COL2 = 20) AND (COL3 < 120))) AND ((COL1 < 100) OR ((COL1 = 100) AND (COL2 < 12)) OR ((COL1 = 100) AND (COL2 = 12) AND (COL3 < 99)))
  • Segment 4: NOT ((COL1 < 100) OR ((COL1 = 100) AND (COL2 < 12)) OR ((COL1 = 100) AND (COL2 = 12) AND (COL3 < 99)))
Information note

Only select this method if you are sure that the table is already partitioned.

  1. In the Parallel Load tab's Select Parallel Load Method section, select Use Partitions.
  2. In the Select Partitions section, select Use all table partitions. This will segment the table according to partitions that already exist in the source database.

  3. Select one the following:

    • Use main partitions

    • Use sub partitions

      Information note

      This option will be disabled if the source database does not support sub-partitions.

  4. Click OK.
Information note

Only select this method if you are sure that the table is already partitioned.

  1. In the Parallel Load tab's Select Parallel Load Method section, select Use Partitions.

  2. In the Select Partitions section, select Specify partitions. This will split the data according to the specified source partitions.

    Information note

    When Specify partitions is selected, only the specified partitions will be replicated.

  3. Click Add Partition.
  4. Specify the name of an existing partition or sub-partition.
  5. If you specified the name of a sub-partition, select the check box in the Sub-Partition column.

    Information note

    The check box will be disabled if the source database does not support sub-partitions.

  6. Add additional partitions/sub-partitions as required.
  7. To delete a partition/sub-partition, select the partition/sub-partition and then click Delete.
  8. Click OK to save your settings.

Adjusting the number of segments that can be loaded in parallel

You can increase or decrease the number of segments that will be loaded in parallel. For example, if you selected the Use all table partitions option and the source table has 20 partitions, increasing the default number of concurrent tasks (5) may improve performance.

Information note

Sub-tasks are allocated for each segment \ partition \ sub partition.

For example: If you select a table with 6 partitions and load the table using the Use Partitions method, 5 partitions will be loaded in parallel, corresponding with the default number of concurrent tasks (5). When one of the sub-tasks completes its run, it will be assigned to loading the sixth partition.

The currently set value is displayed at the bottom of the Parallel Load tab. You can modify this value in the Maximum number of tables to load in parallel field in the Full Load Tuning tab.

 

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – let us know how we can improve!