Parallel Load

In Full Load replication mode, you can accelerate the replication of large tables by splitting the table into segments and loading the segments in parallel. Tables can be segmented by data ranges, by partitions, or by sub-partitions.

Supported endpoints

The task must be defined with a combination of the following source and target endpoints:

Supported source endpoints:

Amazon RDS for Microsoft SQL Server
IBM DB2 for LUW
IBM DB2 for z/OS

Information note
Table segmentation by partitions or sub-partitions is not supported with the IBM DB2 for z/OS source endpoint.
Microsoft SQL Server
MySQL
Oracle
PostgreSQL

Information note
Table segmentation by partitions or sub-partitions is not supported with the PostgreSQL source endpoint.
SAP Sybase ASE
SAP Application
SAP Application (DB)
Information note
Tables are by default client dependent with the SAP Application (DB) source endpoint. The MANDT column is automatically taken directly from the endpoint.
SAP HANA
Teradata

Supported target endpoints:

Amazon EMR
Amazon MSK
Amazon Redshift
Amazon S3
Cloudera Data Platform (CDP) Private Cloud
Confluent Cloud
Databricks (Cloud Storage)
File
Google Cloud BigQuery
Google Cloud SQL for MySQL
Google Cloud SQL for PostgreSQL
Google Cloud Storage
Google Dataproc
Hadoop (Hortonworks, Cloudera, and MapR)
Hortonworks Data Platform (HDP)
Kafka
Microsoft Azure ADLS
Microsoft Azure Database for MySQL
Microsoft Azure Database for PostgreSQL
Microsoft Azure Data Warehouse
Microsoft Azure HDInsight
Microsoft Azure SQL Database
Microsoft SQL Server
MySQL
Oracle
PostgreSQL
Snowflake on Google
Snowflake on AWS
Snowflake on Azure
Sybase ASE
Vertica

Setting up Parallel Load

To define segment boundaries by data range:

In the Parallel Load tab's Select Parallel Load Method section, select Use Data Ranges.
In the Select Details section, click Select Segment Columns.

The Columns window opens
For all endpoints, the Unique Index column is automatically selected. Select which additional columns whose data you wish to use to delineate the ranges and then click OK.
Information note
- Selecting indexed columns will significantly improve performance
- You can select up to ten columns (multi-selection is supported)
- Records with null values will not be replicated
- The following data types cannot be used to define segments by ranges: DOUBLE, FLOAT, and LOB (BLOB, CLOB, NCLOB)
In the Define Segment Boundaries section:
1. Click Add Segment to add a segment.
  The columns that you selected will appear as table headings.
2. Enter the upper data range for the segment in the selected columns.
  Information note
  Values in DATE columns must be entered in the format supported by the source. For example, for an Oracle source, the correct format would be:
  - ALTER SESSION SET NLS_DATE_FORMAT:
    'YYYY-MM-DD HH24:MI:SS' (specifying YYYY-MM-DD only is also valid)
  - ALTER SESSION SET NLS_TIMESTAMP_FORMAT:
    'YYYY-MM-DD HH24:MI:SS.FF9'
  - ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT:
    'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM'
3. Add additional segments as required.
4. Click Validate to validate that the specified data corresponds to the source column data type and that all of the defined segments contain values.
5. To delete a segment, select the desired segment and then click Delete.
Click OK to save your settings.

When Use Data Ranges is selected, all of the table data will be replicated, even if data ranges are not defined for all of the columns.

Usage example

Let's assume that the following segments are defined in the Define Segment Boundaries table:

Example table data
Column_1	Column_2	Column_3
10	30	105
20	20	120
100	12	99

In this case, the following "WHERE" clauses will be created for each load segment:

Segment 1: ((COL1 < 10) OR ((COL1 = 10) AND (COL2 < 30)) OR ((COL1 = 10) AND (COL2 = 30) AND (COL3 < 105)))
Segment 2: NOT ((COL1 < 10) OR ((COL1 = 10) AND (COL2 < 30)) OR ((COL1 = 10) AND (COL2 = 30) AND (COL3 < 105))) AND ((COL1 < 20) OR ((COL1 = 20) AND (COL2 < 20)) OR ((COL1 = 20) AND (COL2 = 20) AND (COL3 < 120)))
Segment 3: NOT ((COL1 < 20) OR ((COL1 = 20) AND (COL2 < 20)) OR ((COL1 = 30) AND (COL2 = 20) AND (COL3 < 120))) AND ((COL1 < 100) OR ((COL1 = 100) AND (COL2 < 12)) OR ((COL1 = 100) AND (COL2 = 12) AND (COL3 < 99)))
Segment 4: NOT ((COL1 < 100) OR ((COL1 = 100) AND (COL2 < 12)) OR ((COL1 = 100) AND (COL2 = 12) AND (COL3 < 99)))

To define segment boundaries by all of the table partitions:

Only select this method if you are sure that the table is already partitioned.

In the Parallel Load tab's Select Parallel Load Method section, select Use Partitions.
In the Select Partitions section, select Use all table partitions. This will segment the table according to partitions that already exist in the source database.
Select one the following:
- Use main partitions
- Use sub partitions
  
  Information note
  This option will be disabled if the source database does not support sub-partitions.
Click OK.

To define segment boundaries by specific partitions:

Only select this method if you are sure that the table is already partitioned.

In the Parallel Load tab's Select Parallel Load Method section, select Use Partitions.
In the Select Partitions section, select Specify partitions. This will split the data according to the specified source partitions.

Information note
When Specify partitions is selected, only the specified partitions will be replicated.
Click Add Partition.
Specify the name of an existing partition or sub-partition.
If you specified the name of a sub-partition, select the check box in the Sub-Partition column.

Information note
The check box will be disabled if the source database does not support sub-partitions.
Add additional partitions/sub-partitions as required.
To delete a partition/sub-partition, select the partition/sub-partition and then click Delete.
Click OK to save your settings.

Adjusting the number of segments that can be loaded in parallel

You can increase or decrease the number of segments that will be loaded in parallel. For example, if you selected the Use all table partitions option and the source table has 20 partitions, increasing the default number of concurrent tasks (5) may improve performance.

Sub-tasks are allocated for each segment \ partition \ sub partition.

For example: If you select a table with 6 partitions and load the table using the Use Partitions method, 5 partitions will be loaded in parallel, corresponding with the default number of concurrent tasks (5). When one of the sub-tasks completes its run, it will be assigned to loading the sixth partition.

The currently set value is displayed at the bottom of the Parallel Load tab. You can modify this value in the Maximum number of tables to load in parallel field in the Full Load Tuning tab.

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – let us know how we can improve!

Leave your feedback here