Parallel Load
In Full Load replication mode, you can accelerate the replication of large tables by splitting the table into segments and loading the segments in parallel. Tables can be segmented by data ranges, by partitions, or by sub-partitions.
Supported endpoints
The task must be defined with a combination of the following source and target endpoints:
Supported source endpoints:
- Amazon RDS for Microsoft SQL Server
- IBM DB2 for LUW
-
IBM DB2 for z/OS
Information noteTable segmentation by partitions or sub-partitions is not supported with the IBM DB2 for z/OS source endpoint.
- Microsoft SQL Server
- MySQL
- Oracle
-
PostgreSQL
Information noteTable segmentation by partitions or sub-partitions is not supported with the PostgreSQL source endpoint.
- SAP Sybase ASE
- SAP Application
- SAP Application (DB)Information note
Tables are by default client dependent with the SAP Application (DB) source endpoint. The MANDT column is automatically taken directly from the endpoint.
- SAP HANA
- Teradata
Supported target endpoints:
- Amazon EMR
-
Amazon MSK
- Amazon Redshift
- Amazon S3
-
Cloudera Data Platform (CDP) Private Cloud
-
Confluent Cloud
- Databricks (Cloud Storage)
- File
- Google Cloud BigQuery
- Google Cloud SQL for MySQL
- Google Cloud SQL for PostgreSQL
- Google Cloud Storage
- Google Dataproc
- Hadoop (Hortonworks and Cloudera)
- Hortonworks Data Platform (HDP)
- Kafka
- Microsoft Azure ADLS
- Microsoft Azure Database for MySQL
- Microsoft Azure Database for PostgreSQL
- Microsoft Azure Data Warehouse
- Microsoft Azure HDInsight
- Microsoft Azure SQL Database
- Microsoft Fabric Data Warehouse
- Microsoft SQL Server
- MySQL
- Oracle
- PostgreSQL
- Snowflake on Google
- Snowflake on AWS
- Snowflake on Azure
-
Sybase ASE
Warning noteTo prevent deadlocks when performing a full parallel load of partitioned tables into Sybase ASE, it is strongly recommended to enable the Create primary key or unique index after full load completes option in the Full Load settings.
Setting up Parallel Load
To define segment boundaries by data range:
- In the Parallel Load tab's Select Parallel Load Method section, select Use Data Ranges.
-
In the Select Details section, click Select Segment Columns.
The Columns window opens
-
For all endpoints, the Unique Index column is automatically selected. Select which additional columns whose data you wish to use to delineate the ranges and then click OK.
Information note-
Selecting indexed columns will significantly improve performance
-
You can select up to ten columns (multi-selection is supported)
-
Records with null values will not be replicated
-
The following data types cannot be used to define segments by ranges: DOUBLE, FLOAT, and LOB (BLOB, CLOB, NCLOB)
-
- In the Define Segment Boundaries section:
Click Add Segment to add a segment.
The columns that you selected will appear as table headings.
Enter the upper data range for the segment in the selected columns.
Information noteValues in DATE columns must be entered in the format supported by the source. For example, for an Oracle source, the correct format would be:
ALTER SESSION SET NLS_DATE_FORMAT:
'YYYY-MM-DD HH24:MI:SS' (specifying YYYY-MM-DD only is also valid)
ALTER SESSION SET NLS_TIMESTAMP_FORMAT:
'YYYY-MM-DD HH24:MI:SS.FF9'
ALTER SESSION SET NLS_TIMESTAMP_TZ_FORMAT:
'YYYY-MM-DD HH24:MI:SS.FF9 TZH:TZM'
- Add additional segments as required.
- Click Validate to validate that the specified data corresponds to the source column data type and that all of the defined segments contain values.
- To delete a segment, select the desired segment and then click Delete.
- Click OK to save your settings.
When Use Data Ranges is selected, all of the table data will be replicated, even if data ranges are not defined for all of the columns.
Usage example
Let's assume that the following segments are defined in the Define Segment Boundaries table:
Column_1 | Column_2 | Column_3 |
---|---|---|
10 |
30 |
105 |
20 |
20 |
120 |
100 |
12 |
99 |
In this case, the following "WHERE" clauses will be created for each load segment:
- Segment 1: ((COL1 < 10) OR ((COL1 = 10) AND (COL2 < 30)) OR ((COL1 = 10) AND (COL2 = 30) AND (COL3 < 105)))
- Segment 2: NOT ((COL1 < 10) OR ((COL1 = 10) AND (COL2 < 30)) OR ((COL1 = 10) AND (COL2 = 30) AND (COL3 < 105))) AND ((COL1 < 20) OR ((COL1 = 20) AND (COL2 < 20)) OR ((COL1 = 20) AND (COL2 = 20) AND (COL3 < 120)))
- Segment 3: NOT ((COL1 < 20) OR ((COL1 = 20) AND (COL2 < 20)) OR ((COL1 = 30) AND (COL2 = 20) AND (COL3 < 120))) AND ((COL1 < 100) OR ((COL1 = 100) AND (COL2 < 12)) OR ((COL1 = 100) AND (COL2 = 12) AND (COL3 < 99)))
- Segment 4: NOT ((COL1 < 100) OR ((COL1 = 100) AND (COL2 < 12)) OR ((COL1 = 100) AND (COL2 = 12) AND (COL3 < 99)))
To define segment boundaries by all of the table partitions:
Only select this method if you are sure that the table is already partitioned.
- In the Parallel Load tab's Select Parallel Load Method section, select Use Partitions.
-
In the Select Partitions section, select Use all table partitions. This will segment the table according to partitions that already exist in the source database.
-
Select one the following:
-
Use main partitions
-
Use sub partitions
Information noteThis option will be disabled if the source database does not support sub-partitions.
-
- Click OK.
To define segment boundaries by specific partitions:
Only select this method if you are sure that the table is already partitioned.
-
In the Parallel Load tab's Select Parallel Load Method section, select Use Partitions.
-
In the Select Partitions section, select Specify partitions. This will split the data according to the specified source partitions.
Information noteWhen Specify partitions is selected, only the specified partitions will be replicated.
- Click Add Partition.
- Specify the name of an existing partition or sub-partition.
-
If you specified the name of a sub-partition, select the check box in the Sub-Partition column.
Information noteThe check box will be disabled if the source database does not support sub-partitions.
- Add additional partitions/sub-partitions as required.
- To delete a partition/sub-partition, select the partition/sub-partition and then click Delete.
- Click OK to save your settings.
Adjusting the number of segments that can be loaded in parallel
You can increase or decrease the number of segments that will be loaded in parallel. For example, if you selected the Use all table partitions option and the source table has 20 partitions, increasing the default number of concurrent tasks (5) may improve performance.
Sub-tasks are allocated for each segment \ partition \ sub partition.
For example: If you select a table with 6 partitions and load the table using the Use Partitions method, 5 partitions will be loaded in parallel, corresponding with the default number of concurrent tasks (5). When one of the sub-tasks completes its run, it will be assigned to loading the sixth partition.
The currently set value is displayed at the bottom of the Parallel Load tab. You can modify this value in the Maximum number of tables to load in parallel field in the Full Load Tuning tab.