Terminology
The following section describes some key terms used throughout this Help.
Change Data Capture (CDC)
Captures changes in the source data or metadata as they occur and applies them to the target endpoint as soon as possible, in near-real-time. The changes are captured and applied as units of single committed transactions, and several different target tables may be updated as the result of a single source commit. This guarantees transactional integrity in the target endpoint. The CDC process for any file or table starts as soon as the data loading operation for that file or table begins.
Full load
Creates all defined files or tables on the target endpoint, automatically defines the metadata that is required at the target, and populates the tables with data from the source.
Apply latency
The gap in seconds between capturing a change in one of the source tables and applying that change to the target endpoint.
Latency when applying large transactions
This is best explained by way of example. Suppose the most recent Apply Latency value was 10 seconds, and a transaction of one million rows is then committed at the source endpoint. Qlik Talend Data Integration starts to apply that transaction to the selected target, and writing all the changes to the target takes some time (for example, 60 seconds). During those 60 seconds, the latency value gradually grows to 70 seconds for the last change in the transaction. Once the transaction is committed, the latency drops back to the 'regular' latency (10 seconds in this case).
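The arithmetic in this example can be sketched in Python. This is a simplified model and not the product's implementation: it assumes every change in the transaction became available at the same moment, so each second spent writing to the target adds one second of latency. The function name and parameters are hypothetical.

```python
def apply_latency_during_transaction(base_latency_s, apply_duration_s, elapsed_s):
    """Latency of the change being applied `elapsed_s` seconds into the apply.

    Simplified model (an assumption): all changes in the transaction were
    captured together, so latency grows by one second per second of apply time.
    """
    if elapsed_s < 0:
        raise ValueError("elapsed_s must be non-negative")
    if elapsed_s > apply_duration_s:
        # The transaction has been committed on the target:
        # latency drops back to the regular value.
        return base_latency_s
    return base_latency_s + elapsed_s


# Regular latency of 10 seconds, transaction takes 60 seconds to apply.
samples = [apply_latency_during_transaction(10, 60, t) for t in (0, 30, 60, 61)]
print(samples)  # [10, 40, 70, 10]
```

The last change of the transaction (applied at 60 seconds) shows the peak value of 70 seconds, after which the value returns to 10 seconds.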
Latency when no transactions are being applied
When a period of time passes with no changes applied to the target, the latency calculation is based on the time difference between the current time and the timestamp of the last change event read from the transaction log. This could happen, for example, if there is a high volume of activity on tables that were not selected for replication in the current task.
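As a rough illustration of this calculation, the following Python sketch subtracts the timestamp of the last change event read from the transaction log from the current time. The function name is hypothetical, and clamping negative values to zero (to tolerate clock skew) is an assumption of this sketch rather than documented behavior.

```python
from datetime import datetime, timezone


def idle_latency_seconds(last_event_timestamp, now=None):
    """Latency reported while no changes are being applied to the target."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: clamp to zero so that clock skew never yields a negative value.
    return max(0.0, (now - last_event_timestamp).total_seconds())


last_event = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
current = datetime(2024, 1, 1, 12, 0, 25, tzinfo=timezone.utc)
idle = idle_latency_seconds(last_event, current)
print(idle)  # 25.0
```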
Source latency
The gap in seconds between when the source database wrote an event to its transaction log and when Qlik Talend Data Integration captured that change.
Target latency
The gap between when a commit is seen by Qlik Talend Data Integration (reading the source transaction log) and when the changes of that commit are seen in the target.
Overall latency
The overall latency is defined as the time gap between when a change is committed in the source database and when it is visible in the target database.
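The relationship between the three latency values can be sketched as follows. This is an illustration only: it assumes that the moment the source writes the event to its transaction log coincides with the commit, and that the moment of capture is the moment the commit is seen. Under those assumptions the overall latency is the sum of the source and target latencies. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChangeTimeline:
    """Timestamps, in seconds, for one committed change (hypothetical model)."""
    committed_at: float  # change written to the source transaction log
    captured_at: float   # change read from the log by the capture process
    visible_at: float    # change visible in the target

    @property
    def source_latency(self):
        return self.captured_at - self.committed_at

    @property
    def target_latency(self):
        return self.visible_at - self.captured_at

    @property
    def overall_latency(self):
        return self.visible_at - self.committed_at


t = ChangeTimeline(committed_at=100.0, captured_at=103.0, visible_at=110.0)
print(t.source_latency, t.target_latency, t.overall_latency)  # 3.0 7.0 10.0
```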
Source endpoint
A collection of files or tables managed by a database management system (such as Oracle or SQL Server) that is part of the main computing service of the IT organization of an enterprise. This source is continuously updated, may need to provide a high throughput rate, may have strict 24/7 up-time requirements, and may reference or update a number of tables in the course of a single logical transaction while providing transactional consistency and integrity for the data.
Target endpoint
A collection of files or tables managed by a database management system (DBMS), which may be different from the DBMS managing the source endpoint. It contains data that is derived from the source. It may contain only a subset of the tables, columns, or rows that appear in the source. Its tables may contain columns that do not appear in the source but are transformations or computations based on the source data.
Net Changes table
Qlik Talend Data Integration performs data replication based on changes that appear in the source database's transaction log. A single update operation on the source, such as "UPDATE MyTable SET f1=..., f2=...", could potentially update many rows in the source database and create a large number of change records that Qlik Talend Data Integration will need to apply to the target.
Qlik Talend Data Integration offers two Change Processing modes: Transactional apply and Batch optimized apply. In Transactional apply mode, Qlik Talend Data Integration essentially applies each change to the target individually, which may take much longer than the original UPDATE took on the source.
Batch optimized apply mode, on the other hand, is designed to replicate a large number of changes efficiently. In this mode, Qlik Talend Data Integration accumulates changes for multiple tables in a memory cache. Repeated changes to the same row update the existing entry in the memory cache. When the maximum memory cache size defined for the task is reached (or when the configured time has elapsed), Qlik Talend Data Integration does the following:
- Writes the cached (net) changes to a special table on the target (the Net Changes table).
- Bulk uploads the changes to the target table.
- Uses SQL statements to update the target tables based on the data in the Net Changes table.
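The caching idea behind Batch optimized apply can be sketched in Python as follows. This is a minimal illustration and not the product's implementation. It models update operations only, approximates the maximum memory cache size with a row count, and omits the time-based trigger. The class name, its parameters, and the flush callback (which stands in for writing to the Net Changes table) are all hypothetical.

```python
class NetChangesCache:
    """Accumulates changes per row and hands over only the net result."""

    def __init__(self, max_rows, flush):
        self.max_rows = max_rows  # stand-in for the maximum memory cache size
        self.flush = flush        # stand-in for writing the Net Changes table
        self._rows = {}           # (table, primary key) -> net column values

    def record(self, table, key, columns):
        # Repeated changes to the same row update the existing cache entry.
        self._rows.setdefault((table, key), {}).update(columns)
        if len(self._rows) >= self.max_rows:
            self.flush_now()

    def flush_now(self):
        if self._rows:
            batch, self._rows = self._rows, {}
            self.flush(batch)


flushed = []
cache = NetChangesCache(max_rows=2, flush=flushed.append)
cache.record("MyTable", 1, {"f1": "a"})
cache.record("MyTable", 1, {"f1": "b", "f2": "x"})  # same row: merged in cache
cache.record("MyTable", 2, {"f1": "c"})             # second row triggers a flush
print(flushed)
# [{('MyTable', 1): {'f1': 'b', 'f2': 'x'}, ('MyTable', 2): {'f1': 'c'}}]
```

Three change records arrive, but only two net rows are handed over, which is why this mode needs fewer operations on the target than applying each change individually.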