
Change Data Capture (CDC)

CDC architectural overview

Data warehousing involves extracting data from one or more databases and transporting it to one or more target systems for analysis. Moving such large volumes of data is expensive in both resources and time.

The ability to capture only the changed source data and move it from a source to one or more target systems in real time is known as Change Data Capture (CDC). Capturing only the changes reduces network traffic and thus helps reduce ETL time.
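To make the idea concrete, here is a minimal sketch of how a target system might apply a small batch of captured changes instead of reloading the whole source. The `apply_changes` function, the `(operation, key, row)` tuple layout, and the `I`/`U`/`D` codes are assumptions chosen for illustration; they are not part of Talend Studio.

```python
def apply_changes(target, changes):
    """Apply captured changes to a target keyed by primary key.

    Each change is a hypothetical (op, key, row) tuple where op is
    'I' (insert), 'U' (update) or 'D' (delete).
    """
    for op, key, row in changes:
        if op == "D":
            # Remove the row if present; ignore deletes for unknown keys.
            target.pop(key, None)
        else:
            # Inserts and updates both write the latest row image.
            target[key] = row
    return target


target = {1: "Widget", 2: "Gadget"}
changes = [("U", 1, "Widget v2"), ("D", 2, None), ("I", 3, "Gizmo")]
print(apply_changes(target, changes))  # {1: 'Widget v2', 3: 'Gizmo'}
```

Only three change records cross the network here, regardless of how many unchanged rows the source table holds.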

The CDC feature in Talend Studio simplifies the process of identifying the data that has changed since the last extraction. It quickly identifies and captures data that has been added to, updated in, or removed from database tables and makes this change data available for future use by applications or individuals. The CDC feature is available for Oracle, MySQL, DB2, PostgreSQL, Sybase, MS SQL Server, Informix, Ingres, Teradata, and AS/400.

Warning: The CDC feature works only with database systems running on the same server.

Three different CDC modes are available in Talend Studio:

  • Trigger: this is the default mode used by CDC components.

  • Redo/Archive log: this mode is used with Oracle v11 and earlier versions, and with AS/400.

  • XStream: this mode is used only with Oracle v12 with OCI.

For detailed information on these three modes, see the following sections.

Trigger mode

This mode is available for the following databases: MySQL, Oracle, DB2, PostgreSQL, Sybase, MS SQL Server, Informix, Ingres, and Teradata.

The Trigger mode places a trigger that launches change data capture on every monitored source table. This, in turn, requires only minor modifications to the database structure.

With this mode, data extraction takes place at the same time as the Insert, Update, or Delete operations occur in the source tables, and the change data is stored inside the database in change tables. The captured change data is then made available to the target system(s) in a controlled manner, using subscriber views.
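The capture step can be sketched with an in-memory SQLite database. SQLite is not one of the databases listed above, and the table, column, and trigger names below are hypothetical; the objects that Talend Studio generates will differ. The sketch only illustrates the general technique of one trigger per operation writing to a change table.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a source table, a change table, and one trigger per operation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL);

        -- Hypothetical change table: one row per captured operation.
        CREATE TABLE product_changes (
            change_id  INTEGER PRIMARY KEY AUTOINCREMENT,
            op         TEXT NOT NULL,      -- 'I', 'U' or 'D'
            product_id INTEGER NOT NULL
        );

        CREATE TRIGGER product_ins AFTER INSERT ON product
        BEGIN
            INSERT INTO product_changes (op, product_id) VALUES ('I', NEW.id);
        END;

        CREATE TRIGGER product_upd AFTER UPDATE ON product
        BEGIN
            INSERT INTO product_changes (op, product_id) VALUES ('U', NEW.id);
        END;

        CREATE TRIGGER product_del AFTER DELETE ON product
        BEGIN
            INSERT INTO product_changes (op, product_id) VALUES ('D', OLD.id);
        END;
        """
    )
    return conn


def captured_changes(conn: sqlite3.Connection) -> list:
    """Return the captured operations in the order they occurred."""
    return conn.execute(
        "SELECT op, product_id FROM product_changes ORDER BY change_id"
    ).fetchall()


conn = build_demo_db()
conn.execute("INSERT INTO product VALUES (1, 'Widget', 9.99)")
conn.execute("UPDATE product SET price = 10.49 WHERE id = 1")
conn.execute("DELETE FROM product WHERE id = 1")
print(captured_changes(conn))  # [('I', 1), ('U', 1), ('D', 1)]
```

The change rows are written inside the same database as the source table, at the moment each operation runs, which is why this mode needs triggers and extra tables in the source schema.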

In Trigger mode, CDC can have only one publisher but many subscribers. CDC creates subscriber tables to control accessibility of the change table data by the target system(s). A target system is any application that wants to use the data captured from the source system.

The following figure shows the basic architecture of a CDC environment in Trigger mode in Talend Studio.

Basic architecture of a CDC environment in Trigger mode.

In this example, CDC monitors the changes made to a Product table. The changes are caught and published in a change table to which two subscribers have access: a CRM application and an Accounting application. These two systems fetch the changes and use them to update their data.
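The one-publisher, many-subscribers relationship in this example can be modeled as follows. This is a simplified sketch, not the schema Talend Studio creates: the `ChangeTable` class, its methods, and the per-subscriber position counter are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeTable:
    """A single published change table with per-subscriber read positions."""

    changes: list = field(default_factory=list)
    positions: dict = field(default_factory=dict)

    def subscribe(self, name: str) -> None:
        # A new subscriber starts after the changes already published.
        self.positions[name] = len(self.changes)

    def publish(self, op: str, key: int) -> None:
        self.changes.append((op, key))

    def fetch(self, name: str) -> list:
        # Only registered subscribers may read the change data.
        if name not in self.positions:
            raise PermissionError(f"{name} is not a subscriber")
        pending = self.changes[self.positions[name]:]
        self.positions[name] = len(self.changes)
        return pending


table = ChangeTable()
table.subscribe("CRM")
table.subscribe("Accounting")

table.publish("I", 1)
table.publish("U", 1)
print(table.fetch("CRM"))         # [('I', 1), ('U', 1)]

table.publish("D", 1)
print(table.fetch("CRM"))         # [('D', 1)]
print(table.fetch("Accounting"))  # [('I', 1), ('U', 1), ('D', 1)]
```

Each subscriber consumes the same change table at its own pace, and an application that has not been registered as a subscriber cannot read the changes at all.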

Redo/Archive log mode

The Redo/Archive log mode is available only for Oracle Database Enterprise Edition v11 and for AS/400 databases. It is equivalent to the archive log mode for Oracle and to the journal mode for AS/400.

Note: Oracle v11 and the Redo/Archive log mode for Oracle are deprecated.

In an Oracle database, a redo log is a file that records the history of changes made to data. In an AS/400 database, these changes are logged automatically in the database's internal logbook (journal). The logged changes include the insert, update, and delete operations performed on the data.

Redo/Archive log mode is less intrusive than Trigger mode because, unlike Trigger mode, it does not require modifications to the database structure.
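The log-reading approach can be sketched in a few lines. The `JournalEntry` layout and the `read_changes` function are assumptions made for illustration; real Oracle redo logs and AS/400 journals have their own formats and are read through vendor tooling, not like this.

```python
from typing import NamedTuple


class JournalEntry(NamedTuple):
    """Hypothetical journal record; real log formats differ."""

    seq: int      # monotonically increasing sequence number
    op: str       # e.g. 'INSERT', 'UPDATE', 'DELETE', 'COMMIT'
    table: str
    key: int


DATA_OPS = {"INSERT", "UPDATE", "DELETE"}


def read_changes(journal, table, after_seq=0):
    """Return data changes for one table that were logged after after_seq.

    Also returns the highest sequence number seen, so that the next call
    can resume from that position.
    """
    changes = [
        e for e in journal
        if e.seq > after_seq and e.table == table and e.op in DATA_OPS
    ]
    last_seq = max([after_seq] + [e.seq for e in journal])
    return changes, last_seq


journal = [
    JournalEntry(1, "INSERT", "product", 10),
    JournalEntry(2, "COMMIT", "", 0),
    JournalEntry(3, "UPDATE", "customer", 7),
    JournalEntry(4, "DELETE", "product", 10),
]

changes, position = read_changes(journal, "product")
print([(e.seq, e.op) for e in changes])  # [(1, 'INSERT'), (4, 'DELETE')]
print(position)                          # 4
```

The source tables are never touched: the reader works only from a log the database already maintains, which is what makes this mode less intrusive than triggers.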

When setting up Redo/Archive log mode for Oracle, only one subscriber can have access rights to the change table. This subscriber must be a database user who holds the subscription rights. There is also a subscription table that controls access to the subscriber change table. The subscription change table is a comprehensive, internal table that reflects the state of the Oracle database at the moment the Redo/Archive log option was activated.

When setting up this mode for AS/400, a save file, called fitcdc.savf and provided in Talend Studio, is restored on AS/400 and used to install a program called RUNCDC. When the subscriber views the changes made (View all changes) or consumes them for reuse (using a tAS400CDC component), the RUNCDC program reads and analyzes the logbook (journal) and the attached receiver from the source table and updates the change table accordingly. The AS/400 CDC Redo/Archive log mode (journal) creates subscription tables to prevent unauthorized target systems from accessing the data in the change tables. A target system means any application which tries to use data captured in the source system.

Basic architecture of a CDC environment in Redo/Archive log mode.

In this example, CDC monitors the changes made to a Product table by using the data contained in the database's logbook (journal). CDC reads the logbook and records the changes that have been made to the data. These changes are collected and published in a change table to which two subscribers have access: a CRM application and an Accounting application. These two systems fetch the changes and use them to update their data.

XStream mode

XStream provides a framework for sharing real-time data changes with outstanding performance and usability between Oracle databases and other systems, such as non-Oracle databases and third-party software applications. XStream consists of two major features: XStream Out and XStream In.

XStream Out provides Oracle Database components and application programming interfaces that enable you to share data changes made to an Oracle database with other systems. It also provides a transaction-based interface for streaming the changes captured from the redo log of the Oracle database to client applications with an outbound server. An outbound server is an optional Oracle background process that sends data changes to a client application.

XStream In provides Oracle Database components and application programming interfaces that enable you to share data changes made to other systems with an Oracle database. It also provides a transaction-based interface for sending information to an Oracle database from client applications with an inbound server. An inbound server is an optional Oracle background process that receives data changes from a client application.

In Talend Studio, the XStream mode is available only for Oracle v12 with OCI. For more information about the XStream mode, see the Oracle Database XStream Guide.
