CDC architectural overview
Data warehousing involves extracting data from one or more databases and transporting it into one or more target systems for analysis. Moving complete data sets in this way means handling huge volumes of data, which is very expensive in both resources and time.
The ability to capture only the changed source data and to move it from a source to one or more target systems in real time is known as Change Data Capture (CDC). Capturing only the changes reduces traffic across the network and thus helps reduce ETL time.
The CDC feature in Talend Studio simplifies the process of identifying the data that has changed since the last extraction. It quickly identifies and captures data that has been added to, updated in, or removed from database tables and makes this change data available for future use by applications or individuals. The CDC feature is available for Oracle, MySQL, DB2, PostgreSQL, Sybase, MS SQL Server, Informix, Ingres, Teradata, and AS/400.
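As a rough illustration of the general idea of capturing only changed rows, the following minimal sketch uses Python's built-in `sqlite3` module with database triggers that record each insert, update, and delete in a separate change table. It is not how Talend Studio implements CDC, and SQLite is only a stand-in for a real source database. The table names `customer` and `customer_changes`, the trigger names, and the helper `read_changes` are all hypothetical.

```python
import sqlite3


def build_demo():
    """Create an in-memory source table plus a hypothetical change table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

    -- Hypothetical change table: one row per captured change.
    CREATE TABLE customer_changes (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        op  TEXT NOT NULL,     -- 'I' = insert, 'U' = update, 'D' = delete
        id  INTEGER NOT NULL   -- key of the affected customer row
    );

    CREATE TRIGGER customer_ins AFTER INSERT ON customer
    BEGIN
        INSERT INTO customer_changes (op, id) VALUES ('I', NEW.id);
    END;

    CREATE TRIGGER customer_upd AFTER UPDATE ON customer
    BEGIN
        INSERT INTO customer_changes (op, id) VALUES ('U', NEW.id);
    END;

    CREATE TRIGGER customer_del AFTER DELETE ON customer
    BEGIN
        INSERT INTO customer_changes (op, id) VALUES ('D', OLD.id);
    END;
    """)
    return conn


def read_changes(conn, last_seq):
    """Return the changes recorded after last_seq and the new high-water mark."""
    rows = conn.execute(
        "SELECT seq, op, id FROM customer_changes WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    new_last = rows[-1][0] if rows else last_seq
    return rows, new_last


conn = build_demo()

# Initial load: two inserts are captured.
conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
conn.execute("INSERT INTO customer VALUES (2, 'Bob')")
first_batch, last_seq = read_changes(conn, 0)

# Later activity: only these two changes are returned by the next read.
conn.execute("UPDATE customer SET name = 'Anna' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 2")
second_batch, last_seq = read_changes(conn, last_seq)

print(first_batch)
print(second_batch)
```

In this sketch a consumer remembers the last sequence number it has read, so each extraction transfers only the rows that changed since the previous one rather than the whole table.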
Three different CDC modes are available in Talend Studio:
- Trigger: this is the default mode used by CDC components.
- Redo/Archive log: this mode is used with Oracle v11 and earlier versions, and with AS/400.
- XStream: this mode is used only with Oracle v12 with OCI.
For detailed information on these three modes, see the following sections.