
Change Data Partitioning on Hadoop

When Change Data Partitioning is enabled, the Replicate Change Tables in Hive are partitioned by the partition_name column. Data files are uploaded to HDFS according to the maximum file size and time definitions, and stored in a directory under the Change Table directory. When the specified partition timeframe ends, a partition is created in Hive, pointing to the HDFS directory.

Information about the partitions is written to the attrep_cdc_partitions Control Table.
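To make the mechanism concrete, the sketch below models how a change record's timestamp could map to a partition name (the start of its timeframe) and how a matching Hive partition could be registered against an HDFS directory. This is an illustrative approximation only: the naming scheme, the `partition_name` format, and the table/path names here are assumptions, not the exact convention Replicate uses.

```python
from datetime import datetime, timedelta

# Hypothetical: name the partition after the start of the timeframe
# that contains the record's timestamp (format is an assumption).
def partition_name(ts: datetime, timeframe: timedelta) -> str:
    epoch = datetime(1970, 1, 1)
    elapsed = (ts - epoch).total_seconds()
    start_secs = (elapsed // timeframe.total_seconds()) * timeframe.total_seconds()
    start = epoch + timedelta(seconds=start_secs)
    return start.strftime("%Y%m%d%H%M%S")

# Hypothetical DDL: when a timeframe closes, a Hive partition is added
# that points at the HDFS directory holding that timeframe's data files.
def add_partition_ddl(table: str, name: str, base_dir: str) -> str:
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (partition_name='{name}') "
        f"LOCATION '{base_dir}/{name}'"
    )

# Example: a change captured at 12:34 with a one-hour timeframe lands
# in the partition that started at 12:00.
name = partition_name(datetime(2024, 1, 1, 12, 34), timedelta(hours=1))
ddl = add_partition_ddl("my_table__ct", name, "/user/replicate/my_table__ct")
```

With a one-hour timeframe, `partition_name` above yields `20240101120000` for the 12:34 record, and the generated DDL points the new partition at the corresponding directory under the Change Table directory.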

Prerequisites

The prerequisites for using Change Data Partitioning with a Hadoop target endpoint are as follows:

  • The target file format must be set to Text or Sequence.
  • Hive access must be set to ODBC.
