Change Data Partitioning
When Change Data Partitioning is enabled, the Replicate Change Tables in Hive are partitioned by the partition_name column. Data files are uploaded to your preferred storage provider according to the defined maximum size and time limits, and are stored in a directory under the Change Table directory. Each time the specified partition timeframe ends, a partition is created in Hive that points to the corresponding target directory on your storage provider.
Information about the partitions is written to the attrep_cdc_partitions Control Table.
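To illustrate what creating a partition that points to a target directory can look like, the following minimal sketch builds a standard HiveQL ALTER TABLE ... ADD PARTITION statement keyed on the partition_name column. It is not Replicate's actual implementation. The function name build_add_partition_sql, the table name orders__ct, the partition value, and the storage path are all hypothetical and chosen only for illustration.

```python
def build_add_partition_sql(change_table: str, partition_name: str, location: str) -> str:
    """Build a HiveQL statement that registers one partition directory.

    All argument values are supplied by the caller; nothing here is read
    from Replicate. Single quotes are rejected rather than escaped, to keep
    the sketch simple and avoid producing a malformed statement.
    """
    for value in (partition_name, location):
        if "'" in value:
            raise ValueError(f"single quotes are not supported: {value!r}")

    return (
        f"ALTER TABLE {change_table} ADD IF NOT EXISTS "
        f"PARTITION (partition_name='{partition_name}') "
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical Change Table, partition value, and directory.
    print(
        build_add_partition_sql(
            "orders__ct",
            "example_partition_01",
            "/replicate/orders__ct/example_partition_01",
        )
    )
```

A statement of this shape can then be run through whichever Hive connection is configured. IF NOT EXISTS is used so that running the statement again for the same partition does not fail.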
Prerequisites
The prerequisites for using Change Data Partitioning with the Hortonworks Data Platform (HDP) endpoint are as follows:
- The target file format must be set to Text or Sequence
- Hive access must be set to ODBC