Appending to and overwriting partitions in dataflows
When running a prepare dataflow, a user can select a specific partition and either Append to it or Overwrite the data in it. The selected partition applies to all target entities in the dataflow; users cannot, for example, select different partitions for different target entities in the same dataflow. This functionality is also available upon ingest in the source module. Note that when a load is appended to or overwritten, the work order summary for the original load is deleted.
Partition options are displayed when a user executes a dataflow and selects Local or an available execution engine (MapReduce, Tez, Spark). Global parameters can be overridden at this point; the overrides and the partition options apply to that run only.
Creating a new partition
Default. All records are instantiated in new data directories (sample | good | bad | ugly log folders).
Append to an existing partition
Upon dataflow execution, the parent load partition is updated to reflect the incremented (total) record count of the appended loads. This applies where the entity level is Managed (Data load type) or Registered (Statistics load type).
Profile Statistics in the parent work order (partition) are consolidated across the original base load and all subsequent appends.
Note that deleting the logs for an Append load will only remove the logs and not affect any data.
Overwrite an existing partition
Overwrite deletes all previous files in an existing partition and replaces them with a new load of records upon dataflow run.
Profile data is overwritten to reflect the profile information of the new records. Deleting the logs for an Overwrite load will only remove the logs and not affect any data.
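The file-level difference between Append and Overwrite can be illustrated with a minimal sketch. It is not the product's implementation: the folder names come from the text above, but the directory layout, the file names, and the helpers `apply_load` and `demo` are hypothetical and exist only for illustration.

```python
import shutil
import tempfile
from pathlib import Path

# Folder names taken from the text; the directory layout is an assumption.
RECORD_FOLDERS = ("sample", "good", "bad", "ugly")


def apply_load(temp_dir: Path, partition_dir: Path, mode: str) -> None:
    """Move a finished load from temp_dir into partition_dir (hypothetical helper)."""
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unknown mode: {mode}")
    for folder in RECORD_FOLDERS:
        src = temp_dir / folder
        dst = partition_dir / folder
        if mode == "overwrite" and dst.exists():
            shutil.rmtree(dst)  # Overwrite: previous files are deleted first
        dst.mkdir(parents=True, exist_ok=True)
        if src.exists():
            for f in src.iterdir():
                # Append: new files land next to the existing ones
                shutil.move(str(f), str(dst / f.name))


def demo() -> dict:
    """Run an append, then an overwrite, against a throwaway directory tree."""
    with tempfile.TemporaryDirectory() as root_name:
        root = Path(root_name)
        partition = root / "partition"
        (partition / "good").mkdir(parents=True)
        (partition / "good" / "load1.txt").write_text("original records")

        def stage(name: str) -> Path:
            # Simulate a finished job that wrote into a temporary folder.
            temp = root / f"temp_{name}"
            (temp / "good").mkdir(parents=True)
            (temp / "good" / f"{name}.txt").write_text("records")
            return temp

        apply_load(stage("load2"), partition, "append")
        after_append = sorted(p.name for p in (partition / "good").iterdir())

        apply_load(stage("load3"), partition, "overwrite")
        after_overwrite = sorted(p.name for p in (partition / "good").iterdir())
        return {"append": after_append, "overwrite": after_overwrite}
```

In this sketch, calling `demo()` leaves both the original and the appended file in the partition after the Append step, and only the newest file after the Overwrite step.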
Load Logs for appended/overwritten partitions, record counts
Data: New load default. All records are instantiated in appropriate sample | good | bad | ugly log folders.
Append: Record counts are added incrementally to the parent work order (partition). After the jobs have finished successfully, data in the temporary sample | good | bad | ugly log folders is moved to the existing partition, overwriting the previous profile Data or Statistics.
Overwrite: Deletes all previous records and replaces them with a new load. After the jobs have finished, data in temporary sample | good | bad | ugly log folders will replace those in parent partition, overwriting profile Data or Statistics.
Append and Overwrite Logs: Parent partition reflects aggregate record count.
For example, if an original data load (load log type: Data) has 42 records and the next load (load log type: Append) adds 42 more records, the original load partition will reflect 84 records. If an Overwrite load is then executed with 12 records, it overwrites all existing records, and the parent partition will now have 12 records.
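The record-count arithmetic in this example can be traced with a small model. This is only a sketch of the counting rule described above; the `Partition` class and its `load` method are hypothetical names and are not part of the product.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical stand-in for a parent partition and its load log."""
    record_count: int = 0
    load_log: list = field(default_factory=list)

    def load(self, load_type: str, records: int) -> int:
        """Apply one load and return the record count the partition reflects."""
        if load_type in ("Data", "Overwrite"):
            # A new load or an Overwrite replaces whatever was there before.
            self.record_count = records
        elif load_type == "Append":
            # An Append adds to the running total of the parent partition.
            self.record_count += records
        else:
            raise ValueError(f"unknown load type: {load_type}")
        self.load_log.append((load_type, records))
        return self.record_count


partition = Partition()
count_after_data = partition.load("Data", 42)            # 42
count_after_append = partition.load("Append", 42)        # 84
count_after_overwrite = partition.load("Overwrite", 12)  # 12
```

The three counts match the 42, 84, and 12 records in the example, and `load_log` keeps one entry per load in execution order.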