File uploads

The File Uploads tab is shown only if the task is defined with an endpoint that supports this feature.

Click the Optimize File Uploads button to improve performance when replicating to file-based targets such as Amazon S3 and Hadoop. When this feature is enabled, the button text changes to Disable File Upload Optimization; click the button again to turn file upload optimization off.

The upload mode depends on the task type:

If you disable this option after the task has already started, you must do one of the following:

  • If the task is in the Full Load stage, reload the target using the Reload Target Run option.
  • If the task is in the Change Processing stage, resume the task using the Start processing changes from Run option.
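The rule above can be captured as a small lookup, for example in an operator runbook script. This is only an illustrative sketch: the function name and the stage strings are hypothetical and are not part of the product; only the two Run option names come from the text above.

```python
# Hypothetical helper: maps the stage a task was in when file upload
# optimization was disabled to the Run option named in the documentation.
RUN_OPTION_BY_STAGE = {
    "full load": "Reload Target",
    "change processing": "Start processing changes from",
}


def action_after_disabling(stage: str) -> str:
    """Return the Run option to use after disabling the optimization."""
    key = stage.strip().lower()
    if key not in RUN_OPTION_BY_STAGE:
        raise ValueError(f"Unknown task stage: {stage!r}")
    return RUN_OPTION_BY_STAGE[key]
```

Calling `action_after_disabling("Full Load")` returns the name of the Run option to select; an unrecognized stage raises `ValueError` rather than guessing.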
Note:
  • Supported by the following target endpoints only: Amazon S3, Hadoop (Hortonworks, Cloudera, and MapR), Microsoft Azure ADLS, Microsoft Azure Databricks, Databricks on AWS, Microsoft Azure HDInsight, Hortonworks Data Platform (HDP), Google Cloud Storage, Google Cloud Dataproc, Amazon EMR, and Cloudera Data Platform (CDP) Private Cloud.

  • General Limitations and Considerations:
    • Post Upload Processing endpoint settings are not supported.

  • Hadoop - Limitations and Considerations:
    • When replicating to a Hadoop target, only Text and Sequence file formats are supported.
    • Hive jobs are not supported as they will prevent the file upload.
    • Append is not supported when using Text file format.
  • Amazon S3 and Microsoft Azure ADLS - Limitations and Considerations:
    • When working with Reference Files, a new entry is added to the Reference File immediately after the data file is uploaded (even if the DFM file has not been uploaded yet).
    • The existence of the DFM file does not necessarily mean that the associated data file has also been uploaded.
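Because a data file and its DFM file may arrive independently, a downstream consumer may want to wait until both are present before processing. The following is a minimal sketch under stated assumptions: it assumes that a DFM file shares the base name of its data file and uses a `.dfm` extension, and that data files use one of the listed extensions. The function name and the extension set are hypothetical, and the input is simply a list of object keys obtained from whatever storage listing you already use.

```python
from pathlib import PurePosixPath

# Assumptions for illustration only: adjust to the formats actually in use.
DATA_SUFFIXES = {".csv", ".json", ".parquet"}
DFM_SUFFIX = ".dfm"


def complete_pairs(keys):
    """Return the data-file keys whose DFM companion is also present."""
    present = set(keys)
    ready = []
    for key in sorted(present):
        path = PurePosixPath(key)
        if path.suffix.lower() not in DATA_SUFFIXES:
            continue  # skip DFM files and anything unrecognized
        # Assumed naming rule: same base name, .dfm extension.
        if str(path.with_suffix(DFM_SUFFIX)) in present:
            ready.append(key)
    return ready
```

A data file without a DFM file, or a DFM file without a data file, is simply left out of the result, so it can be picked up on a later listing once its counterpart has been uploaded.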