
OPENCONNECTOR: DistCp

Use Case: The distributed copy command (DistCp) recursively copies data files from an HDFS source location to a destination on the same or a different HDFS cluster (for example, loadingDockLocation for ingest or shippingLocation for publish).

Example of a DistCp script with arguments: /usr/local/podium/datasets/put_file_hdfs.sh %prop.p1 %loadingDockLocation

OPENCONNECTOR DistCp property: entity.custom.script.args

Example of custom script arguments property displaying in property panel

Example of a DistCp location property (defined by API): /usr/local/podium/datasets/ENGINE.utf8.bom.txt

OPENCONNECTOR DistCp property: entity.custom.script.args, where p1 specifies the input path

Example of a user defined property displaying in property panel

For the example above, the property (p1) is created by the user and can be passed directly to the script.

Script example for property: entity.custom.script.args:

/usr/local/podium/usedistcp.sh %prop.p1 %loadingDockLocation

Script: /usr/local/podium/usedistcp.sh

%prop.p1 – The first argument the script takes. The '%prop' prefix tells the application to use the value set in the property p1.

p1 can be anything the user defined when creating the new property.

For the example above it specifies the input path.

%loadingDockLocation – A required argument that every OPENCONNECTOR must take. The application generates this path value automatically and passes it to the script, and the script uses distcp to copy the file to this location. (Note: This argument is always '%loadingDockLocation' for an Ingest OPENCONNECTOR and always '%shippingLocation' for Publish.)
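The substitution described above can be pictured with a small Bash sketch. It is an illustration only and not the application's actual implementation: the function name expand_args and the loading dock path used in the usage note are hypothetical, while the script path and the p1 value are taken from the examples earlier on this page.

```shell
#!/bin/bash
# Hypothetical illustration: mimics how the placeholders in
# entity.custom.script.args might be resolved before the script is called.
# expand_args is a made-up helper, not part of the product.

expand_args() {
  local template="$1" p1_value="$2" dock_path="$3"
  local p1_token='%prop.p1' dock_token='%loadingDockLocation'
  local expanded

  # Quoting the token makes Bash treat it as a literal string, not a pattern.
  expanded=${template//"$p1_token"/$p1_value}
  expanded=${expanded//"$dock_token"/$dock_path}
  printf '%s\n' "$expanded"
}
```

For example, calling expand_args '/usr/local/podium/datasets/put_file_hdfs.sh %prop.p1 %loadingDockLocation' '/usr/local/podium/datasets/ENGINE.utf8.bom.txt' '/tmp/loadingdock/example' prints the script path followed by the two resolved arguments, where /tmp/loadingdock/example stands in for the path the application would generate.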

Bash DistCp script:

#!/bin/bash
# usedistcp.sh - recursively copies the contents of a source location to a destination location on the same or another HDFS cluster.
# Arguments:
#   1 - source location
#   2 - destination location
# Result: the source location contents are copied recursively into the destination location.
if (($# != 2)); then
  echo 'Usage: <source hdfs location> <destination hdfs location>' >&2
  exit 1
fi
# -p avoids an error if the directory already exists
hadoop fs -mkdir -p /tmp/user
echo "$1"
echo "$2"
hadoop distcp "$1" "$2"
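When trying out a script like this, it can help to see the distcp command that would run before any data is copied. The following is a minimal sketch under stated assumptions: the function name copy_with_distcp and the DRY_RUN variable are hypothetical and not part of the product or of Hadoop. Only the final `hadoop distcp <source> <destination>` call is the real command from the script.

```shell
#!/bin/bash
# Sketch: same two-argument contract as usedistcp.sh, plus a dry-run switch.
# copy_with_distcp and DRY_RUN are hypothetical names used for illustration.

copy_with_distcp() {
  if (($# != 2)); then
    echo 'Usage: copy_with_distcp <source hdfs location> <destination hdfs location>' >&2
    return 1
  fi
  local src="$1" dest="$2"

  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the command instead of running it.
    printf 'hadoop distcp %s %s\n' "$src" "$dest"
    return 0
  fi

  # Real copy: requires the hadoop client on the PATH.
  hadoop distcp "$src" "$dest"
}
```

With DRY_RUN=1 set, copy_with_distcp "$1" "$2" only prints the command line. With DRY_RUN unset, it runs the copy as the script does.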
