Qlik Data Catalyst supports the use of OPENCONNECTOR scripts to land flat files into HDFS or a File System LoadingDock. This is especially useful for landing data via scripted transfer mechanisms such as DistCp or Secure Copy (SCP). SQOOP transports are supported separately, through Source Connection creation in Source.
DistCp transfers data files that already reside on a cluster to the File System (LoadingDock) on the same or a different cluster.
SCP is a protocol based on Secure Shell (SSH) used for securely transferring files between a local host and a remote host, or between two remote hosts.
Scripts require the creation of properties. See Creating a property through API call: PUT /propDef/v1/save for details on creating these properties, and see Sample Payloads 3 and 4 (OpenConnector Properties) for details.
DistCp script example:
/usr/local/podium/misc/usedistcp.sh %prop.p1 %loadingDockLocation
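The wrapper script itself is user-supplied; only the argument contract (%prop.p1 first, then %loadingDockLocation) comes from the example above. A minimal sketch of what such a script might contain, with the body written as a function and the hadoop distcp command echoed rather than executed so the sketch runs without a cluster:

```shell
#!/bin/sh
# Hypothetical sketch of a usedistcp.sh wrapper. The function name,
# paths, and internals are illustrative assumptions.
land_with_distcp() {
    src="$1"   # value substituted for %prop.p1 (data already on the cluster)
    dest="$2"  # LoadingDock path generated by the application
    if [ -z "$src" ] || [ -z "$dest" ]; then
        echo "usage: usedistcp.sh <source-path> <loadingdock-path>" >&2
        return 1
    fi
    # A real script would run the command instead of echoing it:
    echo "hadoop distcp $src $dest"
}

# Example of the call the application would make after token substitution:
land_with_distcp "/landing/in/orders.csv" "/podium/loadingdock/SRC/ORDERS/20190722/"
```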
Secure Copy (SCP/SSH) script example:
/root/custom/podium/put_file_hdfs.sh %prop.sfile %prop.starget %loadingDockLocation
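Similarly, an SCP-based wrapper might stage the remote file locally before putting it into HDFS. The sketch below is hypothetical: the meanings assigned to the sfile and starget properties, the staging directory, and the script internals are assumptions, and the commands are echoed rather than run:

```shell
#!/bin/sh
# Hypothetical sketch of put_file_hdfs.sh for the SCP case. Argument
# order follows the example: %prop.sfile %prop.starget %loadingDockLocation.
land_with_scp() {
    sfile="$1"    # value of property sfile: file path on the remote host (assumed)
    starget="$2"  # value of property starget: user@host to copy from (assumed)
    dock="$3"     # LoadingDock path generated by the application
    if [ -z "$sfile" ] || [ -z "$starget" ] || [ -z "$dock" ]; then
        echo "usage: put_file_hdfs.sh <remote-file> <remote-host> <loadingdock-path>" >&2
        return 1
    fi
    # A real script would execute these commands instead of echoing them:
    echo "scp $starget:$sfile /tmp/stage/"
    echo "hadoop fs -put /tmp/stage/$(basename "$sfile") $dock"
}

land_with_scp "/data/export/orders.csv" "etl@dbhost" "/podium/loadingdock/DW/ORDERS/20190722/"
```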
Source Type: FILE (will always be file)
Communication Protocol: OPENCONNECTOR
This property provides a field where the path to the script, plus any arguments such as username/password and other parameters, is passed. The property can be set after the data source objects are created through the wizard as an OPENCONNECTOR. If you define a Source and Entity through JDBC, for example, this property is entered manually after the source and entity are created (%attr.source, %attr.entity).
Core property: entity.custom.script.args
OPENCONNECTOR script (example): /usr/local/podium/bin/usedistcptest.sh %prop.p1 %loadingDockLocation
%loadingDockLocation – Required argument that every OPENCONNECTOR script must take. It is the path that the application creates in LoadingDock, and its value is generated automatically by the application. For example, the DistCp script above copies the data file into this location.
%loadingDockUri – Optional argument for an OPENCONNECTOR script that provides a fully qualified path or initiates a script launch to a destination on S3. The argument is also used in QVD Import to provide a full URI mount point from which to launch the OPENCONNECTOR script. It can additionally supply a temporary folder when the load is appending in S3 (the application errors if it detects that the destination folder already exists). In this scenario, the temporary root directory must be in the same bucket as the target directory. For example: s3a://example-bucket/temporary-rootdir or s3a://example-bucket/target-directory/temporary-rootdir.
%loadingDockUri argument example:
entity.custom.script.args=/usr/local/datacatalyst/migrationdir/put_file_fs.sh --temporary-rootdir s3a://dev-landing/temporary-rootdir --target-dir s3a://dev-landing/datacatalyst/loadingdock/DW_PR/INCREMENT/20190722124834/
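How a script consumes these flags is up to the script author. The following hypothetical sketch parses the two flags from the example above and stages the load under the temporary root before promoting it to the target; the stage-then-promote logic is an assumption, and commands are echoed rather than executed so the sketch runs anywhere:

```shell
#!/bin/sh
# Hypothetical sketch: consuming --temporary-rootdir and --target-dir
# in an OPENCONNECTOR script for an appending S3 load.
stage_s3_append() {
    tmp_root=""; target=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --temporary-rootdir) tmp_root="$2"; shift 2 ;;
            --target-dir)        target="$2";  shift 2 ;;
            *) shift ;;  # ignore anything else
        esac
    done
    # Write into the temporary root first, then move to the target, so
    # an appending load never collides with an existing destination folder.
    echo "copy data to $tmp_root"
    echo "move $tmp_root contents to $target"
}

stage_s3_append --temporary-rootdir s3a://dev-landing/temporary-rootdir \
    --target-dir s3a://dev-landing/datacatalyst/loadingdock/DW_PR/INCREMENT/20190722124834/
```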
%prop.p1 – First argument the script will take
%attr.source.username – Refers to an attribute of the source; here the attribute is username.
Property references must start with either %prop or %attr.
%prop should be followed by the name of the property of either a source or the entity. For example, %prop.username should return the connection username.
%attr should be followed by "source" or "entity", then the desired attribute. There are two such attributes: name and username. Name is the name of the source or entity, while username is another way to access the connection username. Example: %attr.entity.name is the name of the entity.
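As a concrete illustration of these substitution rules (the script path, property value, entity name, and resolved LoadingDock path below are all hypothetical):

```
# Property value as stored:
entity.custom.script.args=/usr/local/podium/bin/land.sh %prop.sfile %attr.entity.name %loadingDockLocation

# With the property sfile set to /data/in/orders.csv and an entity named
# ORDERS, the application resolves the tokens and invokes:
/usr/local/podium/bin/land.sh /data/in/orders.csv ORDERS /podium/loadingdock/SRC/ORDERS/20190722124834/
```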
/usr/local/podium/putfilehdfs.sh %prop.sfile %loadingDockLocation
/usr/local/podium/putfilehdfs.sh – (example) script to run
%prop.sfile – First argument the script will take. %prop tells Qlik Data Catalyst to use the value set in the property sfile.
The value of sfile can be anything the user defines; in this example it specifies the input path.
%loadingDockLocation – Required argument that every OPENCONNECTOR script must take. It is the path that the application creates in LoadingDock; this script copies the data file into that location.
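Putting the walkthrough together, a hypothetical body for putfilehdfs.sh might look like the following sketch. The script internals are assumptions; the hadoop fs -put command is echoed rather than run so the sketch can be exercised without a cluster:

```shell
#!/bin/sh
# Hypothetical putfilehdfs.sh sketch matching the walkthrough above:
#   $1 <- %prop.sfile          (input path the user set in property sfile)
#   $2 <- %loadingDockLocation (LoadingDock path created by the application)
put_file_hdfs() {
    infile="$1"
    dock="$2"
    if [ -z "$infile" ] || [ -z "$dock" ]; then
        echo "usage: putfilehdfs.sh <input-path> <loadingdock-path>" >&2
        return 1
    fi
    # A real script would execute the command instead of echoing it:
    echo "hadoop fs -put $infile $dock"
}

put_file_hdfs "/data/in/customers.csv" "/podium/loadingdock/SRC/CUSTOMERS/20190722/"
```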