Prerequisites
Before you begin to work with a Hadoop cluster as a data source in Qlik Replicate, make sure that the following prerequisites have been met:
- General:
  - The Hadoop WebHDFS must be accessible from the Qlik Replicate machine.
  - The Hadoop Data Nodes must be accessible from the Qlik Replicate machine.
  - The Hadoop WebHDFS service must be running.
  - To access Hive using WebHCat, the Hadoop WebHCat service must be running. Other methods for accessing Hive are described later in this chapter.
  - The user specified in the Qlik Replicate Hadoop source settings must have access to HiveServer2.
- SSL: Before you can use SSL, you first need to perform the following tasks:
  - Configure each NameNode and each DataNode with an SSL certificate (issued by the same CA).
  - Place the CA certificate on the Replicate Server machine. The certificate should be a base64-encoded PEM (OpenSSL) file.
- Permissions: The user specified in the Hadoop source settings must have read permission for the HDFS directories that contain the data files.
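
Several of these prerequisites can be spot-checked from the Qlik Replicate machine before you define the endpoint. The following Python sketch is not part of Qlik Replicate. It assumes the standard WebHDFS REST API (`/webhdfs/v1/<path>?op=...`), and the function names `webhdfs_url`, `fetch_json`, and `can_read` are hypothetical helpers introduced here for illustration. The host names, ports, and paths you pass in are your own. The `user.name` query parameter only applies to clusters that use simple (non-Kerberos) authentication.

```python
import json
import ssl
import urllib.parse
import urllib.request


def webhdfs_url(host, port, path="/", op="LISTSTATUS", user=None, scheme="http"):
    """Build a WebHDFS REST URL, e.g. http://host:port/webhdfs/v1/?op=LISTSTATUS."""
    query = {"op": op}
    if user:
        # Assumption: simple (pseudo) authentication; Kerberos clusters use SPNEGO instead.
        query["user.name"] = user
    if not path.startswith("/"):
        path = "/" + path
    return (f"{scheme}://{host}:{port}/webhdfs/v1"
            f"{urllib.parse.quote(path)}?{urllib.parse.urlencode(query)}")


def fetch_json(url, ca_file=None, timeout=5.0):
    """Return the decoded JSON reply, or None if the request fails for any reason."""
    context = None
    try:
        if ca_file:
            # For https URLs, trust only the CA certificate stored in this PEM file.
            context = ssl.create_default_context(cafile=ca_file)
        with urllib.request.urlopen(url, timeout=timeout, context=context) as reply:
            return json.load(reply)
    except (OSError, ValueError):
        # URLError, HTTPError, timeouts, and ssl.SSLError are all OSError subclasses;
        # ValueError covers malformed URLs and replies that are not JSON.
        return None


def can_read(file_status, user, groups=()):
    """Check the read bit that applies to user in one WebHDFS FileStatus object.

    Only the basic owner/group/other permission bits are evaluated;
    HDFS ACLs and superuser rights are ignored.
    """
    mode = int(file_status["permission"], 8)  # WebHDFS reports an octal string such as "750"
    if user == file_status["owner"]:
        return bool(mode & 0o400)
    if file_status["group"] in groups:
        return bool(mode & 0o040)
    return bool(mode & 0o004)
```

For example, `fetch_json(webhdfs_url("namenode.example.com", 9870, "/data/in", op="GETFILESTATUS"))` returns a dictionary whose `FileStatus` entry can be passed to `can_read`, or `None` if WebHDFS is not reachable. Port 9870 is only an assumed value here; use the port configured for your cluster. A successful metadata request is answered by the NameNode alone, so it does not prove that the Data Nodes are reachable. Reading file contents (`op=OPEN`) is redirected to a Data Node and exercises that path as well.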