Setting general connection properties
This section describes how to configure general connection properties. For an explanation of how to configure advanced connection properties, see Setting advanced connection properties.
To add a Hadoop source endpoint to Qlik Replicate:
- In the Qlik Replicate console, click the Manage Endpoint Connections toolbar button to open the Manage Endpoints Connections dialog box. Then click the New Endpoint Connection button. For more information on adding an endpoint to Qlik Replicate, see Defining and managing endpoints.
- In the Name field, type a name for your endpoint. This can be any name that helps you identify the endpoint.
- In the Description field, optionally type a description that helps to identify the Hadoop endpoint.
- Select Source as the endpoint Role.
- Select Hadoop as the endpoint Type.
- In the Hadoop NameNode field, enter the host name or IP address of the Hadoop NameNode machine.
Information note: Consider the following:
- This information is case sensitive.
- To determine if you are connected to the endpoint you want to use and whether the connection information you entered is correct, click Test Connection. If the connection is successful, a green confirmation message is displayed. If the connection fails, an error message is displayed at the bottom of the dialog box. To view the log entry for the connection failure, click View Log; the server log is displayed with the information for the connection failure. Note that this button is only available if the test connection fails.
- In the Security section, do the following:
- To encrypt the data between the Replicate machine and HDFS, select Use SSL. Before using SSL, make sure that the SSL prerequisites described in Prerequisites have been met.
In the CA path field, specify either the directory containing the CA certificate or the full path to a specific CA certificate.
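For illustration only: the CA path behaves like the certificate-verification setting of any TLS client. The following Python sketch (not part of Replicate; the host, port, user, and certificate path are hypothetical placeholders) shows how such a CA file or directory would typically be used to validate an HTTPS connection to a WebHDFS endpoint:

```python
# Hedged sketch: validating a TLS connection to WebHDFS with a CA path.
# Host, port, user, and certificate path are hypothetical placeholders.
import requests

CA_PATH = "/etc/ssl/hadoop-ca.pem"  # a CA bundle file, or a c_rehash-processed directory
NAMENODE = "namenode.example.com"   # placeholder NameNode host
PORT = 50070                        # adjust to your cluster's HTTPS port

resp = requests.get(
    f"https://{NAMENODE}:{PORT}/webhdfs/v1/",
    params={"op": "GETHOMEDIRECTORY", "user.name": "replicate"},
    verify=CA_PATH,  # rejects the server if its certificate does not chain to this CA
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```

If the server certificate does not chain to the specified CA, the request fails with an SSL error rather than silently connecting.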
- Select one of the following authentication types:
- User name - Select to connect to the Hadoop cluster with only a user name. Then, in the User name field, specify the name of a user authorized to access the Hadoop cluster.
- Kerberos - Select to authenticate against the Hadoop cluster using Kerberos. Replicate automatically detects whether Qlik Replicate Server is running on Linux or on Windows and displays the appropriate settings.
Qlik Replicate Server on Linux:
When Qlik Replicate Server is running on Linux, select either Ticket or Keytab from the Kerberos options drop-down list.
If you selected Ticket, select one of the following options:
- Use global Kerberos ticket file - Select this option if you want to use the same ticket for several Hadoop endpoints (source or target). In this case, you must make sure to select this option for each Hadoop endpoint instance that you define.
- Use specific Kerberos ticket file - Select this option if you want to use a different ticket file for each Hadoop endpoint. Then specify the ticket file name in the designated field.
This option is especially useful if you need to perform a task-level audit of Replicate activity (using a third-party tool) on the Hadoop NameNode. To set this up, define several instances of the same Hadoop endpoint and specify a unique Kerberos ticket file for each instance. Then, for each task, simply select a different Hadoop endpoint instance.
Information note:
- You need to define a global Kerberos ticket file even if you select the Use specific Kerberos ticket file option. The global Kerberos ticket file is used for authentication when selecting a Hive endpoint, when testing the connection (using the Test Connection button), and when selecting which tables to replicate.
- When replicating from a Hadoop source endpoint to a Hadoop target endpoint, both endpoints must be configured to use the same ticket file.
For additional steps required to complete setup for Kerberos ticket-based authentication, see Using Kerberos authentication.
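A Kerberos ticket file is a credential cache, typically created with kinit. For illustration only, the following Python sketch (assuming the third-party requests-kerberos package; the host and ticket-file path are hypothetical placeholders) shows how a specific ticket file could be tested for SPNEGO authentication against WebHDFS:

```python
# Hedged sketch: checking that a Kerberos ticket (credential) cache can
# authenticate to WebHDFS via SPNEGO. Requires the requests-kerberos
# package; host and ticket-file path are hypothetical placeholders.
import os
import requests
from requests_kerberos import HTTPKerberosAuth, OPTIONAL

# Point Kerberos clients at a specific ticket file, analogous to the
# "Use specific Kerberos ticket file" option above.
os.environ["KRB5CCNAME"] = "/tmp/krb5cc_replicate"

resp = requests.get(
    "http://namenode.example.com:50070/webhdfs/v1/",
    params={"op": "GETHOMEDIRECTORY"},
    auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
    timeout=10,
)
resp.raise_for_status()
print("Ticket authenticated successfully:", resp.json())
```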
If you selected Keytab, provide the following information:
- Realm: The name of the realm in which your Hadoop cluster resides. For example, if the full principal name is john.doe@EXAMPLE.COM, then EXAMPLE.COM is the realm.
- Principal: The user name to use for authentication. The principal must be a member of the realm entered above. For example, if the full principal name is john.doe@EXAMPLE.COM, then john.doe is the principal.
- Keytab file: The full path of the Keytab file. The Keytab file should contain the key of the Principal specified above.
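In other words, the Realm and Principal fields are simply the two halves of the full principal name, split at the last "@":

```python
# Hedged sketch: a full Kerberos principal is "<principal>@<REALM>".
full_principal = "john.doe@EXAMPLE.COM"
principal, realm = full_principal.rsplit("@", 1)
print(principal)  # john.doe     -> Principal field
print(realm)      # EXAMPLE.COM  -> Realm field
```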
Qlik Replicate Server on Windows:
When Qlik Replicate Server is running on Windows, select one of the following:
- Use the following KDC: Select Active Directory (default) if your KDC is Microsoft Active Directory, or select MIT if your KDC is MIT KDC running on Linux/UNIX.
Information note: When the Replicate KDC and the Hadoop KDC are in different domains, a relationship of trust must exist between the two domains.
- Realm: The name of the realm/domain in which your Hadoop cluster resides (where realm is the MIT term while domain is the Active Directory term).
- Principal: The username to use for authentication. The principal must be a member of the realm/domain entered above.
- When Active Directory is selected - Password: The password for the principal entered above.
- When MIT is selected - Keytab file: The keytab file containing the principal entered above.
Information note: When replicating from a Hadoop source endpoint to a Hadoop target endpoint, both endpoints must be configured to use the same parameters (KDC, realm, principal, and password).
If you are unsure about any of the above, consult your IT/security administrator.
For additional steps required to complete setup for Kerberos authentication, see Using Kerberos authentication.
- User name and password - Select to connect to the Hadoop NameNode or to the Knox Gateway (when enabled - see below) with a user name and password. Then, in the User name and Password fields, specify the required user name and password.
Information note: Consider the following:
- A user name and password are required to access the MapR Control System.
- This information is case sensitive.
Information note: Make sure that the specified user has the required Hadoop access privileges. For information on how to provide the required privileges, see Required permissions.
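For background, the two user-based options differ in how credentials are presented on the wire: plain user-name access corresponds to WebHDFS simple authentication (the user.name query parameter), while user name and password against a Knox Gateway corresponds to HTTP basic authentication over SSL. A hedged Python sketch (hosts, ports, credentials, and CA path are hypothetical placeholders):

```python
# Hedged sketch: the two credential styles as seen by the cluster.
# Hosts, ports, credentials, and CA path are hypothetical placeholders.
import requests

# "User name" only: WebHDFS simple authentication passes the user as
# the user.name query parameter.
requests.get(
    "http://namenode.example.com:50070/webhdfs/v1/",
    params={"op": "LISTSTATUS", "user.name": "replicate"},
    timeout=10,
)

# "User name and password" against a Knox Gateway: HTTP basic
# authentication over SSL.
requests.get(
    "https://knox.example.com:8443/gateway/default/webhdfs/v1/",
    params={"op": "LISTSTATUS"},
    auth=("replicate", "secret"),
    verify="/etc/ssl/hadoop-ca.pem",
    timeout=10,
)
```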
- If you need to access the Hortonworks Hadoop distribution through a Knox Gateway, select Use Knox Gateway.
Information note: To be able to select this option, first select Use SSL and then select Password from the Authentication type drop-down list.
- Provide values for the following fields (the sketch after this list shows how they combine into a gateway URL):
- Knox Gateway host - The FQDN (Fully Qualified Domain Name) of the Knox Gateway host.
- Knox port - The port number to use to access the host. The default is "8443".
- Knox Gateway path - The context path for the gateway. The default is "gateway".
Information note: The port and path values are set in the gateway-site.xml file. If you are unsure whether the default values have been changed, contact your IT department.
- Cluster name - The cluster name as configured in Knox. The default is "default".
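The four values above combine into the base URL through which all traffic is routed. A hedged Python sketch using the dialog defaults (the host is a hypothetical placeholder):

```python
# Hedged sketch: how the Knox fields combine into a gateway URL.
# The host is a hypothetical placeholder; the rest are the dialog defaults.
host = "knox.example.com"  # Knox Gateway host (FQDN)
port = 8443                # Knox port
gateway_path = "gateway"   # Knox Gateway path (context path from gateway-site.xml)
cluster = "default"        # Cluster name as configured in Knox

# WebHDFS, proxied through the gateway:
webhdfs_base = f"https://{host}:{port}/{gateway_path}/{cluster}/webhdfs/v1"
print(webhdfs_base)
# -> https://knox.example.com:8443/gateway/default/webhdfs/v1
```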
- In the HDFS section, select either WebHDFS or HttpFS as the HDFS access method. If you are accessing MapR, it is recommended to use HttpFS.
Information note: When the Use Knox Gateway option is selected, the NameNode, HttpFS Host, and Port fields described below are not relevant (and are therefore hidden).
- Do one of the following, depending on whether you selected WebHDFS or HttpFS:
If you selected WebHDFS:
- In the NameNode field, specify the IP address of the NameNode.
Information note: This is the Active node when High Availability is enabled (see below).
- Replicate supports replication from an HDFS High Availability cluster. In such a configuration, Replicate communicates with the Active node, but switches to the Standby node in the event of failover. To enable this feature, select the High Availability check box. Then, specify the FQDN (Fully Qualified Domain Name) of the Standby NameNode in the Standby NameNode field.
- In the Port field, optionally change the default port (50070).
If you selected HttpFS:
- In the HttpFS Host field, specify the IP address of the HttpFS host.
- In the Port field, optionally change the default port (14000).
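Both access methods expose the same REST API under the webhdfs/v1 path; they differ only in which daemon answers (the NameNode itself for WebHDFS, a standalone gateway for HttpFS) and in the default port. A hedged Python connectivity sketch (hosts and user are hypothetical placeholders):

```python
# Hedged sketch: WebHDFS and HttpFS answer the same REST calls on
# different default ports. Hosts and user are hypothetical placeholders.
import requests

def list_root(host: str, port: int, user: str = "replicate") -> list[str]:
    """List the names of the entries directly under the HDFS root."""
    resp = requests.get(
        f"http://{host}:{port}/webhdfs/v1/",
        params={"op": "LISTSTATUS", "user.name": user},
        timeout=10,
    )
    resp.raise_for_status()
    statuses = resp.json()["FileStatuses"]["FileStatus"]
    return [entry["pathSuffix"] for entry in statuses]

print(list_root("namenode.example.com", 50070))  # WebHDFS (NameNode)
print(list_root("httpfs.example.com", 14000))    # HttpFS gateway
```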
- In the Hive Access section, do the following:
Information note: When the Use Knox Gateway option is selected, the Host and Port fields described below are not relevant (and are therefore hidden).
- Access Hive using field: This is set to WebHCat and cannot be changed.
- Host field: Specify the IP address of the Hive machine.
- Port field: Optionally change the default port.
- Database field: Specify the name of the Hive target database.
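For illustration only: WebHCat exposes its REST API under the /templeton/v1 path (the usual default port is 50111). The following hedged Python sketch (host, port, user, and database are hypothetical placeholders) probes the service and lists the tables in the target database:

```python
# Hedged sketch: probing the WebHCat (Templeton) REST API used for Hive
# access. Host, port, user, and database are hypothetical placeholders.
import requests

HOST = "hive.example.com"
PORT = 50111               # common WebHCat default
DATABASE = "analytics"     # the Hive target database from this dialog

base = f"http://{HOST}:{PORT}/templeton/v1"

# Service health check
print(requests.get(f"{base}/status", timeout=10).json())

# List the tables in the target database
resp = requests.get(
    f"{base}/ddl/database/{DATABASE}/table",
    params={"user.name": "replicate"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```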