Setting general connection properties
This section describes how to configure general connection properties. For an explanation of how to configure advanced connection properties, see Setting advanced connection properties.
To add a Hortonworks Data Platform (HDP) target endpoint to Qlik Replicate:
- In the Qlik Replicate console, click Manage Endpoint Connections to open the Manage Endpoint Connections dialog box.
For more information on adding an endpoint to Qlik Replicate, see Defining and managing endpoints.
- In the Name field, type a name for your endpoint. This can be any name that will help to identify the endpoint being used.
- In the Description field, type a description that helps to identify the HDP endpoint. This is optional.
- Select Hortonworks Data Platform (HDP) as the endpoint Type.
- In the Security section, do the following:
Information note: These settings are relevant for HDFS storage and Hive only.
- To encrypt the data between the Replicate machine and HDFS, select Use SSL. In order to use SSL, first make sure that the SSL prerequisites described in Prerequisites have been met.
In the CA path field, specify one of the following:
- The full path of a CA certificate file (in PEM format).
- The directory containing the certificate files with hash names.
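If you are unsure whether your CA file is valid PEM, or how the "hash names" in the directory option are derived, the following sketch may help. It generates a throwaway self-signed certificate purely for illustration (in practice you would use the CA certificate that signed your cluster's SSL certificates); all file names here are examples.

```shell
# Create a throwaway self-signed CA certificate for illustration only
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/CN=example-ca" -keyout ca.key -out ca.pem

# Confirm the file is a valid PEM certificate
openssl x509 -in ca.pem -noout -subject

# For the "directory containing certificate files with hash names"
# option, each file is expected to be named <subject-hash>.0, where
# the hash is computed from the certificate's subject:
hash=$(openssl x509 -hash -noout -in ca.pem)
mkdir -p ca-dir && cp ca.pem "ca-dir/${hash}.0"
ls ca-dir
```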
- Select one of the following authentication types:
- User name - Select to connect to the HDP cluster with only a user name. Then, in the User name field, specify the name of a user authorized to access the cluster.
- User name and password - Select to connect to the HDP NameNode or to the Knox Gateway (when enabled) with a user name and password. Then, in the User name and Password fields, specify the required user name and password.
- Kerberos - Select to authenticate against the HDP cluster using Kerberos. Replicate automatically detects whether Qlik Replicate Server is running on Linux or on Windows and displays the appropriate settings.
Information note: To use Kerberos authentication on Linux, the Kerberos client (workstation) package must be installed.
Qlik Replicate Server on Linux:
When Qlik Replicate Server is running on Linux, select either Ticket or Keytab from the Kerberos options drop-down list.
If you selected Ticket, select one of the following options:
- Use global Kerberos ticket file - Select this option if you want to use the same ticket for several HDP endpoints. In this case, you must make sure to select this option for each HDP endpoint instance that you define.
- Use specific Kerberos ticket file - Select this option if you want to use a different ticket file for each HDP endpoint. Then specify the ticket file name in the designated field.
This option is especially useful if you need to perform a task-level audit of Replicate activity (using a third-party tool) on the HDP NameNode. To set this up, define several instances of the same HDP endpoint and specify a unique Kerberos ticket file for each instance. Then, for each task, simply select a different HDP endpoint instance.
Information note:
- You must define a global Kerberos ticket file even if you select the Use specific Kerberos ticket file option. The global Kerberos ticket file is used for authentication when selecting a Hive endpoint, when testing the connection (using the Test Connection button), and when selecting which tables to replicate.
- When replicating from a Hadoop source endpoint to an HDP target endpoint, both endpoints must be configured to use the same ticket file.
For additional steps required to complete setup for Kerberos ticket-based authentication, see Using Kerberos authentication.
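As an illustrative command sketch only (it assumes the MIT Kerberos client tools are installed and uses a hypothetical principal and cache path, so it cannot run without a reachable KDC), ticket files are typically created and inspected like this:

```shell
# Obtain a ticket in the default credential cache (the global
# Kerberos ticket file); the principal name is hypothetical:
kinit replicate_user@EXAMPLE.COM

# Verify the ticket and see which cache file it was written to:
klist

# To use a specific ticket file for one endpoint instance instead,
# point KRB5CCNAME at a dedicated cache before running kinit:
export KRB5CCNAME=/etc/replicate/hdp_endpoint1.cc
kinit replicate_user@EXAMPLE.COM
```

The cache path shown for the specific ticket file is an example; use whatever file name you enter in the endpoint's designated field.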
If you selected Keytab, provide the following information:
- Realm: The name of the realm in which your HDP cluster resides.
For example, if the full principal name is john.doe@EXAMPLE.COM, then EXAMPLE.COM is the realm.
- Principal: The user name to use for authentication. The principal must be a member of the realm entered above.
For example, if the full principal name is john.doe@EXAMPLE.COM, then john.doe is the principal.
- Keytab file: The full path of the keytab file. The keytab file should contain the key of the principal specified above.
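Before entering the keytab details, it can be worth confirming that the keytab file actually contains the principal you plan to configure. This sketch assumes the MIT Kerberos client tools and uses hypothetical paths and names, so it requires a real keytab and KDC to run:

```shell
# List the principals stored in a keytab to confirm it contains the
# principal configured in the endpoint (path and name are examples):
klist -kt /etc/security/keytabs/replicate.keytab

# Sanity-check that the keytab works against the KDC by
# authenticating with it non-interactively:
kinit -kt /etc/security/keytabs/replicate.keytab john.doe@EXAMPLE.COM
```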
Qlik Replicate Server on Windows:
When Qlik Replicate Server is running on Windows, select one of the following:
- Use the following KDC: Select Active Directory (default) if your KDC is Microsoft Active Directory, or select MIT if your KDC is MIT KDC running on Linux/UNIX.
Information note: When the Replicate KDC and the HDP KDC are in different domains, a relationship of trust must exist between the two domains.
- Realm: The name of the realm/domain in which your HDP cluster resides (where realm is the MIT term while domain is the Active Directory term).
- Principal: The user name to use for authentication. The principal must be a member of the realm/domain entered above.
- When Active Directory is selected - Password: The password for the principal entered above.
- When MIT is selected - Keytab file: The keytab file containing the principal entered above.
Information note: When replicating from a Hadoop source endpoint to an HDP target endpoint, both endpoints must be configured to use the same parameters (KDC, realm, principal, and password).
If you are unsure about any of the above, consult your IT/security administrator.
For additional steps required to complete setup for Kerberos authentication, see Using Kerberos authentication.
Information note: This information is case sensitive.
Information note: Make sure that the specified user has the required HDP access privileges. For information on how to provide the required privileges, see Security requirements.
- If you need to access the HDP distribution through a Knox Gateway, select Use Knox Gateway. Then provide values for the following fields:
Information note: To be able to select this option, first select Use SSL and then select Password from the Authentication type drop-down list.
- Knox Gateway host - The FQDN (Fully Qualified Domain Name) of the Knox Gateway host.
- Knox port - The port number to use to access the host. The default is "8443".
- Knox Gateway path - The context path for the gateway. The default is "gateway".
Information note: The port and path values are set in the gateway-site.xml file. If you are unsure whether the default values have been changed, contact your IT department.
- Cluster name - The cluster name as configured in Knox. The default is "default".
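To see how the four Knox fields fit together, note that requests routed through the gateway use a URL built from the host, port, context path, and cluster name. The following sketch uses hypothetical values; the commented curl line would only work with network access to a real gateway and valid credentials:

```shell
# Hypothetical values for the four Knox fields in the endpoint settings:
KNOX_HOST="knox.example.com"   # Knox Gateway host (FQDN)
KNOX_PORT="8443"               # Knox port (default)
KNOX_PATH="gateway"            # Knox Gateway path (context path)
CLUSTER="default"              # Cluster name as configured in Knox

# Requests to the cluster are routed through URLs of this shape:
BASE_URL="https://${KNOX_HOST}:${KNOX_PORT}/${KNOX_PATH}/${CLUSTER}"
echo "${BASE_URL}/webhdfs/v1/tmp?op=LISTSTATUS"

# Example connectivity check (requires access to the gateway):
# curl -k -u myuser:mypassword "${BASE_URL}/webhdfs/v1/tmp?op=LISTSTATUS"
```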
- In the Storage section, select the HDFS or Amazon S3 storage type, and then configure the settings for the selected storage type as described below.
HDFS storage

- HDFS access method: Choose one of the following:
  - WebHDFS
  - HttpFS
Information note: When the Use Knox Gateway option is selected, the NameNode, HttpFS Host, and Port fields described below are not relevant (and are therefore hidden).

When WebHDFS is the selected access method:
- NameNode: Specify the IP address of the NameNode.
Information note: This is the Active node when High Availability is enabled (see below).
- High Availability: Replicate supports replication to an HDFS High Availability cluster. In such a configuration, Replicate communicates with the Active node, but switches to the Standby node in the event of failover. To enable this feature, select the High Availability check box. Then, specify the FQDN (Fully Qualified Domain Name) of the Standby NameNode in the Standby NameNode field.
- Port: Optionally, change the default port (50070).
- Target Folder: Specify where to create the data files on HDFS.

When HttpFS is the selected access method:
- HttpFS Host: Specify the IP address of the HttpFS host.
- Port: Optionally, change the default port (14000).
- Target Folder: Specify where to create the data files on HDFS.
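Both WebHDFS and HttpFS expose the same REST interface, so the host and default port are the main differences from the client's point of view. The following sketch shows the URL shape with hypothetical hosts; the commented curl line needs actual network access to the cluster:

```shell
# Hypothetical NameNode host, default WebHDFS port, and target folder:
NAMENODE="namenode.example.com"
PORT="50070"
TARGET="/user/replicate/data"

# WebHDFS REST calls follow this URL shape:
echo "http://${NAMENODE}:${PORT}/webhdfs/v1${TARGET}?op=LISTSTATUS"

# With HttpFS, only the host and default port (14000) change:
echo "http://httpfs.example.com:14000/webhdfs/v1${TARGET}?op=LISTSTATUS"

# Example connectivity check (requires access to the cluster):
# curl "http://${NAMENODE}:${PORT}/webhdfs/v1${TARGET}?op=LISTSTATUS&user.name=replicate_user"
```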
Amazon S3 storage

- Bucket name: The name of your Amazon S3 bucket.
- Bucket region: The region where your bucket is located. It is recommended to leave the default (Auto-Detect) as it usually eliminates the need to select a specific region. However, due to security considerations, for some regions (for example, AWS GovCloud) you might need to explicitly select the region. If the region you require does not appear in the regions list, select Other and set the code using the regionCode internal parameter in the endpoint’s Advanced tab.
For a list of region codes, see the Region availability section in:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html
- Access options: Choose one of the following:
  - Key pair: Choose this method to authenticate with your Access Key and Secret Key.
  - IAM Roles for EC2: Choose this method if the machine on which Qlik Replicate is installed is configured to authenticate itself using an IAM role. For more information about this access option, see:
http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html
  - Security Token Service (STS): Choose this method to authenticate using SAML 2.0 with Active Directory Federation Services. For more information about this access option, see:
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_providers_saml.html

When Key pair is the access option:
- Access key: Enter the access key information for Amazon S3.
- Secret key: Enter the secret key information for Amazon S3.

When Security Token Service (STS) is the access option:
- ADFS URL: The URL of an Active Directory Federation Services page responsible for returning a SAML claims document to be sent over to AWS.
- AD principal name: The principal (user) name to use when identifying against ADFS. The format should be: user.name@domain
- AD principal password: The principal password to use when identifying against ADFS.
- IdP ARN: The Amazon Resource Name (ARN) of the Active Directory issuing the SAML claims document. This is required as it enables AWS to identify the signer of the SAML document and verify its signature.
- SAML Role ARN: The Amazon Resource Name (ARN) of the specific role the returned credentials should be assigned.
- Switch role after assuming SAML role: Use this option to switch roles after authentication. For more information, see:
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_aws-accounts.html
When this option is selected, the following information is required:
  - Role ARN: The ARN associated with the target role.
  - Role external ID: The value of the external ID condition in the target role’s trust policy.

For all access options:
- Target folder: Enter the target folder in your Amazon S3 bucket.
- In the Hive Access section, do the following:
- From the Access Hive using drop-down list, select one of the following options:
Information note: When the Use Knox Gateway option is selected, the Host and Port fields described below are not relevant (and are therefore hidden).
- ODBC - Select this option to access Hive using an ODBC driver (the default). Then continue with the Host field.
Information note: If you select this option, make sure that the latest 64-bit ODBC driver for your Hadoop distribution is installed on the Qlik Replicate Server machine.
- HQL scripts - When this option is selected, Replicate will generate HQL table creation scripts in the specified Script folder.
Information note: When this option is selected, the target storage format must be set to "Text".
- No Access - When this option is selected, after the data files are created on HDFS, Replicate will take no further action.
- In the Host field, specify the IP address of the Hive machine.
- In the Port field, optionally change the default port.
- In the Database field, specify the name of the Hive target database.
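Outside of Replicate, a quick way to confirm the Hive ODBC prerequisites is to define a DSN and test it with unixODBC. The sketch below is a hypothetical example: the DSN name, host, and driver path are placeholders, and the exact driver file name and key names vary by driver version, so check your driver's documentation:

```shell
# Append a hypothetical DSN for the Hortonworks Hive ODBC driver to a
# local odbc.ini (normally /etc/odbc.ini); driver path is an example:
cat >> odbc.ini <<'EOF'
[HDPHive]
Driver=/usr/lib/hive/lib/native/Linux-amd64-64/libhortonworkshiveodbc64.so
Host=hive.example.com
Port=10000
Schema=default
EOF

# Test the DSN with unixODBC's isql (requires the driver to be installed):
# isql -v HDPHive myuser mypassword
```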