
Setting general connection properties

This section describes how to configure general connection properties. For an explanation of how to configure advanced connection properties, see Setting advanced connection properties.

To add a Hortonworks Data Platform (HDP) target endpoint to Qlik Replicate:

  1. In the Qlik Replicate console, click Manage Endpoint Connections to open the Manage Endpoint Connections dialog box.

    For more information on adding an endpoint to Qlik Replicate, see Defining and managing endpoints.

  2. In the Name field, type a name for your endpoint. This can be any name that will help to identify the endpoint being used.
  3. In the Description field, type a description that helps to identify the HDP endpoint. This is optional.
  4. Select Hortonworks Data Platform (HDP) as the endpoint Type.
  5. In the Security section, do the following:

    Information note

    These settings are relevant for HDFS storage and Hive only.

    1. To encrypt the data between the Replicate machine and HDFS, select Use SSL. In order to use SSL, first make sure that the SSL prerequisites described in Prerequisites have been met. (A quick way to verify the CA path and Kerberos settings outside of Replicate is sketched at the end of this step.)

      In the CA path field, specify one of the following:

      • The full path of a CA certificate file (in PEM format).
      • The directory containing the certificate files with hash names.
    2. Select one of the following authentication types:

      • User name - Select to connect to the HDP cluster with only a user name. Then, in the User name field, specify the name of a user authorized to access the cluster.

      • User name and password - Select to connect to the HDP NameNode or to the Knox Gateway (when enabled) with a user name and password. Then, in the User name and Password fields, specify the required user name and password.
      • Kerberos - Select to authenticate against the HDP cluster using Kerberos. Replicate automatically detects whether Qlik Replicate Server is running on Linux or on Windows and displays the appropriate settings.

        Information note

        In order to use Kerberos authentication on Linux, the Kerberos client (workstation) package must be installed.

        Qlik Replicate Server on Linux:

        When Qlik Replicate Server is running on Linux, select either Ticket or Keytab from the Kerberos options drop-down list.

        If you selected Ticket, select one of the following options:

        • Use global Kerberos ticket file - Select this option if you want to use the same ticket for several HDP endpoints. In this case, you must make sure to select this option for each HDP endpoint instance that you define.

        • Use specific Kerberos ticket file - Select this option if you want to use a different ticket file for each HDP endpoint. Then specify the ticket file name in the designated field.

          This option is especially useful if you need to perform a task-level audit of Replicate activity (using a third-party tool) on the HDP NameNode. To set this up, define several instances of the same HDP endpoint and specify a unique Kerberos ticket file for each instance. Then, for each task, simply select a different HDP endpoint instance.

        Information note

        • You need to define a global Kerberos ticket file even if you select the Use specific Kerberos ticket file option. The global Kerberos ticket file is used for authentication when selecting a Hive endpoint, when testing the connection (using the Test Connection button), and when selecting which tables to replicate.

        • When replicating from a Hadoop source endpoint to an HDP target endpoint, both endpoints must be configured to use the same ticket file.

        For additional steps required to complete setup for Kerberos ticket-based authentication, see Using Kerberos authentication.

        If you selected Keytab, provide the following information:

        • Realm: The name of the realm in which your HDP cluster resides.

          For example, if the full principal name is john.doe@EXAMPLE.COM, then EXAMPLE.COM is the realm.

        • Principal: The user name to use for authentication. The principal must be a member of the realm entered above.

          For example, if the full principal name is john.doe@EXAMPLE.COM, then john.doe is the principal.

        • Keytab file: The full path of the Keytab file. The Keytab file should contain the key of the Principal specified above.

        Qlik Replicate Server on Windows:

        When Qlik Replicate Server is running on Windows, select one of the following:

        • Use the following KDC: Select Active Directory (default) if your KDC is Microsoft Active Directory, or select MIT if your KDC is an MIT KDC running on Linux/UNIX.

          Information note

          When the Replicate KDC and the HDP KDC are in different domains, a relationship of trust must exist between the two domains.

        • Realm: The name of the realm/domain in which your HDP cluster resides (where realm is the MIT term while domain is the Active Directory term).
        • Principal: The user name to use for authentication. The principal must be a member of the realm/domain entered above.
        • When Active Directory is selected - Password: The password for the principal entered above.
        • When MIT is selected - Keytab file: The keytab file containing the principal entered above.
        Information note

        When replicating from a Hadoop source endpoint to an HDP target endpoint, both endpoints must be configured to use the same parameters (KDC, realm, principal, and password).

        If you are unsure about any of the above, consult your IT/security administrator.

        For additional steps required to complete setup for Kerberos authentication, see Using Kerberos authentication.

        Information note

        User names and passwords are case sensitive.

        Information note

        Make sure that the specified user has the required HDP access privileges. For information on how to provide the required privileges, see Security requirements.
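
Before testing the connection, it can help to verify the security inputs outside of Replicate. The following Python sketch is illustrative only and is not part of Replicate: it checks that a CA path (either a PEM file or a directory of hash-named certificates) loads, and that a keytab actually yields a Kerberos ticket. All paths, the user, and the principal are hypothetical placeholders.

```python
import ssl
import subprocess

# Hypothetical values - substitute your own.
CA_PATH = "/etc/pki/tls/certs/ca-bundle.pem"      # PEM file, or a hash-named directory
KEYTAB = "/etc/security/keytabs/replicate.keytab"
PRINCIPAL = "replicate@EXAMPLE.COM"

# 1. Check that the CA path is loadable. Python's ssl module accepts the same
#    two forms as the endpoint's CA path field: a single PEM file (cafile) or
#    a directory containing certificate files with hash names (capath).
ctx = ssl.create_default_context()
try:
    ctx.load_verify_locations(cafile=CA_PATH)     # use capath=CA_PATH for a directory
    print("CA path loads cleanly")
except ssl.SSLError as exc:
    print(f"CA path problem: {exc}")

# 2. Check that the keytab yields a ticket for the principal. 'kinit -kt' is the
#    standard MIT Kerberos client call, which is why the Kerberos client
#    (workstation) package must be installed on Linux.
result = subprocess.run(
    ["kinit", "-kt", KEYTAB, PRINCIPAL],
    capture_output=True, text=True,
)
print("kinit succeeded" if result.returncode == 0 else f"kinit failed: {result.stderr}")
```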

  6. If you need to access the HDP distribution through a Knox Gateway, select Use Knox Gateway. Then provide values for the following fields:

    Information note

    To be able to select this option, first select Use SSL and then select the User name and password authentication type.

    • Knox Gateway host - The FQDN (Fully Qualified Domain Name) of the Knox Gateway host.
    • Knox port - The port number to use to access the host. The default is "8443".
    • Knox Gateway path - The context path for the gateway. The default is "gateway".

      Information note

      The port and path values are set in the gateway-site.xml file. If you are unsure whether the default values have been changed, contact your IT department.

    • Cluster name - The cluster name as configured in Knox. The default is "default".
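
If you want to confirm the Knox values before saving the endpoint, a minimal probe of the gateway can rule out host, port, path, and SSL problems. This Python sketch is not part of Replicate; the host, credentials, and CA path are hypothetical, and it assumes the standard Knox URL layout (https://<host>:<port>/<gateway path>/<cluster>/webhdfs/v1/...).

```python
import requests

# Hypothetical values taken from the fields described above.
KNOX_HOST = "knox.example.com"   # Knox Gateway host (FQDN)
KNOX_PORT = 8443                 # Knox port
GATEWAY_PATH = "gateway"         # Knox Gateway path
CLUSTER = "default"              # Cluster name as configured in Knox
CA_PATH = "/etc/pki/tls/certs/ca-bundle.pem"

# Knox proxies the WebHDFS REST API under the gateway path and cluster name.
url = f"https://{KNOX_HOST}:{KNOX_PORT}/{GATEWAY_PATH}/{CLUSTER}/webhdfs/v1/?op=LISTSTATUS"
resp = requests.get(url, auth=("replicate_user", "secret"), verify=CA_PATH, timeout=30)
resp.raise_for_status()
print(resp.json())  # a FileStatuses document if the gateway, SSL, and credentials all line up
```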
  7. In the Storage section, select the HDFS or Amazon S3 storage type.

    Configure the HDFS or Amazon S3 storage type settings depending on your selection as described in the following table.

    HDFS storage

    HDFS access method

    Choose one of the following:

    • WebHDFS
    • HttpFS
    Information note

    When the Use Knox Gateway option is selected, the NameNode, HttpFS Host, and Port fields described below are not relevant (and are therefore hidden).

    When WebHDFS is the selected access method:

    NameNode

    Specify the IP address of the NameNode.

    Information note

    This is the Active node when High Availability is enabled (see below).

    High Availability

    Replicate supports replication to an HDFS High Availability cluster. In such a configuration, Replicate communicates with the Active node, but switches to the Standby node in the event of failover. To enable this feature, select the High Availability check box. Then, specify the FQDN (Fully Qualified Domain Name) of the Standby NameNode in the Standby NameNode field.

    Port

    Optionally, change the default port (50070).

    Target Folder

    Specify where to create the data files on HDFS.

    When HttpFS is the selected access method:

    HttpFS Host

    Specify the IP address of the HttpFS host.

    Port

    Optionally, change the default port (14000).

    Target Folder

    Specify where to create the data files on HDFS.
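
Both WebHDFS and HttpFS expose the same REST API, so the settings above can be sanity-checked with a single HTTP request before running a task; only the host and port differ between the two access methods. The sketch below is illustrative and not part of Replicate; the address, folder, and user are hypothetical placeholders.

```python
import requests

# Hypothetical values matching the fields above.
NAMENODE = "10.0.0.10"                 # NameNode IP for WebHDFS; use the HttpFS host for HttpFS
PORT = 50070                           # WebHDFS default; the HttpFS default is 14000
TARGET_FOLDER = "/user/replicate/data"
HDFS_USER = "replicate_user"

# GETFILESTATUS confirms that the target folder exists and is reachable.
url = f"http://{NAMENODE}:{PORT}/webhdfs/v1{TARGET_FOLDER}?op=GETFILESTATUS&user.name={HDFS_USER}"
resp = requests.get(url, timeout=30)
if resp.status_code == 200:
    print("Target folder exists and is reachable")
elif resp.status_code == 404:
    print("Folder not found - check the Target Folder value")
else:
    print(f"Unexpected response: {resp.status_code} {resp.text}")
```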

    Amazon S3 storage


    Bucket name

    The name of your Amazon S3 bucket.

    Bucket region

    The region where your bucket is located. It is recommended to leave the default (Auto-Detect) as it usually eliminates the need to select a specific region. However, due to security considerations, for some regions (for example, AWS GovCloud) you might need to explicitly select the region. If the region you require does not appear in the regions list, select Other and set the code using the regionCode internal parameter in the endpoint’s Advanced tab.

    For a list of region codes, see the Region availability section in:

    https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html

    Access options

    Choose one of the following:

    • Key pair
    • Security Token Service (STS)

    When Key pair is the access option:

    Access key

    Enter the access key information for Amazon S3.

    Secret key

    Enter the secret key information for Amazon S3.
    When Security Token Service (STS) is the access option:

    ADFS URL

    The URL of an Active Directory Federation Services page, responsible for returning a SAML claims document to be sent over to AWS.

    AD principal name

    The principal (user) name to use when identifying against ADFS.

    The format should be: user.name@domain

    AD principal password

    The principal password to use when identifying against ADFS.

    IdP ARN

    The Amazon Resource Name (ARN) of the Active Directory issuing the SAML claims document. This is required as it enables AWS to identify the signer of the SAML document and verify its signature.

    SAML Role ARN

    The Amazon Resource Name (ARN) of the specific role to which the returned credentials should be assigned.

    Switch role after assuming SAML role

    Use this option to switch role after authentication.

    For more information, see:

    https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_aws-accounts.html

    When this option is selected, the following information is required:

    Role ARN: The ARN associated with the target role.

    Role external ID: The value of the external ID condition in the target role’s trust policy.

    For all access options:

    Target folder

    Enter the target folder in your Amazon S3 bucket.
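
The access options above map directly onto standard AWS SDK calls, which can be useful for verifying credentials independently of Replicate. The following boto3 sketch is illustrative only: the keys, ARNs, bucket name, and external ID are hypothetical, and obtaining the SAML assertion from the ADFS URL is site-specific and assumed to have happened already.

```python
import boto3

# --- Key pair access: confirm the bucket is reachable with the given keys ---
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIA...",                   # Access key (placeholder)
    aws_secret_access_key="...",                   # Secret key (placeholder)
)
s3.head_bucket(Bucket="my-replicate-bucket")       # raises if the bucket is unreachable

# --- STS access: exchange an ADFS SAML assertion for temporary credentials ---
saml_assertion = "<base64-encoded SAMLResponse obtained from the ADFS URL>"
sts = boto3.client("sts")
creds = sts.assume_role_with_saml(
    RoleArn="arn:aws:iam::123456789012:role/ReplicateRole",        # SAML Role ARN
    PrincipalArn="arn:aws:iam::123456789012:saml-provider/ADFS",   # IdP ARN
    SAMLAssertion=saml_assertion,
)["Credentials"]

# --- Optional: switch role after assuming the SAML role ---
sts2 = boto3.client(
    "sts",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
switched = sts2.assume_role(
    RoleArn="arn:aws:iam::210987654321:role/TargetRole",  # Role ARN
    RoleSessionName="replicate",
    ExternalId="my-external-id",                          # Role external ID
)
```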
  8. In the Hive Access section, do the following:

    1. From the Access Hive using drop-down list, select one of the following options:

      Information note

      When the Use Knox Gateway option is selected, the Host and Port fields described below are not relevant (and are therefore hidden).

      • ODBC - Select this option to access Hive using an ODBC driver (the default). Then continue with the Host field.

        Information note

        If you select this option, make sure that the latest 64-bit ODBC driver for your Hadoop distribution is installed on the Qlik Replicate Server machine.

      • HQL scripts - When this option is selected, Replicate will generate HQL table creation scripts in the specified Script folder.

        Information note

        When this option is selected, the target storage format must be set to "Text".

      • No Access - When this option is selected, after the data files are created on HDFS, Replicate will take no further action.
    2. In the Host field, specify the IP address of the Hive machine.
    3. In the Port field, optionally change the default port.
    4. In the Database field, specify the name of the Hive target database.
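
When ODBC access is selected, the same Host, Port, and Database values can be verified with any ODBC client. The pyodbc sketch below is illustrative and not part of Replicate; it assumes a 64-bit Hive ODBC driver is installed and that a DSN (here hypothetically named "HDP-Hive") has been configured to point at the Hive machine.

```python
import pyodbc

# Hypothetical DSN and credentials - substitute your own.
conn = pyodbc.connect("DSN=HDP-Hive;UID=replicate_user;PWD=secret", autocommit=True)
cursor = conn.cursor()
cursor.execute("SHOW TABLES")   # HiveQL: lists the tables in the connected database
for row in cursor.fetchall():
    print(row)
conn.close()
```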
