
Exporting a Kerberos-secured Hive dataset to HDFS

To enable exports of Hive datasets to HDFS in a Kerberos-secured Cloudera environment, you must edit the Spark Job Server configuration files.

Important: Make sure that the keytab file used to authenticate to HDFS is accessible to all the workers on the cluster.
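
To quickly confirm that a worker node can read the keytab, you can list its entries on that node. This is only a sketch: it assumes the MIT Kerberos client tools are installed, and the path shown is the placeholder used in step 1 below:

    klist -kt /path/to/the/keytab/keytab_file.keytab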

Procedure

  1. Create a <sjs_path>/jobserver_gss.conf file, and add the following configuration parameters:
    com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=false
    doNotPrompt=true
    useKeyTab=true
    keyTab="/path/to/the/keytab/keytab_file.keytab"
    principal="your@principalHere"
    debug=true;
    };
  2. In the <sjs_path>/manager_start.sh file, set the following parameters to reference the <sjs_path>/jobserver_gss.conf file created in the previous step:
    KRB5_OPTS="-Djava.security.auth.login.config=jobserver_gss.conf
     -Djava.security.krb5.debug=true
     -Djava.security.krb5.conf=/path/to/krb5.conf
     -Djavax.security.auth.useSubjectCredsOnly=false"
     --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS $KRB5_OPTS"
     --conf "spark.yarn.dist.files=/path/to/jobserver_gss.conf"
     --proxy-user $4
     --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES $JDBC_PROPERTIES $KRB5_OPTS"
  3. When importing your dataset in Talend Data Preparation, the JDBC URL used to connect to Hive must follow this model (a filled-in example is shown after this procedure):
    jdbc:hive2://host:10000/default;principal=<your_principal>
  4. Copy the <components_catalog_path>/config/jdbc_config.json file that contains the Hive driver to the Spark Job Server installation folder.
  5. Copy the .jar files from the <components_catalog_path>/.m2 folder to the <sjs_path>/datastreams-deps folder. Example commands for this step and the previous one are shown after this procedure.
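
For illustration, a fully resolved JDBC URL for step 3 could look like the following. The host cluster-node1.example.com, the realm EXAMPLE.COM, and the hive service principal are placeholders for this sketch, not values from your environment:

    jdbc:hive2://cluster-node1.example.com:10000/default;principal=hive/cluster-node1.example.com@EXAMPLE.COM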
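
The copy operations in steps 4 and 5 can be run as shell commands along the following lines. This is only a sketch: it assumes a Linux shell, and <components_catalog_path> and <sjs_path> stand for the actual installation paths on your system:

    # Copy the JDBC configuration file that contains the Hive driver (step 4)
    # to the Spark Job Server installation folder.
    cp <components_catalog_path>/config/jdbc_config.json <sjs_path>/
    # Copy the driver .jar files to the Spark Job Server dependencies folder (step 5).
    # find is used here because the .jar files may be nested in subfolders of .m2.
    find <components_catalog_path>/.m2 -name "*.jar" -exec cp {} <sjs_path>/datastreams-deps/ \;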

Results

You can now export your Hive datasets to HDFS.
