Defining the Spark connection in a Job script
addElementParameters {} properties
Properties relevant to selecting the Spark cluster to be used:
Function/parameter | Description | Mandatory? |
---|---|---|
SPARK_LOCAL_MODE |
Enter "true" to run your Spark Job in the local mode. By default, the value is "false", which means to use a remote cluster. In the local mode, the Studio builds the Spark environment in itself on the fly in order to run the Job in. Each processor of the local machine is used as a Spark worker to perform the computations. In this mode, your local file system is used; therefore, deactivate the configuration components such as tS3Configuration or tHDFSConfiguration that provides connection information to a remote file system, if you have placed these components in your Job. You can launch your Job without any further configuration. |
Yes |
SPARK_LOCAL_VERSION |
Enter the Spark version to be used in the local mode. This property is relevant only when you have entered "true" for SPARK_LOCAL_MODE. The Studio does not support Spark versions below 2.0 in the local mode. For example, enter the value "SPARK_2_1_0". |
Yes when Spark local mode is used. |
DISTRIBUTION |
Enter the name of the provider of your distribution. Depending on your distribution, enter one of the following values:
|
Yes when you are using neither the Spark local mode nor the Amazon EMR distribution. |
SPARK_VERSION |
Enter the version of your distribution. The following list provides example formats for each available distribution:
For more information about the distribution versions supported by Talend, see the section called Supported Big Data platform distribution versions for Talend Job in Talend Installation Guide. |
Yes when you are not using Spark local mode. |
SUPPORTED_SPARK_VERSION |
Enter the Spark version used by your distribution. For example, "SPARK_2_1_0". |
Yes when you are not using Spark local mode. |
SPARK_API_VERSION |
Enter "SPARK_200", the Spark API version used by Talend. |
Yes. |
SET_HDP_VERSION |
Enter "true" if your Hortonworks cluster is using the hdp.version variable to store its version; otherwise, enter "false". Contact the administrator of your cluster if you are not sure about this information. |
Yes when you are using Hortonworks. |
HDP_VERSION |
Enter the Hortonworks version to be used, for example, "\"2.6.0.3-8\"". Contact the administrator of your cluster if you are not sure about this information. You must also add the version number to the yarn-site.xml file of your cluster. In this example, add hdp.version=2.6.0.3-8. |
Yes when you have entered "true" for SET_HDP_VERSION. |
SPARK_MODE |
Enter the mode in which your Spark cluster has been implemented. Depending on your situation, enter one of the following values:
|
Yes when you are not using the Spark local mode. |
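As an illustration, the following sketch shows how these cluster-selection properties might be written in a Job script, assuming they are given as NAME : "value" pairs, separated by commas, inside the addElementParameters {} block; the surrounding script structure and the // comments are illustrative only, not a definitive syntax. It configures the Spark local mode with the example values from the table above:

```
addElementParameters {
  // Run the Spark Job in the local mode with the example versions from the table above
  SPARK_LOCAL_MODE : "true",
  SPARK_LOCAL_VERSION : "SPARK_2_1_0",
  SPARK_API_VERSION : "SPARK_200"
}
```

To target a remote cluster instead, set SPARK_LOCAL_MODE to "false" and provide DISTRIBUTION, SPARK_VERSION, SUPPORTED_SPARK_VERSION, SPARK_API_VERSION and SPARK_MODE as described above.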
Properties relevant to configuring the connection to Spark:
Function/parameter | Description | Mandatory? |
---|---|---|
RESOURCE_MANAGER |
Enter the address of the ResourceManager service of the Hadoop cluster to be used. |
Yes when you are using the Yarn client mode. |
SET_SCHEDULER_ADDRESS |
Enter "true" if your cluster possesses a ResourceManager scheduler; otherwise, enter "false". |
Yes when you are using the Yarn client mode. |
RESOURCEMANAGER_SCHEDULER_ADDRESS |
Enter the Scheduler address. |
Yes when you have entered "true" for SET_SCHEDULER_ADDRESS. |
SET_JOBHISTORY_ADDRESS |
Enter "true" if your cluster possesses a JobHistory service; otherwise, enter "false". |
Yes when you are using the Yarn client mode. |
JOBHISTORY_ADDRESS |
Enter the location of the JobHistory server of the Hadoop cluster to be used. This allows the metrics information of the current Job to be stored in that JobHistory server. |
Yes when you have entered "true" for SET_JOBHISTORY_ADDRESS. |
SET_STAGING_DIRECTORY |
Enter "true" if your cluster possesses a staging directory to store the temporary files created by running programs; otherwise, enter "false". |
Yes when you are using the Yarn client mode. |
STAGING_DIRECTORY |
Enter this directory, for example, "\"/user\"". Typically, this directory can be found under the yarn.app.mapreduce.am.staging-dir property in the configuration files such as yarn-site.xml or mapred-site.xml of your distribution. |
Yes when you have entered "true" for SET_STAGING_DIRECTORY. |
HDINSIGHT_ENDPOINT |
Enter the endpoint of your HDInsight cluster. For example, "\"https://mycluster.azurehdinsight.net\"". |
Yes when you are using the related distribution. |
HDINSIGHT_USERNAME and HDINSIGHT_PASSWORD |
For example, "\"talendstorage\"" as username and "my_password" as password. |
Yes when you are using the related distribution. |
LIVY_HOST |
Enter the host name of the Livy service of your HDInsight cluster. |
Yes when you are using the related distribution, HDInsight. |
LIVY_PORT |
Enter the port number of your Livy service. By default, the port number is "\"443\"". |
Yes when you are using the related distribution, HDInsight. |
LIVY_USERNAME |
Enter your HDInsight username, for example, "\"my_hdinsight_account\"". |
Yes when you are using the related distribution, HDInsight. |
HDINSIGHT_POLLING_INTERVAL_DURATION |
Enter the time interval (in milliseconds) at the end of which you want the Studio to ask Spark for the status of your Job. By default, the time interval is 30000, that is, 30 seconds. |
No. If you don't specify this parameter, the default value is used with the related distribution, HDInsight. |
HDINSIGHT_MAX_MISSING_STATUS |
Enter the maximum number of times the Studio retries to get a status when no status response is received. By default, the number of retries is 10. |
No. If you don't specify this parameter, the default value is used with the related distribution, HDInsight. |
WASB_HOST |
Enter the address of your Windows Azure Storage blob, for example, "\"https://my_storage_account_name.blob.core.windows.net\"". |
Yes when you are using the related distribution, HDInsight. |
WASB_CONTAINER |
Enter the name of the container to be used, for example, "\"talend_container\"". |
Yes when you are using the related distribution, HDInsight. |
REMOTE_FOLDER |
Enter the location in which you want to store the current Job and its dependent libraries in this Azure Storage account, for example, "\"/user/ychen/deployment_blob\"". |
Yes when you are using the related distribution, HDInsight. |
SPARK_HOST |
Enter the URI of the Spark Master of the Hadoop cluster to be used, for example, "\"spark://localhost:7077\"". |
Yes when you are using the Spark Standalone mode. |
SPARK_HOME |
Enter the location of the Spark executable installed in the Hadoop cluster to be used, for example, "\"/usr/lib/spark\"". |
Yes when you are using the Spark Standalone mode. |
DEFINE_HADOOP_HOME_DIR |
If you launch your Job from Windows, it is recommended to specify where the winutils.exe program to be used is stored. If you know the location of your winutils.exe file and want to use it, enter "true"; otherwise, enter "false". |
Yes when you are using a distribution that is not running on cloud. |
HADOOP_HOME_DIR |
Enter the directory where your winutils.exe is stored, for example, "\"C:/Talend/winutils\"". |
Yes when you have entered "true" for DEFINE_HADOOP_HOME_DIR. |
DEFINE_SPARK_DRIVER_HOST |
In the Yarn client mode of Spark, if the Spark cluster cannot automatically recognize the machine from which the Job is launched, enter "true"; otherwise, enter "false". |
Yes when you are using a distribution that is not running on cloud and the Spark mode is Yarn client. |
SPARK_DRIVER_HOST |
Enter the host name or the IP address of this machine, for example, "\"127.0.0.1\"". This allows the Spark master and its workers to recognize this machine and thus find the Job and its driver. Note that in this situation, you also need to add the name and the IP address of this machine to its hosts file. |
Yes when you have entered "true" for DEFINE_SPARK_DRIVER_HOST. |
GOOGLE_PROJECT_ID |
Enter the ID of your Google Cloud Platform project. For example, "\"my-google-project\"". |
Yes when you are using the related distribution. |
GOOGLE_CLUSTER_ID |
Enter the ID of your Dataproc cluster to be used. For example, "\"my-cluster-id\"". |
Yes when you are using the related distribution. |
GOOGLE_REGION |
Enter the geographic region in which the computing resources are used and your data is stored and processed. If you do not need to specify a particular region, enter "\"global\"". |
Yes when you are using the related distribution. |
GOOGLE_JARS_BUCKET |
A Talend Job requires its dependent jar files for execution. Specify the Google Storage directory to which these jar files are transferred so that your Job can access them at execution. The directory to be entered must end with a slash (/). If it does not exist, it is created on the fly, but the bucket to be used must already exist. For example, "\"gs://my-bucket/talend/jars/\"". |
Yes when you are using the related distribution. |
DEFINE_PATH_TO_GOOGLE_CREDENTIALS |
When you launch your Job from a given machine in which Google Cloud SDK has been installed and authorized to use your user account credentials to access Google Cloud Platform, enter "false". In this situation, this machine is often your local machine. When you launch your Job from a remote machine, such as a Jobserver, enter "true". |
Yes when you are using the related distribution. |
PATH_TO_GOOGLE_CREDENTIALS |
Enter the path to the Google credentials JSON file on the remote machine, very often the Jobserver. For example, "\"/user/ychen/my_credentials.json\"". |
Yes when you have entered "true" for DEFINE_PATH_TO_GOOGLE_CREDENTIALS. |
ALTUS_SET_CREDENTIALS |
If you want to provide the Altus credentials with your Job, enter "true". If you want to provide the Altus credentials separately, for example manually using the command altus configure in your terminal, enter "false". |
Yes when you are using the related distribution. |
ALTUS_ACCESS_KEY and ALTUS_SECRET_KEY |
Enter your Altus access key and the directory pointing to your Altus secret key file. For example, "\"my_access_key\"" and "\"/user/ychen/my_secret_key_file\"". |
Yes when you have entered "true" for ALTUS_SET_CREDENTIALS. |
ALTUS_CLI_PATH |
Enter the path to the Cloudera Altus client, which must have been installed and activated in the machine in which your Job is executed. In production environments, this machine is typically a Talend Jobserver. For example, "\"/opt/altuscli/altusclienv/bin/altus\"". |
Yes when you are using the related distribution. |
ALTUS_REUSE_CLUSTER |
Enter "true" to use a Cloudera Altus cluster already existing in your Cloud service. Otherwise, enter "false" to allow the Job to create a cluster on the fly. |
Yes when you are using the related distribution. |
ALTUS_CLUSTER_NAME |
Enter the name of the cluster to be used. For example, "\"talend-altus-cluster\"". |
Yes when you are using the related distribution. |
ALTUS_ENVIRONMENT_NAME |
Enter the name of the Cloudera Altus environment to be used to describe the resources allocated to the given cluster. For example, "\"talend-altus-cluster\"". |
Yes when you are using the related distribution. |
ALTUS_CLOUD_PROVIDER |
Enter the Cloud service that runs your Cloudera Altus cluster. Currently, only AWS is supported, so enter "\"AWS\"". |
Yes when you are using the related distribution. |
ALTUS_DELETE_AFTER_EXECUTION |
Enter "true" if you want to remove the given cluster after the execution of your Job. Otherwise, enter "false". |
Yes when you are using the related distribution. |
ALTUS_S3_ACCESS_KEY and ALTUS_S3_SECRET_KEY |
Enter the authentication information required to connect to the Amazon S3 bucket to be used. |
Yes when you have entered "\"AWS\"" for ALTUS_CLOUD_PROVIDER. |
ALTUS_S3_REGION |
Enter the AWS region to be used. For example "\"us-east-1\"". |
Yes when you have entered "\"AWS\"" for ALTUS_CLOUD_PROVIDER. |
ALTUS_BUCKET_NAME |
Enter the name of the bucket to be used to store the dependencies of your Job. This bucket must already exist. For example "\"my-bucket\"". |
Yes when you have entered "\"AWS\"" for ALTUS_CLOUD_PROVIDER. |
ALTUS_JARS_BUCKET |
Enter the directory in which you want to store the dependencies of your Job in this given bucket, for example, "\"altus/jobjar\"". This directory is created if it does not exist at runtime. |
Yes when you have entered "\"AWS\"" for ALTUS_CLOUD_PROVIDER. |
ALTUS_USE_CUSTOM_JSON |
Enter "true if you need to manually edit JSON code to configure your Altus cluster. Otherwise, enter "false". |
Yes when you are using the related distribution. |
ALTUS_CUSTOM_JSON |
Enter your custom JSON code, for example, "{my_json_code}". |
Yes when you have entered "true for ALTUS_USE_CUSTOM_JSON. |
ALTUS_INSTANCE_TYPE |
Enter the instance type for the instances in the cluster. All nodes that are deployed in this cluster use the same instance type. For example, "\"c4.2xlarge\"". |
Yes when you are using the related distribution. |
ALTUS_WORKER_NODE |
Enter the number of worker nodes to be created for the cluster. For example, "\"10\"". |
Yes when you are using the related distribution. |
ALTUS_CLOUDERA_MANAGER_USERNAME |
Enter the username used to connect to your Cloudera Manager service. For example, "\"altus\"". |
Yes when you are using the related distribution. |
SPARK_SCRATCH_DIR |
Enter the directory in which the temporary files, such as the Job dependencies to be transferred, are stored on the local system, for example, "\"/tmp\"". |
Yes. |
STREAMING_BATCH_SIZE |
Enter the time interval (ms) at the end of which the Job reviews the source data to identify changes and processes the new micro batches, for example, "1000". |
Yes when you are developing a Spark Streaming Job. |
DEFINE_DURATION |
If you need to define a streaming timeout (ms), enter "true". Otherwise, enter "false". |
Yes when you are developing a Spark Streaming Job. |
STREAMING_DURATION |
Enter the time frame (ms) at the end of which the streaming Job automatically stops running, for example, "10000". |
Yes when you have entered "true for DEFINE_DURATION. |
SPARK_ADVANCED_PROPERTIES |
Enter the code to use other Hadoop or Spark related properties. For example:
|
No. |
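As an example of connection properties, the sketch below uses the same assumed NAME : "value" pair syntax inside addElementParameters {} to point a Job to an HDInsight cluster, reusing the example values from the table above; the LIVY_HOST value is a hypothetical placeholder:

```
addElementParameters {
  // HDInsight connection, using the example values given in the table above
  HDINSIGHT_ENDPOINT : "\"https://mycluster.azurehdinsight.net\"",
  HDINSIGHT_USERNAME : "\"talendstorage\"",
  HDINSIGHT_PASSWORD : "my_password",
  LIVY_HOST : "\"https://mycluster.azurehdinsight.net\"",   // hypothetical value
  LIVY_PORT : "\"443\"",
  LIVY_USERNAME : "\"my_hdinsight_account\"",
  WASB_HOST : "\"https://my_storage_account_name.blob.core.windows.net\"",
  WASB_CONTAINER : "\"talend_container\"",
  REMOTE_FOLDER : "\"/user/ychen/deployment_blob\""
}
```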
Properties relevant to defining the security configuration:
Function/parameter | Description | Mandatory? |
---|---|---|
USE_KRB |
Enter "true" if the cluster to be used is secured with Kerberos. Otherwise, enter "false". |
Yes |
RESOURCEMANAGER_PRINCIPAL |
Enter the Kerberos principal names for the ResourceManager service, for example, "\"yarn/_HOST@EXAMPLE.COM\"". |
Yes when you are using Kerberos and the Yarn client mode. |
JOBHISTORY_PRINCIPAL |
Enter the Kerberos principal names for the JobHistory service, for example, "\"mapred/_HOST@EXAMPLE.COM\"". |
Yes when you are using Kerberos and the Yarn client mode. |
USE_KEYTAB |
If you need to use a Kerberos keytab file to log in, enter "true". Otherwise, enter "false". |
Yes when you are using Kerberos. |
PRINCIPAL |
Enter the principal to be used, for example, "\"hdfs\"". |
Yes when you are using a Kerberos keytab file. |
KEYTAB_PATH |
Enter the access path to the keytab file itself. This keytab file must be stored in the machine in which your Job actually runs, for example, on a Talend Jobserver. For example, "\"/tmp/hdfs.headless.keytab\"". |
Yes when you are using a Kerberos keytab file. |
USERNAME |
Enter the login user name for your distribution. If you leave it empty, that is to say "\"\"", the user name of the machine in which your Job actually runs will be used. |
Yes when you are not using Kerberos. |
USE_MAPRTICKET |
If the MapR cluster to be used is secured with the MapR ticket authentication mechanism, enter "true". Otherwise, enter "false". |
Yes when you are using a MapR cluster. |
MAPRTICKET_PASSWORD |
Enter the password to be used to log into MapR, for example, "my_password". |
Yes when you are not using Kerberos but are using the MapR ticket authentication mechanism. |
MAPRTICKET_CLUSTER |
Enter the name of the MapR cluster you want to connect to, for example, "\"demo.mapr.com\"". |
Yes when you are using the MapR ticket authentication mechanism. |
MAPRTICKET_DURATION |
Enter the length of time (in seconds) during which the ticket is valid, for example, "86400L". |
Yes when you are using the MapR ticket authentication mechanism. |
SET_MAPR_HOME_DIR |
If the location of the MapR configuration files has been changed to somewhere else in the cluster, that is to say, the MapR Home directory has been changed, enter "true". Otherwise, enter "false". |
Yes when you are using the MapR ticket authentication mechanism. |
MAPR_HOME_DIR |
Enter the new Home directory, for example, "\"/opt/mapr/custom/\"". |
Yes when you have entered "true for SET_MAPR_HOME_DIR. |
SET_HADOOP_LOGIN |
If the login module to be used has been changed in the MapR security configuration file, mapr.login.conf, enter "true". Otherwise, enter "false". |
Yes when you are using the MapR ticket authentication mechanism. |
HADOOP_LOGIN |
Enter the module to be called from the mapr.login.conf file, for example, "\"kerberos\"" means to call the hadoop_kerberos module. |
Yes when you have entered "true for SET_HADOOP_LOGIN. |
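A sketch of the security properties, again assuming the NAME : "value" pair syntax inside addElementParameters {}, for a cluster secured with Kerberos and a keytab file in the Yarn client mode, using the example values from the table above:

```
addElementParameters {
  // Kerberos authentication with a keytab file, Yarn client mode
  USE_KRB : "true",
  RESOURCEMANAGER_PRINCIPAL : "\"yarn/_HOST@EXAMPLE.COM\"",
  JOBHISTORY_PRINCIPAL : "\"mapred/_HOST@EXAMPLE.COM\"",
  USE_KEYTAB : "true",
  PRINCIPAL : "\"hdfs\"",
  KEYTAB_PATH : "\"/tmp/hdfs.headless.keytab\""
}
```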
Properties relevant to tuning Spark:
Function/parameter | Description | Mandatory? |
---|---|---|
ADVANCED_SETTINGS_CHECK |
Enter "true" if you need to optimize the allocation of the resources to be used to run your Jobs. Otherwise, enter "false". |
Yes. |
SPARK_DRIVER_MEM and SPARK_DRIVER_CORES |
Enter the allocation size of memory and the number of cores to be used by the driver of the current Job, for example, "\"512m\"" for memory and "\"1\"" for the number of cores. |
Yes when you are tuning Spark in the Standalone mode. |
SPARK_YARN_AM_SETTINGS_CHECK |
Enter "true" to define the ApplicationMaster tuning properties of your Yarn cluster. Otherwise, enter "false". |
Yes when you are tuning Spark in the Yarn client mode. |
SPARK_YARN_AM_MEM and SPARK_YARN_AM_CORES |
Enter the allocation size of memory and the number of cores to be used by the ApplicationMaster, for example, "\"512m\"" for memory and "\"1\"" for the number of cores. |
Yes when you have entered "true" for SPARK_YARN_AM_SETTINGS_CHECK. |
SPARK_EXECUTOR_MEM |
Enter the allocation size of memory to be used by each Spark executor, for example, "\"512m\"". |
Yes when you are tuning Spark. |
SET_SPARK_EXECUTOR_MEM_OVERHEAD |
Enter "true" if you need to allocate the amount of off-heap memory (in MB) per executor. Otherwise, enter "false". |
Yes when you are tuning Spark in the Yarn client mode. |
SPARK_EXECUTOR_MEM_OVERHEAD |
Enter the amount of off-heap memory (in MB) to be allocated per executor. |
Yes when you have entered "true" for SET_SPARK_EXECUTOR_MEM_OVERHEAD. |
SPARK_EXECUTOR_CORES_CHECK |
If you need to define the number of cores to be used by each executor, enter "true". Otherwise, enter "false". |
Yes when you are tuning Spark. |
SPARK_EXECUTOR_CORES |
Enter the number of cores to be used by each executor, for example, "\"1\"". |
Yes when you have entered "true" for SPARK_EXECUTOR_CORES_CHECK. |
SPARK_YARN_ALLOC_TYPE |
Enter how you want Yarn to allocate resources among executors. Enter one of the following values:
|
Yes when you are tuning Spark in the Yarn client mode. |
SPARK_EXECUTOR_INSTANCES |
Enter the number of executors to be used by Yarn, for example, "\"2\"". |
Yes when you have entered "FIXED" for SPARK_YARN_ALLOC_TYPE. |
SPARK_YARN_DYN_INIT, SPARK_YARN_DYN_MIN and SPARK_YARN_DYN_MAX |
Define the scale of the dynamic allocation by setting these three properties. For example, "\"1\"" as the initial number of executors, "\"0\"" as the minimum number and "\"MAX\"" as the maximum number. |
Yes when you have entered "DYNAMIC" for SPARK_YARN_ALLOC_TYPE. |
WEB_UI_PORT_CHECK |
If you need to change the default port of the Spark Web UI, enter "true". Otherwise, enter "false". |
Yes when you are tuning Spark. |
WEB_UI_PORT |
Enter the port number you want to use for the Spark Web UI, for example, "\"4040\"". |
Yes when you have entered "true" for WEB_UI_PORT_CHECK. |
SPARK_BROADCAST_FACTORY |
Enter the broadcast implementation to be used to cache variables on each worker machine. Enter one of the following values:
|
Yes when you are tuning Spark. |
CUSTOMIZE_SPARK_SERIALIZER |
If you need to import an external Spark serializer, enter "true". Otherwise, enter "false". |
Yes when you are tuning Spark. |
SPARK_SERIALIZER |
Enter the fully qualified class name of the serializer to be used, for example, "\"org.apache.spark.serializer.KryoSerializer\"". |
Yes when you have entered "true" for CUSTOMIZE_SPARK_SERIALIZER. |
ENABLE_BACKPRESSURE |
If you need to enable the backpressure feature of Spark, enter "true". Otherwise, enter "false". The backpressure feature is available in Spark version 1.5 and onwards. With backpressure enabled, Spark automatically finds the optimal receiving rate and dynamically adapts the rate based on the current batch scheduling delays and processing times, so that data is received only as fast as it can be processed. |
Yes when you are tuning Spark for a Spark Streaming Job. |
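The tuning properties can be combined in the same assumed way. The sketch below allocates resources for the Yarn client mode with a fixed number of executors, using the example values from the table above:

```
addElementParameters {
  // Resource tuning for the Yarn client mode, fixed executor allocation
  ADVANCED_SETTINGS_CHECK : "true",
  SPARK_YARN_AM_SETTINGS_CHECK : "true",
  SPARK_YARN_AM_MEM : "\"512m\"",
  SPARK_YARN_AM_CORES : "\"1\"",
  SPARK_EXECUTOR_MEM : "\"512m\"",
  SPARK_EXECUTOR_CORES_CHECK : "true",
  SPARK_EXECUTOR_CORES : "\"1\"",
  SPARK_YARN_ALLOC_TYPE : "FIXED",
  SPARK_EXECUTOR_INSTANCES : "\"2\""
}
```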
Properties relevant to logging the execution of your Jobs:
Function/parameter | Description | Mandatory? |
---|---|---|
ENABLE_SPARK_EVENT_LOGGING |
Enter "true" if you need to enable the Spark application logs of this Job to be persistent in the file system of your Yarn cluster. Otherwise, enter "false". |
Yes when you are using Spark in the Yarn client mode. |
COMPRESS_SPARK_EVENT_LOGS |
If you need to compress the logs, enter "true". Otherwise, enter "false". |
Yes when you have entered "true" for ENABLE_SPARK_EVENT_LOGGING. |
SPARK_EVENT_LOG_DIR |
Enter the directory in which Spark events are logged, for example, "\"hdfs://namenode:8020/user/spark/applicationHistory\"". |
Yes when you have entered "true" for ENABLE_SPARK_EVENT_LOGGING. |
SPARKHISTORY_ADDRESS |
Enter the location of the history server, for example, "\"sparkHistoryServer:18080\"". |
Yes when you have entered "true" for ENABLE_SPARK_EVENT_LOGGING. |
USE_CHECKPOINT |
If you need the Job to be resilient to failure, enter "true" to enable the Spark checkpointing operation. Otherwise, enter "false". |
Yes. |
CHECKPOINT_DIR |
Enter the directory in which Spark stores, in the file system of the cluster, the context data of the computations such as the metadata and the generated RDDs of this computation. For example, "\"file:///tmp/mycheckpoint\"". |
Yes when you have entered "true" for SET_SPARK_EXECUTOR_MEM_OVERHEAD. |
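For logging and fault tolerance, a sketch in the same assumed syntax that persists the Spark event logs and enables checkpointing, using the example values from the table above:

```
addElementParameters {
  // Persist the Spark event logs and enable checkpointing
  ENABLE_SPARK_EVENT_LOGGING : "true",
  COMPRESS_SPARK_EVENT_LOGS : "false",
  SPARK_EVENT_LOG_DIR : "\"hdfs://namenode:8020/user/spark/applicationHistory\"",
  SPARKHISTORY_ADDRESS : "\"sparkHistoryServer:18080\"",
  USE_CHECKPOINT : "true",
  CHECKPOINT_DIR : "\"file:///tmp/mycheckpoint\""
}
```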
Properties relevant to configuring Cloudera Navigator:
If you are using Cloudera V5.5+ to run your Apache Spark Batch Jobs, you can make use of Cloudera Navigator to trace the lineage of a given data flow and discover how this data flow was generated by a Job.
Function/parameter | Description | Mandatory? |
---|---|---|
USE_CLOUDERA_NAVIGATOR |
Enter "true" if you want to use Cloudera Navigator. Otherwise, enter "false". |
Yes when you are using Spark on Cloudera. |
CLOUDERA_NAVIGATOR_USERNAME and CLOUDERA_NAVIGATOR_PASSWORD |
Enter the credentials you use to connect to your Cloudera Navigator. For example, "\"username\"" as username and "password" as password. |
Yes when you have entered "true" for USE_CLOUDERA_NAVIGATOR. |
CLOUDERA_NAVIGATOR_URL |
Enter the location of the Cloudera Navigator to connect to, for example, "\"http://localhost:7187/api/v8/\"". |
Yes when you have entered "true" for USE_CLOUDERA_NAVIGATOR. |
CLOUDERA_NAVIGATOR_METADATA_URL |
Enter the location of the Navigator Metadata, for example, "\"http://localhost:7187/api/v8/metadata/plugin\"". |
Yes when you have entered "true" for USE_CLOUDERA_NAVIGATOR. |
CLOUDERA_NAVIGATOR_CLIENT_URL |
Enter the location of the Navigator client, for example, "\"http://localhost\"". |
Yes when you have entered "true" for USE_CLOUDERA_NAVIGATOR. |
CLOUDERA_NAVIGATOR_AUTOCOMMIT |
If you want to make Cloudera Navigator generate the lineage of the current Job at the end of the execution of your Job, enter "true". Otherwise, enter "false". |
Yes when you have entered "true" for USE_CLOUDERA_NAVIGATOR. |
CLOUDERA_NAVIGATOR_DISABLE_SSL_VALIDATION |
If you do not want to use the SSL validation process when your Job connects to Cloudera Navigator, enter "true". Otherwise, enter "false". |
Yes when you have entered "true" for USE_CLOUDERA_NAVIGATOR. |
CLOUDERA_NAVIGATOR_DIE_ON_ERROR |
If you want to stop the execution of the Job when the connection to your Cloudera Navigator fails, enter "true". Otherwise, enter "false". |
Yes when you have entered "true" for USE_CLOUDERA_NAVIGATOR. |
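A sketch in the same assumed syntax that activates Cloudera Navigator, using the example values from the table above:

```
addElementParameters {
  // Cloudera Navigator lineage settings
  USE_CLOUDERA_NAVIGATOR : "true",
  CLOUDERA_NAVIGATOR_USERNAME : "\"username\"",
  CLOUDERA_NAVIGATOR_PASSWORD : "password",
  CLOUDERA_NAVIGATOR_URL : "\"http://localhost:7187/api/v8/\"",
  CLOUDERA_NAVIGATOR_METADATA_URL : "\"http://localhost:7187/api/v8/metadata/plugin\"",
  CLOUDERA_NAVIGATOR_CLIENT_URL : "\"http://localhost\"",
  CLOUDERA_NAVIGATOR_AUTOCOMMIT : "false",
  CLOUDERA_NAVIGATOR_DISABLE_SSL_VALIDATION : "false",
  CLOUDERA_NAVIGATOR_DIE_ON_ERROR : "false"
}
```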
Properties relevant to configuring Hortonworks Atlas:
If you are using Hortonworks Data Platform V2.4 onwards to run your Spark Batch Jobs and Apache Atlas has been installed in your Hortonworks cluster, you can make use of Atlas to trace the lineage of a given data flow and discover how this data flow was generated by a Job.
Function/parameter | Description | Mandatory? |
---|---|---|
USE_ATLAS |
Enter "true" if you want to use Atlas. Otherwise, enter "false". |
Yes when you are using Spark on Hortonworks. |
ATLAS_USERNAME and ATLAS_PASSWORD |
Enter the credentials you use to connect to your Atlas. For example, "\"username\"" as username and "password" as password. |
Yes when you have entered "true" for USE_ATLAS. |
ATLAS_URL |
Enter the location of the Atlas to connect to, for example, "\"http://localhost:21000\"". |
Yes when you have entered "true" for USE_ATLAS. |
SET_ATLAS_APPLICATION_PROPERTIES |
If your Atlas cluster contains custom properties such as SSL or read timeout, enter "true". Otherwise, enter "false". |
Yes when you have entered "true" for USE_ATLAS. |
ATLAS_APPLICATION_PROPERTIES |
Enter a directory on your local machine, then place the atlas-application.properties file of your Atlas in this directory, for example, "\"/user/atlas/atlas-application.properties\"". This enables your Job to use these custom properties. |
Yes when you have entered "true" for SET_ATLAS_APPLICATION_PROPERTIES. |
ATLAS_DIE_ON_ERROR |
If you want to stop the Job execution when Atlas-related issues occur, enter "true". Otherwise, enter "false". |
Yes when you have entered "true" for USE_ATLAS. |
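A sketch in the same assumed syntax that activates Apache Atlas, using the example values from the table above:

```
addElementParameters {
  // Apache Atlas lineage settings
  USE_ATLAS : "true",
  ATLAS_USERNAME : "\"username\"",
  ATLAS_PASSWORD : "password",
  ATLAS_URL : "\"http://localhost:21000\"",
  SET_ATLAS_APPLICATION_PROPERTIES : "true",
  ATLAS_APPLICATION_PROPERTIES : "\"/user/atlas/atlas-application.properties\"",
  ATLAS_DIE_ON_ERROR : "false"
}
```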