Defining the Cloudera connection parameters
Complete the Cloudera connection configuration in the Spark configuration tab of the Run view of your Job. This configuration is effective on a per-Job basis.
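Although the fields are filled in through the Studio UI, a per-Job Spark configuration ultimately resolves to standard Spark and Hadoop properties. The following spark-defaults.conf-style sketch is illustrative only; the host names and ports are assumptions, not values taken from this document:

```
# Illustrative per-Job connection settings (hypothetical host names and ports)
spark.master                                yarn
# Properties prefixed with spark.hadoop. are forwarded to the Hadoop configuration
spark.hadoop.fs.defaultFS                   hdfs://quickstart.cloudera:8020
spark.hadoop.yarn.resourcemanager.address   quickstart.cloudera:8032
```

The actual field names and values to use are those shown in the Spark configuration tab for your distribution version.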
If you cannot find the Cloudera or Hortonworks version to be used in the Version drop-down list, you can add your distribution via the dynamic distribution settings in the Studio.
- On the version list of the distributions, some versions are labelled Builtin. These versions were added by Talend via the Dynamic distribution mechanism and delivered with the Studio when it was released. They are certified by Talend, and therefore officially supported and ready to use.
The information in this section is only for users who have subscribed to Talend Data Fabric or to any Talend product with Big Data.
Procedure
Results
- After the connection is configured, you can optionally tune the Spark performance by following the process explained in:
  - Tuning Spark for Apache Spark Batch Jobs for Spark Batch Jobs.
  - Tuning Spark for Apache Spark Streaming Jobs for Spark Streaming Jobs.
- It is recommended to activate the Spark logging and checkpointing system in the Spark configuration tab of the Run view of your Spark Job, in order to help debug and resume the Job when issues arise.
- If you are using Cloudera V5.5+ to run your MapReduce or Apache Spark Batch Jobs, you can use Cloudera Navigator to trace the lineage of a given data flow and discover how that data flow was generated by a Job.
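Outside the Studio, the logging part of the recommendation above corresponds to Spark's standard event-log properties; the sketch below shows them with a placeholder HDFS path (the directory is an assumption, and the checkpoint directory itself is entered in the Spark configuration tab rather than as a property):

```
# Spark event logging, as it would appear in spark-defaults.conf
# (illustrative HDFS path; use the directory appropriate to your cluster)
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://namenode:8020/user/spark/applicationHistory
```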