Running a Job with Spark Universal
Spark Universal is a mechanism that allows Talend Studio to be compatible with every big data distribution for a given Spark version. You choose a Spark version and upload a Hadoop configuration JAR file that contains all the necessary information to connect to your cluster.
When you use Spark Universal in Talend Studio, only Scala 2.12 is supported.
Spark Universal modes and environments support
Talend Studio supports the following modes and environments, depending on the Spark version:
Mode or environment | Spark 2.4.x | Spark 3.0.x | Spark 3.1.x | Spark 3.2.x | Spark 3.3.x | Spark 3.4.x | Spark 3.5.x |
---|---|---|---|---|---|---|---|
Local mode | Supported | Supported | Supported | Supported | Supported | Supported | Supported |
Standalone | Not supported | Not supported | Not supported | Supported | Not supported | Supported | Not supported |
Yarn cluster mode | Supported | Supported | Supported | Supported | Supported | Not supported | Not supported |
Databricks | Not supported | Not supported | Supported | Supported | Supported | Supported | Not supported |
Dataproc | Not supported | Not supported | Supported | Supported | Supported | Not supported | Not supported |
Cloudera Data Engineering | Not supported | Not supported | Supported | Supported | Not supported | Not supported | Not supported |
Kubernetes | Not supported | Not supported | Supported | Not supported | Not supported | Not supported | Not supported |
Spark-submit scripts | Not supported | Not supported | Not supported | Not supported | Supported | Not supported | Not supported |
Synapse | Not supported | Not supported | Not supported | Supported | Supported | Not supported | Not supported |
HDInsight | Not supported | Not supported | Supported | Not supported | Supported | Not supported | Not supported |
EMR Serverless | Not supported | Not supported | Not supported | Supported | Supported | Not supported | Not supported |
Note:
- Azure Synapse Analytics with Spark Universal 3.2.x and 3.3.x is supported only in Spark Batch Jobs.
- Spark-submit scripts with Spark Universal 3.3.x are supported only in Spark Batch Jobs.
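The support matrix above can also be expressed as a small lookup structure, which may help when checking a mode against a Spark version in a script. This is only an illustrative sketch transcribed from the table and the note. The names `SUPPORT`, `BATCH_ONLY`, `is_supported`, and `modes_for` are hypothetical and are not part of Talend Studio.

```python
# Illustrative transcription of the support table above.
# All names here are hypothetical helpers, not a Talend API.
SPARK_VERSIONS = ["2.4.x", "3.0.x", "3.1.x", "3.2.x", "3.3.x", "3.4.x", "3.5.x"]

# Each mode or environment maps to the set of Spark versions marked "Supported".
SUPPORT = {
    "Local mode": set(SPARK_VERSIONS),
    "Standalone": {"3.2.x", "3.4.x"},
    "Yarn cluster mode": {"2.4.x", "3.0.x", "3.1.x", "3.2.x", "3.3.x"},
    "Databricks": {"3.1.x", "3.2.x", "3.3.x", "3.4.x"},
    "Dataproc": {"3.1.x", "3.2.x", "3.3.x"},
    "Cloudera Data Engineering": {"3.1.x", "3.2.x"},
    "Kubernetes": {"3.1.x"},
    "Spark-submit scripts": {"3.3.x"},
    "Synapse": {"3.2.x", "3.3.x"},
    "HDInsight": {"3.1.x", "3.3.x"},
    "EMR Serverless": {"3.2.x", "3.3.x"},
}

# Combinations that the note restricts to Spark Batch Jobs.
BATCH_ONLY = {
    ("Synapse", "3.2.x"),
    ("Synapse", "3.3.x"),
    ("Spark-submit scripts", "3.3.x"),
}


def is_supported(mode, spark_version):
    """Return True if the table marks this mode as supported for the version."""
    if mode not in SUPPORT:
        raise KeyError(f"unknown mode or environment: {mode}")
    return spark_version in SUPPORT[mode]


def modes_for(spark_version):
    """List the modes supported for a Spark version, in table order."""
    return [mode for mode, versions in SUPPORT.items() if spark_version in versions]


if __name__ == "__main__":
    print(is_supported("Kubernetes", "3.1.x"))
    print(modes_for("3.4.x"))
    print(("Synapse", "3.2.x") in BATCH_ONLY)
```

Keeping the versions as sets makes the check a simple membership test. If the table changes, only the `SUPPORT` dictionary needs updating.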
Spark Universal distributions support
Talend Studio supports the distributions listed in the table below in Yarn cluster mode, depending on the Spark version. For example, to connect to an Amazon EMR 6.2 cluster, select the Spark 3.0.x version and then upload the Hadoop configuration JAR file that contains all the *-site.xml files related to the cluster.
Spark version | Supported distributions in Yarn cluster mode |
---|---|
Spark 2.4.x | |
Spark 3.0.x | |
Spark 3.1.x | |
Spark 3.2.x | |
Spark 3.3.x | |
This list of distributions is not exhaustive. You can use Yarn cluster mode with any other distribution if the Spark version matches, but such distributions have not been officially tested by Talend and are therefore not guaranteed to work.
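The Amazon EMR example in this section refers to a Hadoop configuration JAR file that contains the cluster's *-site.xml files. Because a JAR file is a ZIP archive, one way to assemble such a file outside Talend Studio might look like the sketch below. It assumes that the *-site.xml files have already been copied from the cluster into a local directory and that they belong at the root of the archive. This page states neither the exact layout nor whether a manifest is required, so treat both as assumptions. The function name `build_hadoop_conf_jar` and the paths are hypothetical.

```python
import zipfile
from pathlib import Path


def build_hadoop_conf_jar(conf_dir, jar_path):
    """Package every *-site.xml file found directly in conf_dir into jar_path.

    Assumption: the files are stored at the root of the archive.
    Returns the names of the files that were added.
    """
    conf_dir = Path(conf_dir)
    site_files = sorted(conf_dir.glob("*-site.xml"))
    if not site_files:
        raise FileNotFoundError(f"no *-site.xml files found in {conf_dir}")

    # A JAR is a ZIP archive, so the standard zipfile module can write it.
    with zipfile.ZipFile(jar_path, "w", compression=zipfile.ZIP_DEFLATED) as jar:
        for path in site_files:
            # arcname drops the directory part so each file sits at the root.
            jar.write(path, arcname=path.name)

    return [path.name for path in site_files]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    added = build_hadoop_conf_jar("./emr-conf", "hadoop-conf.jar")
    print(added)
```

The function ignores files that do not match the *-site.xml pattern and fails early if the directory contains none, which avoids uploading an empty archive.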