This distribution could be:
- Databricks
- Amazon EMR
  For this distribution, Talend supports:
  Important: Delta Lake is not supported on Amazon EMR.
- Cloudera
  For this distribution, Talend supports:
  - Standalone
  - Yarn client
  - Yarn cluster
- Cloudera Altus
  For this distribution, Talend supports:
  - Yarn cluster
  Your Altus cluster should run on the following Cloud providers:
As a Job relies on Avro to move data among its components, it is recommended to
set your cluster to use Kryo to handle the Avro types. This not only helps avoid
a known Avro issue but also brings inherent performance gains. The Spark property
to be set in your cluster is:
spark.serializer org.apache.spark.serializer.KryoSerializer
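How this property is applied depends on the cluster: managed distributions usually expose it through their own configuration interface. As a minimal sketch, assuming your cluster reads its Spark properties from a spark-defaults.conf-style file, the following Python helper (the function name ensure_kryo is hypothetical, not part of Talend or Spark) sets or replaces the spark.serializer entry in the text of such a file:

```python
import re

SERIALIZER_KEY = "spark.serializer"
KRYO_CLASS = "org.apache.spark.serializer.KryoSerializer"


def ensure_kryo(conf_text: str) -> str:
    """Return the properties text with spark.serializer set to Kryo.

    Hypothetical helper for illustration only. It assumes one property per
    line, with the key separated from the value by whitespace, '=' or ':'.
    """
    kryo_line = f"{SERIALIZER_KEY} {KRYO_CLASS}"
    result = []
    found = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Extract the key of a non-comment, non-empty line.
        key = re.split(r"[\s=:]+", stripped, maxsplit=1)[0] if stripped else ""
        if key == SERIALIZER_KEY and not stripped.startswith("#"):
            if not found:
                result.append(kryo_line)  # replace the first occurrence
                found = True
            continue  # drop any duplicate serializer entries
        result.append(line)
    if not found:
        result.append(kryo_line)  # property was absent: append it
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    sample = "spark.master yarn\nspark.serializer org.apache.spark.serializer.JavaSerializer\n"
    print(ensure_kryo(sample), end="")
```

Other properties and comments are left untouched; only the serializer entry is rewritten or appended.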
If you cannot find the distribution corresponding to yours in this
drop-down list, the distribution you want to connect to is not officially
supported by Talend. In this situation, you can select Custom, then select the
Spark version of the cluster to be connected and click the [+] button to display
the dialog box in which you can do one of the following:
- Select Import from existing version to import an officially supported
  distribution as base and then add other required jar files which the base
  distribution does not provide.
- Select Import from zip to import the configuration zip for the custom
  distribution to be used. This zip file should contain the libraries of the
  different Hadoop/Spark elements and the index file of these libraries.
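Before importing a configuration zip, it can help to check what the archive actually contains. The following is a minimal sketch using only the Python standard library. The function name summarize_distribution_zip and the file name custom-distribution.zip are hypothetical, and the script makes no assumption about the name or format of the index file; it only separates jar libraries from all other entries so that you can review them yourself:

```python
import zipfile


def summarize_distribution_zip(path):
    """Return (jars, others): sorted jar entries and all remaining file entries.

    Hypothetical helper for illustration only; it does not validate the
    archive against any Talend requirement.
    """
    with zipfile.ZipFile(path) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    jars = sorted(n for n in names if n.lower().endswith(".jar"))
    others = sorted(n for n in names if not n.lower().endswith(".jar"))
    return jars, others


if __name__ == "__main__":
    # Hypothetical archive name; replace it with the path to your own zip.
    jars, others = summarize_distribution_zip("custom-distribution.zip")
    print(f"{len(jars)} jar libraries found")
    print("Other entries (the index file should be among these):")
    for name in others:
        print("  " + name)
```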
Note that custom versions are not officially supported by Talend. Talend and its community provide you with the opportunity to
connect to custom versions from Talend Studio but cannot guarantee that the configuration of whichever version you choose
will be easy. As such, you should only attempt to set up such a connection if
you have sufficient Hadoop and Spark experience to handle any issues on your
own.
For a step-by-step example about how to connect to a custom
distribution and share this connection, see Hortonworks.