Big Data: new features
Spark Job designer enhancements
Feature | Description | Available in |
---|---|---|
ADLS Gen2 | Azure Data Lake Storage Gen2 is now supported with the following Big Data platforms: | All Talend products with Big Data |
Snowflake | The Snowflake components for Spark Batch are now generally available. | All Talend products with Big Data |
Native Datasets | In Spark Batch Jobs, support for native Spark Datasets has been added to more components to obtain inherent performance gains. To benefit from this enhancement, use Spark V2.0 or later with the following components: The following components require Spark V2.1 or later to support Spark Datasets: | All Talend products with Big Data |
Delta Lake | The tDeltaLakeInput and tDeltaLakeOutput components are now generally available. | All Talend products with Big Data |
Apache Spark V2.4 | This new Apache Spark version is now supported on more Big Data platforms in Spark Batch and Spark Streaming Jobs. The platforms that now support Spark V2.4 are: | All Talend products with Big Data |
Job status | With Databricks, users can now configure how often the Studio asks a Spark cluster for Job status. | All Talend products with Big Data |
tS3Configuration | With Amazon EMR, users can now apply an S3 bucket policy. | All Talend products with Big Data |
tAggregateRow | In Spark Batch Jobs, the Count (distinct) function and the Sample Standard Deviation Algorithm function have been added. | All Talend products with Big Data |
New driver versions | Support for the following driver versions has been added to their related components: | All Talend products with Big Data |
New components available | Two new components are now available: tAzureAdlsGen2Input and tAzureAdlsGen2Output. | All Talend products with Big Data |
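
To make the two aggregate functions added to tAggregateRow more concrete, the following is a minimal plain-Python sketch of what a distinct count and a sample standard deviation compute. It is not Talend or Spark code. The function names, the sample values, and the choice to ignore null values in the distinct count are assumptions made for illustration only.

```python
import math


def count_distinct(values):
    """Count distinct values, ignoring None (null handling is an assumption)."""
    return len({v for v in values if v is not None})


def sample_std_dev(values):
    """Sample standard deviation: divides by n - 1 rather than n."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical input column, for illustration only.
amounts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(count_distinct(amounts))            # 5 distinct values
print(round(sample_std_dev(amounts), 4))  # sqrt(32 / 7), about 2.1381
```

The sample form divides by n - 1 instead of n, so on the same data it returns a slightly larger value than the population standard deviation.
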
Support for Big Data platforms
Feature | Description | Available in |
---|---|---|
Databricks | | All Talend products with Big Data |
Hortonworks Data Platform | | All Talend products with Big Data |
Google Cloud Dataproc | | All Talend products with Big Data |
Custom Hadoop configuration | When defining connections to Cloudera or Hortonworks in Repository, users can now specify a custom JAR file to provide the connection parameters of the Hadoop environment to be used. | All Talend products with Big Data |
Other components
Feature | Description | Available in |
---|---|---|
Kafka | Kafka V2.2.1 is now officially supported with: | All Talend products with Big Data |
Google BigQuery | | All Talend products with Big Data |
Couchbase | | All Talend products with Big Data |
CXF | CXF V3.3.4 is now supported in the following components: | All Talend products with Big Data |
MongoDB | Support for MongoDB V4.2.x has been added to the MongoDB components in Standard Jobs. | All Talend products with Big Data |