
Big Data: new features

Spark Job designer enhancements

Feature

Description

Available in

ADLS Gen2
Azure Data Lake Storage Gen2 (ADLS Gen2) is now supported with the following Big Data platforms:
  • Databricks V5.5 LTS
  • Cloudera CDH V6.1
  • Hortonworks Data Platform V3.1

All Talend products with Big Data
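For orientation only: when Spark reaches ADLS Gen2 through the Hadoop ABFS connector, paths use the `abfss://` (TLS) or `abfs://` scheme together with the container name and the storage account's `dfs.core.windows.net` endpoint. The Python sketch below builds such a URI. The helper name `abfs_uri` and the account, container, and path values are hypothetical, and the sketch does not describe how the Talend components build their paths internally.

```python
def abfs_uri(account: str, container: str, path: str = "", secure: bool = True) -> str:
    """Build an ADLS Gen2 URI in the form used by the Hadoop ABFS connector."""
    # "abfss" uses TLS; "abfs" is the unencrypted variant.
    scheme = "abfss" if secure else "abfs"
    # ADLS Gen2 is addressed through the storage account's "dfs" endpoint.
    host = f"{account}.dfs.core.windows.net"
    return f"{scheme}://{container}@{host}/{path.lstrip('/')}"


# Hypothetical storage account, container, and file path.
print(abfs_uri("mystorageaccount", "raw", "/sales/2019/orders.parquet"))
# abfss://raw@mystorageaccount.dfs.core.windows.net/sales/2019/orders.parquet
```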

Snowflake
The Snowflake components for Spark Batch are now generally available.

All Talend products with Big Data

Native Datasets
In Spark Batch Jobs, support for native Spark Datasets has been added to more components so that they benefit from the performance gains inherent to Datasets. This enhancement requires Spark V2.0 or later and applies to the following components:
  • tFileInputParquet and tFileOutputParquet
  • tFileInputDelimited and tFileOutputDelimited
  • tFileInputFullRow
  • tFileInputPositional and tFileInputRegex
  • tSortRow, tExtractDelimitedFields, tExtractPositionalFields, tExtractRegexFields, tExtractXMLField, tExtractJSONFields, tNormalize, tReplace, tReplicate, tSample, tUnite and tSchemaComplianceCheck.
The following components require Spark V2.1 or later to support Spark Datasets:
  • tAggregateRow
  • Left Outer Join in tMap, in addition to the tMap features that have had support for Datasets since Talend Studio V7.2.

All Talend products with Big Data

Delta Lake
The tDeltaLakeInput and tDeltaLakeOutput components are now generally available.

All Talend products with Big Data

Apache Spark V2.4
Apache Spark V2.4 is now supported with more Big Data platforms in Spark Batch and Spark Streaming Jobs. The platforms that now support Spark V2.4 are:
  • Cloudera CDH6.1.1
  • Databricks V5.5
  • Google Cloud Dataproc V1.4

All Talend products with Big Data

Job status
With Databricks, users can now configure how often the Studio polls the Spark cluster for the Job status.

All Talend products with Big Data
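The polling interval is a trade-off: a short interval detects the end of a Job sooner but sends more status requests to the cluster. The Python sketch below shows a generic polling loop with a configurable interval. The function `wait_for_job`, the status callback, and the state names are all hypothetical; they illustrate the idea and are not the Studio's implementation or the Databricks API.

```python
import time
from typing import Callable, Tuple

# Hypothetical terminal states; real state names depend on the cluster API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_job(get_status: Callable[[], str], interval_s: float,
                 max_polls: int = 1000,
                 sleep: Callable[[float], None] = time.sleep) -> Tuple[str, int]:
    """Ask for the Job status every interval_s seconds until it reaches a terminal state.

    Returns the final state and the number of status requests that were made.
    """
    for polls in range(1, max_polls + 1):
        state = get_status()
        if state in TERMINAL_STATES:
            return state, polls
        # A longer interval means fewer requests to the cluster,
        # but the end of the Job is detected later.
        sleep(interval_s)
    raise TimeoutError(f"Job still running after {max_polls} status requests")


# Simulated cluster responses; interval_s=0 keeps the demo instant.
responses = iter(["PENDING", "RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda: next(responses), interval_s=0))
# ('SUCCEEDED', 4)
```

Injecting `sleep` keeps the loop testable without real waiting.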

tS3Configuration
With Amazon EMR, users can now apply an S3 bucket policy.

All Talend products with Big Data
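An S3 bucket policy is a JSON document written in the IAM policy language. As an illustration of the format only, the Python sketch below generates a minimal policy that grants one IAM principal read access to a bucket. The bucket name, role ARN, statement ID, and helper name are hypothetical, and which policy suits an EMR cluster depends on the deployment; this is not a recommended or Talend-specific policy.

```python
import json


def read_only_bucket_policy(bucket: str, principal_arn: str) -> str:
    """Return a minimal S3 bucket policy granting one IAM principal read access."""
    policy = {
        "Version": "2012-10-17",  # current version of the IAM policy language
        "Statement": [
            {
                "Sid": "AllowClusterRead",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:ListBucket", "s3:GetObject"],
                # s3:ListBucket applies to the bucket ARN, s3:GetObject to the objects in it.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical bucket name and role ARN.
print(read_only_bucket_policy("my-emr-data",
                              "arn:aws:iam::123456789012:role/MyEmrInstanceRole"))
```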

tAggregateRow
In Spark Batch Jobs, the Count (distinct) function and the Sample Standard Deviation Algorithm function have been added.

All Talend products with Big Data
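For reference, the sample standard deviation of n values x_1, ..., x_n with mean x̄ is s = sqrt(Σ(x_i − x̄)² / (n − 1)); dividing by n − 1 instead of n is what distinguishes it from the population standard deviation. The Python sketch below computes both new aggregations over a plain list. It skips null values the way SQL aggregates do, which is an assumption rather than a description of tAggregateRow's null handling, and the function names are hypothetical.

```python
import math


def count_distinct(values):
    """Number of distinct non-null values, like SQL COUNT(DISTINCT col)."""
    return len({v for v in values if v is not None})


def sample_stddev(values):
    """Sample standard deviation: squared deviations divided by n - 1, not n."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n < 2:
        return None  # this sketch returns None when fewer than two values exist
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9, None]
print(count_distinct(data))            # 5
print(round(sample_stddev(data), 3))   # 2.138 (the population value would be 2.0)
```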

New driver versions
Support for the following driver versions has been added to their related components:
  • Redshift JDBC driver V1.23.7.106
  • MySQL driver V8.0.18
  • Teradata JDBC driver V16.20.00.13
  • MariaDB JDBC driver V2.5.3 in JDBC components
  • Snowflake JDBC driver V3.11.x

All Talend products with Big Data

New components available

Two new components are now available: tAzureAdlsGen2Input and tAzureAdlsGen2Output.

All Talend products with Big Data

Support for Big Data platforms

Feature

Description

Available in

Databricks
  • Databricks V5.5 LTS is now supported by Spark Jobs.
  • Support for transient clusters of Azure Databricks has been added.

All Talend products with Big Data

Hortonworks Data Platform
  • Hortonworks Data Platform V3.1 is supported.
  • The Hortonworks Data Platform V3.x series is now generally available among the Dynamic Distributions.

All Talend products with Big Data

Google Cloud Dataproc

  • Google Cloud Dataproc V1.4 is supported.
  • In Standard Jobs, tGoogleDataprocManage supports all regions.

All Talend products with Big Data

Custom Hadoop configuration
When defining connections to Cloudera or Hortonworks in Repository, users can now specify a custom JAR file to provide the connection parameters of the Hadoop environment to be used.

All Talend products with Big Data
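Assuming, as is common for Hadoop client setups, that such a JAR packages the cluster's *-site.xml files (core-site.xml, hdfs-site.xml, yarn-site.xml, and so on), the Python sketch below shows how the connection properties can be read back out of it. It only illustrates the Hadoop configuration XML format; it is not how the Studio consumes the JAR, and the helper names, file contents, and host name are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def read_hadoop_conf_jar(jar_bytes: bytes) -> dict:
    """Collect the <property> name/value pairs from every *-site.xml file in a JAR."""
    props = {}
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        # Sorted order makes the result deterministic; a later file wins on duplicate names.
        for entry in sorted(jar.namelist()):
            if not entry.endswith("-site.xml"):
                continue
            root = ET.fromstring(jar.read(entry))
            for prop in root.iter("property"):
                name = prop.findtext("name")
                if name:
                    props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def build_demo_jar() -> bytes:
    """Create an in-memory JAR holding a hypothetical core-site.xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        jar.writestr(
            "core-site.xml",
            "<configuration><property><name>fs.defaultFS</name>"
            "<value>hdfs://namenode.example.com:8020</value></property></configuration>",
        )
    return buf.getvalue()


print(read_hadoop_conf_jar(build_demo_jar()))
# {'fs.defaultFS': 'hdfs://namenode.example.com:8020'}
```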

Other components

Feature

Description

Available in

Kafka
Kafka V2.2.1 is now officially supported with:
  • Cloudera CDH V6.1
  • Hortonworks Data Platform V3.1
  • Kafka components in Standard Jobs

All Talend products with Big Data

Google BigQuery
  • In tBigQueryBulkExec, users can now drop tables with either a service account or their OAuth 2.0 credentials.
  • The BigQuery components now support Google Cloud client API 1.25.10.

All Talend products with Big Data

Couchbase
  • tCouchbaseOutput now allows users to perform N1QL queries with parameters.
  • Non-JSON documents are supported.

All Talend products with Big Data
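N1QL supports named (`$name`) and positional (`$1`, `$2`, ...) parameters whose values are supplied separately from the statement text, so values never have to be quoted or concatenated into the query by hand. As an illustration of the mechanism only, the Python sketch below builds the JSON request body accepted by the Couchbase Query Service REST API, where named parameters are `$`-prefixed keys and positional ones go in `args`. It does not show how tCouchbaseOutput itself passes parameters, and the helper name, bucket, document key, and values are hypothetical.

```python
import json


def n1ql_request_body(statement, named=None, positional=None):
    """Build a JSON body for the Couchbase Query Service REST endpoint.

    Named parameters are sent as "$name" keys and positional ones in "args",
    so the server binds the values instead of splicing them into the text.
    """
    body = {"statement": statement}
    if positional is not None:
        body["args"] = list(positional)
    for name, value in (named or {}).items():
        body["$" + name] = value
    return json.dumps(body)


# Hypothetical bucket, document key, and values.
print(n1ql_request_body(
    "UPDATE `orders` SET status = $status WHERE META().id = $id",
    named={"status": "shipped", "id": "order::1001"},
))
```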

CXF

CXF V3.3.4 is now supported in the following components:

  • tDBFSConnection, tDBFSGet, tDBFSPut
  • tHCatalogInput, tHCatalogLoad, tHCatalogOperation, tHCatalogOutput

All Talend products with Big Data

MongoDB

Support for MongoDB V4.2.x has been added to the MongoDB components in Standard Jobs.

All Talend products with Big Data
