Big Data: known issues and known limitations

Limitation	Description	Available in
Hive	Hive is not supported in Spark Local mode.	All subscription-based Talend products with Big Data
Java 11	Java 11 is not supported in Standard Jobs or the Metadata Repository when they involve big data distributions, nor in Spark Jobs. This limitation exists because the big data distributions themselves do not support Java 11. To use Spark Jobs, or Standard Jobs or the Metadata Repository with big data distributions, install Java 8 on your computer. Then, in Talend Studio, set the path in Preferences > Talend > Java interpreter and select the location of JDK 8 in Preferences > Java > Installed JREs.	All subscription-based Talend products with Big Data
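
Before pointing Talend Studio at a JDK, it can help to confirm which major Java version that JDK reports. The following is a minimal sketch, not part of Talend Studio: the function names java_major_version and is_supported_for_big_data are hypothetical, and the check assumes that Java 8 is the only version that satisfies the limitation described above. It handles both the legacy version scheme ("1.8.0_292") and the scheme used since Java 9 ("11.0.2").

```python
import re


def java_major_version(version: str) -> int:
    """Return the major Java version from a version string.

    Handles the legacy scheme ("1.8.0_292" -> 8) and the scheme
    used since Java 9 ("11.0.2" -> 11).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Java version string: {version!r}")
    first = int(match.group(1))
    second = match.group(2)
    # Legacy strings start with "1." and carry the major version second.
    if first == 1 and second is not None:
        return int(second)
    return first


def is_supported_for_big_data(version: str) -> bool:
    # Assumption taken from the limitation above: only Java 8 qualifies.
    return java_major_version(version) == 8
```

The version string itself can be taken from the output of `java -version`, which the JDK prints to standard error, or from the `java.version` system property.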

Issue	Workaround	Available in
When you run Spark Jobs with Dataproc 2.x, Azure Synapse and HD Insight 4.0 distributions, the following error can be returned: java.lang.NoSuchMethodError: org.apache.log4j.helpers.	Following the Log4j2 security issue (CVE-2021-44228), make sure to disable Log4j loggers when you run Spark Batch and Spark Streaming Jobs with Dataproc 2.x and onwards, Azure Synapse and HD Insight 4.0 distributions. To avoid any Job failure, clear the Activate log4j in components check box from the Log4j view in File > Edit Project Properties > Project Settings or clear the log4jLevel check box from the Advanced settings view of your Spark Job.	All subscription-based Talend products with Big Data
When you run a Spark Batch Job with MapRDB components whose schema contains Date type columns, the following compile error appears: "The method toBytes(ByteBuffer) in the type Bytes is not applicable for the arguments (Date)".	Date type columns cannot be used in the schema when you run a Spark Batch Job with MapRDB components.	All subscription-based Talend products with Big Data
HBase does not work with a CDP 7.1.x cluster that uses Kerberos in YARN Client mode and returns the following error: hbase.pb.AuthenticationService.GetAuthenticationToken org.apache.hadoop.hbase.HBaseIOException: com.google.protobuf.ServiceException: Error calling method hbase.pb.AuthenticationService.GetAuthenticationToken.	To use Kerberos with HBase on a CDP 7.1.x cluster, use YARN Cluster mode instead of YARN Client mode.	All subscription-based Talend products with Big Data

