Data Quality: new features

Feature

Description

New components

The tAmazonAuroraValidRows and tAmazonAuroraInvalidRows components check Amazon Aurora database rows against specific data quality patterns (regular expression) or data quality rules (business rule).
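
As a rough illustration of pattern-based row validation, the sketch below splits rows into valid and invalid sets with a regular expression. It is not the implementation of the Talend components: the function name `split_rows`, the dictionary row format, and the e-mail pattern are all assumptions made for the example.

```python
import re


def split_rows(rows, column, pattern):
    """Split rows into (valid, invalid) lists.

    A row is valid when the value in `column` is a string that matches
    `pattern` in full. Hypothetical helper, not a Talend API.
    """
    regex = re.compile(pattern)
    valid, invalid = [], []
    for row in rows:
        value = row.get(column)
        if isinstance(value, str) and regex.fullmatch(value):
            valid.append(row)
        else:
            # Missing, non-string, or non-matching values are rejected.
            invalid.append(row)
    return valid, invalid


# Example rows and a deliberately simple e-mail pattern (assumptions).
rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "not-an-email"},
    {"id": 3, "email": None},
]
valid_rows, invalid_rows = split_rows(rows, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+")
```

Keeping the rejected rows in their own list mirrors the idea of routing valid and invalid rows to separate flows.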

Data mart asynchronous creation

Creating the data quality data mart from Talend Studio is now an asynchronous operation. Users can perform other actions in Talend Studio while the data quality data mart is being created.

Data masking leveraging Format-Preserving Encryption methods

The tDataMasking and tPatternMasking components can now securely mask data by leveraging Format-Preserving Encryption algorithms. Providing a password makes the masking repeatable and bijective. The original data is unreadable without knowledge of the provided password.

Data unmasking

When data has been masked using the tDataMasking or tPatternMasking component combined with a Format-Preserving Encryption algorithm and a password, the tDataUnmasking or tPatternUnmasking component, respectively, can retrieve the original data by reversing the masking with the same password.
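
To make "repeatable and bijective" concrete, here is a toy sketch of password-based, format-preserving masking of digit strings. It uses a simple alternating Feistel network with HMAC-SHA256 as the round function. It is for illustration only: it is not a standardized FPE algorithm, it is not what the Talend components implement, and the function names, the fixed salt, and the round count are assumptions.

```python
import hashlib
import hmac

ROUNDS = 10  # even number of rounds, chosen arbitrarily for this sketch


def _derive_key(password: str) -> bytes:
    # Fixed salt for illustration only; a real system would manage salts properly.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), b"demo-salt", 100_000)


def _round_value(key: bytes, round_index: int, half: str) -> int:
    # Keyed pseudo-random integer derived from the round number and one half.
    msg = f"{round_index}:{half}".encode()
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _check(value: str) -> None:
    if len(value) < 2 or not (value.isascii() and value.isdigit()):
        raise ValueError("expected a string of at least two ASCII digits")


def mask_digits(value: str, password: str) -> str:
    """Mask a digit string into another digit string of the same length."""
    _check(value)
    key = _derive_key(password)
    u = len(value) // 2
    v = len(value) - u
    a, b = value[:u], value[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v
        c = (int(a) + _round_value(key, i, b)) % 10**m
        a, b = b, str(c).zfill(m)
    return a + b


def unmask_digits(value: str, password: str) -> str:
    """Reverse mask_digits when given the same password."""
    _check(value)
    key = _derive_key(password)
    u = len(value) // 2
    v = len(value) - u
    a, b = value[:u], value[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        c = (int(b) - _round_value(key, i, a)) % 10**m
        a, b = str(c).zfill(m), a
    return a + b
```

Because each Feistel round is invertible, the same password always maps a given input to the same output and the mapping can be undone, while the output keeps the input's length and digit-only format.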

Data encryption and data decryption

The new tDataEncrypt component can protect data by encrypting it with the AES-GCM or Blowfish algorithm and a user-defined password. The encrypted data is unreadable without knowledge of the provided password and the generated cryptographic file.

The tDataDecrypt component can decrypt data that has been encrypted using the tDataEncrypt component.

Match grouping

The GRP_QUALITY output column of tMatchGroup now depends on the matching algorithm. When using the T-Swoosh algorithm, the GRP_QUALITY value is the minimum value among all record pairs of the group. This may affect results when the component has multiple outputs, because the GRP_QUALITY value determines which output flow a record goes to.

The behavior with the Simple VSR algorithm has not been modified.
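
The "minimal value among all record pairs" rule can be sketched as follows. This is only an illustration of the arithmetic, not tMatchGroup itself: the `similarity` function (based on `difflib`), the function name `group_quality`, and the value returned for single-record groups are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    # Stand-in pairwise score in [0, 1]; the real matching functions differ.
    return SequenceMatcher(None, a, b).ratio()


def group_quality(records, score=similarity) -> float:
    """Return the lowest pairwise score within a group of records."""
    if len(records) < 2:
        # Assumption for this sketch: a single-record group gets the top score.
        return 1.0
    return min(score(a, b) for a, b in combinations(records, 2))


# Example group of three similar names (made-up data).
quality = group_quality(["Jon Smith", "John Smith", "John Smyth"])
```

Taking the minimum means that one weakly matching pair lowers the quality of the whole group, which is why the output flow of a record can change.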

Spark 2.4 support

Talend supports Spark 2.4 (local mode) when running Jobs in Talend Studio with the following components:
  • tALSModel
  • tCompareColumns
  • tDataMasking
  • tDataShuffling
  • tGenKey
  • tJapaneseNumberNormalize
  • tJapaneseTokenize
  • tJapaneseTransliterate
  • tMatchIndex
  • tMatchIndexPredict
  • tMatchModel
  • tNaiveBayesModel
  • tPatternMasking
  • tPredict
  • tRandomForestModel
  • tRecommend
  • tReservoirSampling
  • tRuleSurvivorship
  • tStandardizePhoneNumber
  • tStandardizeRow
  • tSynonymSearch
  • tTransliterate
  • tVerifyEmail

Support for additional databases

Talend now supports additional databases for the data quality data mart:
  • Amazon RDS for Aurora
  • Amazon RDS for MySQL
  • Amazon RDS for PostgreSQL
  • Amazon RDS for Oracle
  • Amazon RDS for SQL Server
  • Azure Database for MySQL
  • Azure Database for PostgreSQL
  • Azure SQL Database
  • Oracle 18c

Support for additional databases

Talend now supports additional databases for the Profiling perspective:
  • Amazon RDS for Aurora
  • Amazon RDS for MariaDB
  • Amazon RDS for Oracle
  • Amazon RDS for PostgreSQL
  • Amazon RDS for SQL Server
  • AS/400 V7R1 to V7R3
  • Azure Database for MySQL
  • Azure Database for PostgreSQL
  • Azure SQL Database
  • Google BigQuery (via JDBC)
  • IBM DB2 11.1
  • Ingres 10.2
  • Netezza 7.2
  • Oracle 18c
  • Snowflake (via JDBC)
  • Sybase 15.5/15.7
  • Teradata 16

Survivorship rules

The T-Swoosh algorithm supports the Most ancient and Most recent survivorship functions on non-date columns.

tBRMS

The tBRMS component now supports Red Hat Decision Manager 7.3.
