Data Quality: new features
Feature | Description |
---|---|
New components | The tAmazonAuroraValidRows and tAmazonAuroraInvalidRows components check Amazon Aurora database rows against specific data quality patterns (regular expressions) or data quality rules (business rules). |
Data mart asynchronous creation | Creating the data quality data mart from Talend Studio is now an asynchronous operation. This feature lets users perform other actions in Talend Studio while the data quality data mart is being created. |
Data masking leveraging Format-Preserving Encryption methods | The tDataMasking and tPatternMasking components can now securely mask data by leveraging Format-Preserving Encryption algorithms, which allow repeatable and bijective masking based on a user-provided password. The original data is unreadable without knowledge of that password. |
Data unmasking | When data has been masked using the tDataMasking or tPatternMasking component with a Format-Preserving Encryption algorithm and a password, the tDataUnmasking or tPatternUnmasking component, respectively, can retrieve the original data by reversing the masking with the same password. |
Data encryption and data decryption | The new tDataEncrypt component can protect data by encrypting it with AES-GCM and Blowfish algorithms and a user-defined password. The encrypted data is unreadable without the knowledge of the provided password and the generated cryptographic file. The tDataDecrypt component can decrypt data that has been encrypted using the tDataEncrypt component. |
Match grouping | The GRP_QUALITY output column of tMatchGroup now depends on the matching algorithm. When using the T-Swoosh algorithm, the GRP_QUALITY value is the minimum value among all record pairs of the group. This may affect the results when there are multiple outputs, since the GRP_QUALITY value determines which output flow a record goes to. The behavior with the Simple VSR algorithm has not been modified. |
Spark 2.4 support | Talend supports Spark 2.4 (local mode) when running Jobs in Talend Studio with the following components: |
Support for additional databases | Talend now supports additional databases for the data quality data mart: |
Support for additional databases | Talend now supports additional databases for the Profiling perspective: |
Survivorship rules | The T-Swoosh algorithm supports the Most ancient and Most recent survivorship functions on non-date columns. |
tBRMS | The tBRMS component now supports Red Hat Decision Manager 7.3. |
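
The valid/invalid row split described in the New components entry can be pictured with a small sketch. This is not how the Talend components are implemented or configured; the function name `split_rows`, the row layout (dictionaries), and the email pattern are all assumptions made for illustration.

```python
import re

def split_rows(rows, column, pattern):
    """Split rows into (valid, invalid) by matching one column against a regex.

    Hypothetical helper: rows are plain dicts, and a row is valid only if
    the whole column value matches the pattern.
    """
    regex = re.compile(pattern)
    valid, invalid = [], []
    for row in rows:
        value = row.get(column)
        # Missing or non-string values are treated as invalid in this sketch.
        if isinstance(value, str) and regex.fullmatch(value):
            valid.append(row)
        else:
            invalid.append(row)
    return valid, invalid

rows = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": "not-an-email"},
    {"id": 3, "email": None},
]
# Deliberately loose email pattern, used only as an example data quality pattern.
valid, invalid = split_rows(rows, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+")
```

Here row 1 would land in the valid flow and rows 2 and 3 in the invalid flow.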
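
To make the "repeatable and bijective" property of password-based masking and unmasking more concrete, here is a toy sketch of a format-preserving transformation on digit strings. It is a simple Feistel construction keyed from a password, written for illustration only: it is not a standardized Format-Preserving Encryption mode, it is not what the Talend components run, and it should not be used to protect real data. The function names, the fixed salt, and the round count are assumptions.

```python
import hashlib
import hmac

ROUNDS = 10  # must be even so both halves end up back in their original positions

def _key(password: str) -> bytes:
    # A fixed salt keeps the masking repeatable for a given password (assumption).
    return hashlib.pbkdf2_hmac("sha256", password.encode(), b"demo-salt", 100_000)

def _round_value(key: bytes, i: int, half: str) -> int:
    # Keyed pseudo-random integer derived from the round number and one half.
    digest = hmac.new(key, bytes([i]) + half.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")

def _check(value: str) -> None:
    if len(value) < 2 or not (value.isascii() and value.isdigit()):
        raise ValueError("expects a string of at least two decimal digits")

def mask_digits(value: str, password: str) -> str:
    """Map a digit string to another digit string of the same length."""
    _check(value)
    key = _key(password)
    u = len(value) // 2
    v = len(value) - u
    a, b = value[:u], value[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # length of the half being replaced
        c = (int(a) + _round_value(key, i, b)) % 10**m
        a, b = b, str(c).zfill(m)
    return a + b

def unmask_digits(masked: str, password: str) -> str:
    """Reverse mask_digits by running the rounds backwards with the same password."""
    _check(masked)
    key = _key(password)
    u = len(masked) // 2
    v = len(masked) - u
    a, b = masked[:u], masked[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        c = (int(b) - _round_value(key, i, a)) % 10**m
        a, b = str(c).zfill(m), a
    return a + b

masked = mask_digits("4111111111111111", "s3cret")
restored = unmask_digits(masked, "s3cret")
```

The masked value keeps the original format (16 decimal digits), the same input and password always give the same output, and unmasking with the same password returns the original value.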
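
The Match grouping change can be illustrated with a short sketch of taking the minimum over all record pairs and then routing on it. The pair scores, the threshold value, the flow names, and the value used for single-record groups are all assumptions for illustration; they are not taken from tMatchGroup itself.

```python
def group_quality(pair_scores):
    """Group quality as the minimum score among all record pairs of a group.

    pair_scores maps a (record_a, record_b) pair to its matching score.
    Returning 1.0 for a group with no pairs is an assumption of this sketch.
    """
    if not pair_scores:
        return 1.0
    return min(pair_scores.values())

def output_flow(quality, threshold=0.9):
    # Hypothetical flow names and threshold: groups at or above the
    # threshold go to one output, the rest to another.
    return "matches" if quality >= threshold else "suspects"

# Three records in one group, with a score for each of the three pairs.
scores = {
    ("r1", "r2"): 0.95,
    ("r1", "r3"): 0.82,
    ("r2", "r3"): 0.91,
}
quality = group_quality(scores)
flow = output_flow(quality)
```

Because the weakest pair (r1, r3) sets the group quality to 0.82, the whole group falls below the assumed 0.9 threshold and goes to the "suspects" flow, even though the other two pairs score above it. This is the kind of routing difference the entry warns about when multiple outputs are used.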