
Data masking capabilities

Masking functions in the tDataMasking component are consistent, bijective and/or random functions, and they can check that the input data is in a valid format.

Random data masking

Random masking consists of masking an input value with a randomly generated value.

When the same value occurs several times in the input dataset, each occurrence can be masked with a different value.

Different values from the input dataset can be masked with the same value.

For example, the following diagram shows how the tDataMasking component can mask data randomly:
  • The A value is masked with D when it first appears in the input dataset.
  • The B and C values are masked with E.
  • The A value is masked with F when it appears in the input dataset for the second time.
Example of a random masking operation

Random data masking examples

The following table shows examples of generated masked values using the Replace the first n characters function:
Input values | Extra Parameter | Examples of masked values
newuser@domain.com | "4" | ohsbser@domain.com
admin@company.com | "4" | lneen@company.com
newuser@domain.com | "4" | qzmaser@domain.com
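
The behavior shown in this table can be approximated with a short Python sketch. It is not the tDataMasking implementation: the function name replace_first_n_chars is hypothetical, and the rule that letters become random lowercase letters and digits become random digits is an assumption inferred from the examples.

```python
import random
import string


def replace_first_n_chars(value, n, rng):
    """Replace the first n characters of value with random characters.

    Assumption: letters become random lowercase letters, digits become
    random digits, and any other character is kept unchanged.
    """
    masked = []
    for ch in value[:n]:
        if ch.isdigit():
            masked.append(rng.choice(string.digits))
        elif ch.isalpha():
            masked.append(rng.choice(string.ascii_lowercase))
        else:
            masked.append(ch)
    # Everything after the first n characters is left untouched.
    return "".join(masked) + value[n:]


rng = random.Random()  # unseeded: output differs from run to run
for email in ["newuser@domain.com", "admin@company.com", "newuser@domain.com"]:
    print(email, "->", replace_first_n_chars(email, 4, rng))
```

Because every call draws new random characters, the two occurrences of newuser@domain.com will usually receive different masked values, which is the defining property of random masking.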
The following table shows examples of generated masked values using the Generate from pattern function:
Input values | Extra Parameter | Examples of masked values
newuser@domain.com | "aaaaaa" | rxvsas
admin@company.com | "aaaaaa" | bbwpba
newuser@domain.com | "a9aaa9" | r8daw1
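
As a rough illustration of pattern-based generation, the sketch below builds a value from a pattern string. It assumes, based only on the examples in this table, that "a" stands for a random lowercase letter and "9" for a random digit. The name generate_from_pattern and the choice to copy any other character literally are hypothetical.

```python
import random
import string

# Assumed pattern alphabet, inferred from the examples in the table.
PATTERN_CLASSES = {
    "a": string.ascii_lowercase,  # "a" -> random lowercase letter
    "9": string.digits,           # "9" -> random digit
}


def generate_from_pattern(pattern, rng):
    """Generate a random value that follows the given pattern.

    The input value plays no role: the output depends only on the pattern
    and on the random generator.
    """
    return "".join(
        rng.choice(PATTERN_CLASSES[c]) if c in PATTERN_CLASSES else c
        for c in pattern
    )


rng = random.Random()
print(generate_from_pattern("aaaaaa", rng))
print(generate_from_pattern("a9aaa9", rng))
```

In this sketch the masked value has the length and shape of the pattern, not of the input, which matches the examples where an email address is replaced by a six-character string.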
The following table shows examples of generated masked values for the Generate French SSN number function:
Input values | Examples of masked values
190049418437621 | 2590459222147 22
271083561478941 | 1900846274448 17
190049418437621 | 2730364078284 70
117029 | 1750694861914 69

Consistent data masking

When the same value appears more than once in the input data, consistent masking functions output the same masked value for every occurrence within a single Job execution.

However, two different input values can be masked with the same value in the output.

For example, the following diagram shows how the tDataMasking component can mask data consistently:
  • The A value is masked with D, regardless of the number of occurrences in the input dataset.
  • The B and C values are masked with E.
Example of a consistent masking operation

Consistent data masking examples

The following table shows examples of generated masked values using the Mask email left part of domain with consistent items function:
Input values | Extra Parameter | Examples of masked values
newuser@domain.com | "talend,value,newcompany" | newuser@newcompany.com
admin@company.com | "talend,value,newcompany" | admin@value.com
newuser@domain.com | "talend,value,newcompany" | newuser@newcompany.com
user@company.com | "talend,value,newcompany" | user@value.com
user@domain.com | "talend,value,newcompany" | user@newcompany.com
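
One way to picture consistent masking is a lookup table that remembers the replacement chosen for each input during a run. The Python sketch below follows that idea. It is an illustration only, not how tDataMasking is implemented: the name make_consistent_domain_masker is hypothetical, and treating the text between "@" and the first dot as the left part of the domain is an assumption.

```python
import random


def make_consistent_domain_masker(items, rng):
    """Return a function that masks the left part of an email domain.

    The replacement chosen for a given domain is remembered for the
    lifetime of the returned function, so repeated inputs get the same
    output. Two different domains may still receive the same item.
    """
    seen = {}  # original left part of domain -> chosen replacement

    def mask(email):
        if "@" not in email:
            return email  # sketch choice: leave non-email values unchanged
        local, _, domain = email.partition("@")
        left, dot, rest = domain.partition(".")
        if left not in seen:
            seen[left] = rng.choice(items)
        return f"{local}@{seen[left]}{dot}{rest}"

    return mask


mask = make_consistent_domain_masker(["talend", "value", "newcompany"], random.Random())
for email in ["newuser@domain.com", "admin@company.com", "newuser@domain.com"]:
    print(email, "->", mask(email))
```

With only three replacement items, "domain" and "company" can end up mapped to the same item, which illustrates why consistent masking does not guarantee distinct outputs for distinct inputs.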

Bijective data masking

Bijective masking functions have the following characteristics:
  • They are consistent masking functions.
  • They are injective, meaning that they output two different masked values for two different input values.
  • They check that the input data is in a valid format. If the input value is valid, bijective masking functions output a valid value. If the input value is not valid, they output an invalid value or replace the value with null, depending on the masking function used.
For example, the following diagram shows how the tDataMasking component can mask data bijectively:
  • The A value is masked with D, regardless of the number of occurrences in the input dataset.
  • The B value is masked with E.
  • The C value is masked with F.
Example of a bijective data masking operation

Bijective data masking examples

The following table shows examples of generated masked values using the Mask French SSN number function:
Input values | Examples of masked values
190049418437621 | 289052428331901
271083561478941 | 234112758889352
190049418437621 | 289052428331901
117029 | null
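
The three characteristics listed above (consistent, injective, format-checking) can be illustrated with a toy Python sketch. It is not the algorithm behind Mask French SSN number. It applies a seeded digit permutation at each position, and it uses "exactly 15 ASCII digits" as a simplified, hypothetical validity rule. The name make_bijective_digit_masker is also hypothetical.

```python
import random
import string


def make_bijective_digit_masker(seed, length=15):
    """Return a masking function that is a bijection on digit strings.

    Each position gets its own permutation of the digits 0-9, so two
    different valid inputs always produce two different outputs.
    This is a toy construction and offers no real security.
    """
    rng = random.Random(seed)
    tables = []
    for _ in range(length):
        shuffled = list(string.digits)
        rng.shuffle(shuffled)  # a permutation of "0123456789"
        tables.append(shuffled)

    def mask(value):
        # Simplified validity check (assumption): exactly `length` ASCII digits.
        if len(value) != length or not (value.isascii() and value.isdigit()):
            return None  # invalid input is replaced with null
        return "".join(tables[i][int(d)] for i, d in enumerate(value))

    return mask


mask = make_bijective_digit_masker(seed=42)
for ssn in ["190049418437621", "271083561478941", "190049418437621", "117029"]:
    print(ssn, "->", mask(ssn))
```

In this sketch the repeated input maps to the same output, the two distinct valid inputs map to distinct outputs, and the short value 117029 yields None, mirroring the null row in the table.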

Repeatable data masking

To produce repeatable masked values between Job executions, define a seed or a password in the Advanced settings of the component.

For a given combination of input and seed values, the same masked value is produced.

When using Format-Preserving Encryption methods, the same masked value is produced for a given combination of an input value and a password.
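
The effect of a seed can be sketched in Python by deriving the random generator from the seed and the input value together. This is only an illustration of repeatability, not the mechanism used by the component and not Format-Preserving Encryption. The name mask_repeatably and the choice of lowercase letters as output are hypothetical.

```python
import random
import string


def mask_repeatably(value, seed):
    """Mask value with random lowercase letters, repeatably.

    The generator is seeded with the (seed, value) combination, so the
    same combination gives the same masked value on every execution.
    """
    rng = random.Random(f"{seed}:{value}")
    return "".join(rng.choice(string.ascii_lowercase) for _ in value)


# Same input and same seed: identical output, also across separate runs.
print(mask_repeatably("newuser", seed=12345))
print(mask_repeatably("newuser", seed=12345))
# A different seed generally gives a different masked value.
print(mask_repeatably("newuser", seed=54321))
```

Without a fixed seed, a fresh generator would be used on each execution, so masked values would not be reproducible from one run to the next.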
