Data masking functions in the tDataMasking component

The tDataMasking component provides several masking functions, which vary according to the data type of the column.

It is advisable to use the functions predefined in the component on columns that contain personally identifiable information, such as first and last names, email addresses, addresses, SSNs, credit card numbers, bank account numbers, genders, dates of birth and salaries.

Format-preserving encryption

The component uses Format-Preserving Encryption (FPE) methods to generate masked output values in the same format as the input values.

Note: Java 8u161 is the minimum version required to use the FF1 with AES method. To use this FPE method with Java versions earlier than 8u161, download the Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files from the Oracle website.

The FPE methods are based on a National Institute of Standards and Technology (NIST) standard:

  • FF1 with AES relies on the Advanced Encryption Standard in CBC mode.
  • FF1 with SHA-2 relies on HMAC-SHA-256, a keyed hash built on the SHA-256 secure hash function.

The FPE methods are bijective: for a given password, two different input values in the same format never produce the same masked value.

The FF1 with AES and FF1 with SHA-2 methods require a password to generate encrypted and repeatable masked values. Those FPE methods do not use a seed.

You can specify this password in the password for FF1 method field, from the Advanced Settings of the component.
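
The properties described above (output in the same format as the input, repeatable for a given password, bijective) can be illustrated with a toy keyed Feistel network over digit strings. This is a minimal sketch, not the FF1 algorithm and not the component's implementation: the name `toy_fpe_digits`, the round count and the way the password is turned into a key are assumptions made for illustration, and the construction should not be used to protect real data.

```python
import hashlib
import hmac

ROUNDS = 10  # assumption for this sketch; must stay even so the halves line up


def _round_value(key: bytes, round_no: int, half: str) -> int:
    """Keyed pseudo-random integer derived from the round number and one half."""
    message = f"{round_no}|{half}".encode("utf-8")
    return int.from_bytes(hmac.new(key, message, hashlib.sha256).digest(), "big")


def toy_fpe_digits(digits: str, password: str, decrypt: bool = False) -> str:
    """Map a digit string to another digit string of the same length."""
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected at least two ASCII digits")
    key = password.encode("utf-8")  # hypothetical key derivation, illustration only
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    rounds = range(ROUNDS - 1, -1, -1) if decrypt else range(ROUNDS)
    for i in rounds:
        m = u if i % 2 == 0 else v  # length of the half rewritten in round i
        if decrypt:
            c, b = b, a  # undo the swap performed by the forward round
            a = str((int(c) - _round_value(key, i, b)) % 10**m).zfill(m)
        else:
            c = str((int(a) + _round_value(key, i, b)) % 10**m).zfill(m)
            a, b = b, c
    return a + b
```

Calling `toy_fpe_digits` twice with the same input and password returns the same digits, and calling it with `decrypt=True` on the result recovers the input. Because every output can be reversed, no two inputs of the same length can share an output, which is what bijective means here.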

When using the FF1 with AES and FF1 with SHA-2 methods, input values must contain a minimum number of characters to be masked. Otherwise, the function returns null.

For example, you want to mask S426A789QQ using the Keep first n digits and replace following ones function with the following parameters:
  • FF1 with AES or FF1 with SHA-2
  • The Digits alphabet
  • "2" as an extra-parameter
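The input contains six digits (4, 2, 6, 7, 8 and 9). Because the first two digits are kept, only four digits remain to be masked, which is below the minimum of six required for the Digits alphabet. As a result, the function returns null.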

The minimum number of characters required in the input values varies depending on the selected Alphabet.

When you select Best guess, the minimum depends on which alphabets are represented in the input values.

Alphabet | Minimum number of characters to mask
Alphanumeric | 4
Digits | 6
Latin extended | 3
Hiragana | 4
Katakana | 3
Kanji | 2
Hangul | 2
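
The null result in the S426A789QQ example comes down to a count and a comparison. The sketch below reproduces that arithmetic for the Digits alphabet only. The function names are hypothetical, and the logic is an assumption about how such a check could be written, not the component's code.

```python
# Minimums copied from the table above.
MIN_CHARS_TO_MASK = {
    "Alphanumeric": 4,
    "Digits": 6,
    "Latin extended": 3,
    "Hiragana": 4,
    "Katakana": 3,
    "Kanji": 2,
    "Hangul": 2,
}


def digits_left_to_mask(value: str, keep_first: int) -> int:
    """Count the ASCII digits that remain once the first keep_first are kept."""
    digit_count = sum(1 for ch in value if ch in "0123456789")
    return max(digit_count - keep_first, 0)


def is_long_enough(value: str, keep_first: int) -> bool:
    """True when enough digits remain to reach the Digits minimum."""
    return digits_left_to_mask(value, keep_first) >= MIN_CHARS_TO_MASK["Digits"]
```

For `"S426A789QQ"` with `keep_first=2`, `digits_left_to_mask` gives 4, so `is_long_enough` is false and the value falls in the case where the function returns null.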

Alphabets

When using the Replace all, Replace characters between two positions, Replace n first digits and Replace n last digits functions with FPE methods, you can select an alphabet.

Characters that belong to the selected alphabet are masked with characters from the same alphabet.

When selecting the Best guess alphabet, masked values contain characters from all character types represented in the input values. Best guess is the default alphabet.

Any unrecognized character is copied to the output as is.
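
The two rules above (a character is replaced by one of the same type, and an unrecognized character is copied unchanged) can be sketched for the three Alphanumeric character types. This is an illustration only: `mask_same_type` is a hypothetical helper, and the per-position keyed shift it uses is a stand-in chosen for brevity, not an FPE method and not what the component does.

```python
import hashlib
import hmac
import string

# The three character types of the Alphanumeric alphabet.
CHAR_TYPES = [string.digits, string.ascii_lowercase, string.ascii_uppercase]


def mask_same_type(value: str, password: str) -> str:
    """Replace each recognized character with one of the same type."""
    key = password.encode("utf-8")
    out = []
    for pos, ch in enumerate(value):
        for chars in CHAR_TYPES:
            if ch in chars:
                # Keyed shift that depends on the position only (illustrative).
                digest = hmac.new(key, str(pos).encode("utf-8"), hashlib.sha256).digest()
                shift = int.from_bytes(digest, "big")
                out.append(chars[(chars.index(ch) + shift) % len(chars)])
                break
        else:
            out.append(ch)  # unrecognized character: copied as is
    return "".join(out)
```

With an input such as `"Ab-12 é"`, the hyphen, the space and the `é` stay where they are, while the upper-cased letter, the lower-cased letter and the digits are each replaced by a character of the same type.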

The following alphabets are supported:

Alphabet | Character type | Unicode range (version 11.0) | Corresponding characters
Alphanumeric | Latin numbers | [0030-0039] | [0-9]
Alphanumeric | Latin lower-cased letters | [0061-007A] | [a-z]
Alphanumeric | Latin upper-cased letters | [0041-005A] | [A-Z]
Digits | Latin numbers | [0030-0039] | [0-9]
Latin extended | Latin numbers | [0030-0039] | [0-9]
Latin extended | Latin lower-cased letters | [0061-007A] | [a-z]
Latin extended | Latin extended lower-cased letters | [00DF-00F6] [00F8-00FF] | [ß-ö] [ø-ÿ]
Latin extended | Latin upper-cased letters | [0041-005A] | [A-Z]
Latin extended | Latin extended upper-cased letters | [00C0-00D6] [00D8-00DE] | [À-Ö] [Ø-Þ]
Hiragana | Hiragana | [3041-3096] 30FC 309D 309E | [ぁ-ゖ] ー ゝ ゞ
Katakana | Half-width Katakana | [FF66-FF9D] | [ヲ-ン]
Katakana | Full-width Katakana | [30A1-30FA] 30FC 30FD 30FE | [ァ-ヺ] ー ヽ ヾ
Katakana | Phonetic extension | [31F0-31FF] | [ㇰ-ㇿ]
Kanji | Kanji | [4E00-9FEF] | [一-U+9FEF]
Kanji | CJK Extension A | [3400-4DB5] | [㐀-䶵]
Kanji | CJK Extension B | [20000-2A6D6] | [𠀀-𪛖]
Kanji | CJK Extension C | [2A700-2B734] | [𪜀-𫜴]
Kanji | CJK Extension D | [2B740-2B81D] | [𫝀-𫠝]
Kanji | CJK Extension E | [2B820-2CEA1] | [U+2B820-U+2CEA1]
Kanji | CJK Extension F | [2CEB0-2EBE0] | [U+2CEB0-U+2EBE0]
Kanji | CJK Compatibility Ideographs | [F900-FA6D] [FA70-FAD9] | [豈-舘] [U+FA70-U+FAD9]
Kanji | CJK Compatibility Ideographs Supplement | [2F800-2FA1D] | [U+2F800-U+2FA1D]
Kanji | KangXi Radicals | [2F00-2FD5] | [⼀-⿕]
Kanji | CJK Radicals Supplement | [2E80-2E99] [2E9B-2EF3] | [⺀-⺙] [⺛-⻳]
Kanji | CJK Symbols and Punctuation | [3005-3005] [3007-3007] [3021-3029] [3038-303B] | [々-々] [〇-〇] [〡-〩] [〸-〻]
Hangul | Hangul | [AC00-D7AF] | [가-힯]
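
The ranges in the table above translate directly into a lookup structure. The sketch below uses them to test whether a character belongs to an alphabet and to count how many code points each alphabet covers; Kanji is left out for brevity. It also computes the shortest length at which the number of possible values reaches one million. That threshold is an assumption of this sketch, not something this page states; it is used only because it reproduces the minimum numbers of characters listed earlier for these six alphabets. All names are hypothetical.

```python
# Code point ranges copied from the table above (Kanji omitted for brevity).
ALPHABET_RANGES = {
    "Digits": [(0x0030, 0x0039)],
    "Alphanumeric": [(0x0030, 0x0039), (0x0061, 0x007A), (0x0041, 0x005A)],
    "Latin extended": [
        (0x0030, 0x0039), (0x0061, 0x007A), (0x00DF, 0x00F6), (0x00F8, 0x00FF),
        (0x0041, 0x005A), (0x00C0, 0x00D6), (0x00D8, 0x00DE),
    ],
    "Hiragana": [(0x3041, 0x3096), (0x30FC, 0x30FC), (0x309D, 0x309E)],
    "Katakana": [
        (0xFF66, 0xFF9D), (0x30A1, 0x30FA), (0x30FC, 0x30FE), (0x31F0, 0x31FF),
    ],
    "Hangul": [(0xAC00, 0xD7AF)],
}


def in_alphabet(ch: str, alphabet: str) -> bool:
    """True when the code point of ch falls in one of the alphabet's ranges."""
    return any(lo <= ord(ch) <= hi for lo, hi in ALPHABET_RANGES[alphabet])


def alphabet_size(alphabet: str) -> int:
    """Number of code points covered by the alphabet's ranges."""
    return sum(hi - lo + 1 for lo, hi in ALPHABET_RANGES[alphabet])


def shortest_length(size: int, domain: int = 1_000_000) -> int:
    """Smallest length n such that size**n >= domain (assumed threshold)."""
    length = 1
    while size**length < domain:
        length += 1
    return length
```

For example, the Alphanumeric alphabet covers 62 code points, and 62 to the power 3 is below one million while 62 to the power 4 is above it, so `shortest_length(alphabet_size("Alphanumeric"))` gives 4.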
