Data masking functions in the masking components
There are several functions in the masking components which vary according to the data type of the column.
It is advisable to use the functions predefined in the component with columns that contain personally identifiable information, such as first and last names, email addresses, addresses, SSNs, credit card numbers, bank account numbers, genders, dates of birth and salaries.
Format-preserving encryption in the masking components
The component uses Format-Preserving Encryption (FPE) methods to generate masked output values in the same format as the input values.
The FPE methods are based on a National Institute of Standards and Technology (NIST) standard:
- FF1 with AES relies on the Advanced Encryption Standard in CBC mode.
- FF1 with SHA-2 relies on HMAC-SHA-256, a keyed construction built on the SHA-256 secure hash function.
The FPE methods are bijective methods, except when using tweaks.
The FF1 with AES and FF1 with SHA-2 methods require a password to generate encrypted and repeatable masked values. Those FPE methods do not use a seed.
You can specify this password in the password for FF1 method field, from the Advanced Settings of the component.
You can use tweaks so that the masking is no longer bijective, which makes the encryption stronger. A unique tweak is generated for each record and applies to all the data in that record. The tweaks change at each Job execution. To unmask the data, use the tDataUnmasking component with the corresponding tweaks.
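The properties described above (same output format, repeatable values for a given password, reversibility, and the effect of a tweak) can be illustrated with a small sketch. This is not NIST FF1 and not the component's implementation: it is a toy Feistel network over digit strings keyed with HMAC-SHA-256, and every name in it (`toy_fpe_encrypt`, `toy_fpe_decrypt`, `ROUNDS`) is hypothetical.

```python
import hashlib
import hmac

ROUNDS = 10  # assumption: an even round count chosen for this toy example


def _round_value(key: bytes, tweak: bytes, round_no: int, half: str) -> int:
    """Derive a large integer from the key, the tweak, the round and one half."""
    msg = tweak + b"|" + bytes([round_no]) + b"|" + half.encode("ascii")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _check(digits: str) -> None:
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected at least two ASCII digits")


def toy_fpe_encrypt(key: bytes, digits: str, tweak: bytes = b"") -> str:
    """Map a digit string to another digit string of the same length."""
    _check(digits)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # width of the half being replaced
        c = (int(a) + _round_value(key, tweak, i, b)) % 10**m
        a, b = b, str(c).zfill(m)
    return a + b


def toy_fpe_decrypt(key: bytes, masked: str, tweak: bytes = b"") -> str:
    """Invert toy_fpe_encrypt; needs the same key and the same tweak."""
    _check(masked)
    u = len(masked) // 2
    v = len(masked) - u
    a, b = masked[:u], masked[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        prev_b = a
        prev_a = (int(b) - _round_value(key, tweak, i, prev_b)) % 10**m
        a, b = str(prev_a).zfill(m), prev_b
    return a + b
```

With a fixed key and an empty tweak, the mapping is a permutation of all digit strings of a given length, so the same input always yields the same masked value. Passing a per-record tweak changes the mapping for that record, and the original value can only be recovered by supplying the same tweak again.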
Format-preserving encryption in the tDataMasking component
When using the FF1 with AES and FF1 with SHA-2 methods, input values must contain a minimum number of characters to be masked. Otherwise, the function returns null.
- FF1 with AES or FF1 with SHA-2
- The Digits alphabet
- "2" as an extra-parameter
The minimum number of characters required in the input values varies depending on the selected Alphabet.
When selecting Best guess, the number varies depending on the represented alphabets in the input values.
Alphabet | Minimum number of characters to mask |
---|---|
Alphanumeric | 4 |
Digits | 6 |
Latin extended | 3 |
Hiragana | 4 |
Katakana | 3 |
Kanji | 2 |
Hangul | 2 |
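The minimum-length rule in the table above can be sketched as a simple guard. The function name `mask_or_none` and the `mask` callback are hypothetical, and the sketch assumes that the minimum is compared against the total number of characters in the input value, which the table does not state explicitly.

```python
from typing import Callable, Optional

# Minimum number of characters per alphabet, copied from the table above
MIN_CHARS = {
    "Alphanumeric": 4,
    "Digits": 6,
    "Latin extended": 3,
    "Hiragana": 4,
    "Katakana": 3,
    "Kanji": 2,
    "Hangul": 2,
}


def mask_or_none(value: str, alphabet: str,
                 mask: Callable[[str], str]) -> Optional[str]:
    """Return None when the value is too short for the alphabet, else mask it."""
    # Assumption: every character of the value counts towards the minimum
    if len(value) < MIN_CHARS[alphabet]:
        return None
    return mask(value)
```

For example, a five-digit value checked against the Digits alphabet yields `None`, while a six-digit value is passed on to the masking callback.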
Alphabets
When using Character handling functions, such as Replace all, Replace characters between two positions or Replace all digits, with FPE methods, you must select an alphabet.
Characters that belong to the selected alphabet are masked with characters from the same alphabet.
When selecting the Best guess alphabet, masked values contain characters from all character types represented in the input values. Best guess is the default alphabet.
Any unrecognized character is copied to the output as is.
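The alphabet rules above can be sketched as follows. This example uses seeded random substitution rather than an FPE method, and it is not the component's implementation; `replace_all` and `ALPHANUMERIC` are hypothetical names. It only shows that characters from the selected alphabet are replaced with characters from that alphabet, while unrecognized characters are copied unchanged.

```python
import random

# Code point ranges of the Alphanumeric alphabet: [0-9], [a-z], [A-Z]
ALPHANUMERIC_RANGES = [(0x0030, 0x0039), (0x0061, 0x007A), (0x0041, 0x005A)]
ALPHANUMERIC = [chr(cp)
                for low, high in ALPHANUMERIC_RANGES
                for cp in range(low, high + 1)]


def replace_all(value: str, rng: random.Random) -> str:
    """Replace alphabet characters; copy every other character as is."""
    members = set(ALPHANUMERIC)
    out = []
    for ch in value:
        if ch in members:
            out.append(rng.choice(ALPHANUMERIC))  # stays inside the alphabet
        else:
            out.append(ch)  # unrecognized character, copied unchanged
    return "".join(out)
```

Calling `replace_all("Ab-12", random.Random(0))` returns a five-character value in which the hyphen is still at the third position.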
The following alphabets are supported:
Alphabet | Character Type | Unicode Range (version 11.0) | Corresponding characters |
---|---|---|---|
Alphanumeric | Latin numbers | [0030-0039] | [0-9] |
Alphanumeric | Latin lower-cased letters | [0061-007A] | [a-z] |
Alphanumeric | Latin upper-cased letters | [0041-005A] | [A-Z] |
Digits | Latin numbers | [0030-0039] | [0-9] |
Latin extended | Latin numbers | [0030-0039] | [0-9] |
Latin extended | Latin lower-cased letters | [0061-007A] | [a-z] |
Latin extended | Latin extended lower-cased letters | [00DF-00F6] [00F8-00FF] | [ß-ö] [ø-ÿ] |
Latin extended | Latin upper-cased letters | [0041-005A] | [A-Z] |
Latin extended | Latin extended upper-cased letters | [00C0-00D6] [00D8-00DE] | [À-Ö] [Ø-Þ] |
Hiragana | Hiragana | [3041-3096] 30FC 309D 309E | [ぁ-ゖ] ー ゝ ゞ |
Katakana | Half-width Katakana | [FF66-FF9D] (https://www.unicode.org/charts/PDF/UFF00.pdf) | [ヲ-ン] |
Katakana | Full-width Katakana | [30A1-30FA] 30FC 30FD 30FE | [ァ-ヺ] ー ヽ ヾ |
Katakana | Full-width Katakana | Phonetic extension: [31F0-31FF] | [ㇰ-ㇿ] |
Kanji | Kanji | CJK Unified Ideographs and Extension A: [4E00-9FEF] [3400-4DB5] | [一-] [㐀-䶵] |
Kanji | Kanji | CJK Extension B: [20000-2A6D6] | [𠀀-𪛖] |
Kanji | Kanji | CJK Extension C: [2A700-2B734] | [𪜀-𫜴] |
Kanji | Kanji | CJK Extension D: [2B740-2B81D] | [𫝀-𫠝] |
Kanji | Kanji | CJK Extension E: [2B820-2CEA1] | [-] |
Kanji | Kanji | CJK Extension F: [2CEB0-2EBE0] | [-] |
Kanji | Kanji | CJK Compatibility Ideographs: [F900-FA6D] [FA70-FAD9] | [豈-舘] [-] |
Kanji | Kanji | CJK Compatibility Ideographs Supplement: [2F800-2FA1D] | [-] |
Kanji | Kanji | KangXi Radicals: [2F00-2FD5] | [⼀-⿕] |
Kanji | Kanji | CJK Radicals Supplement: [2E80-2E99] [2E9B-2EF3] | [⺀-⺙] [⺛-⻳] |
Kanji | Kanji | CJK Symbols and Punctuation: [3005-3005] [3007-3007] [3021-3029] [3038-303B] | [々-々] [〇-〇] [〡-〩] [〸-〻] |
Hangul | Hangul | [AC00-D7AF] | [가-] |
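As an illustration of how the ranges in the table can be used, the following sketch reports which character types are represented in an input value, which is the kind of information the Best guess alphabet depends on. It covers only a subset of the table (the main contiguous ranges), and the names `CHARACTER_TYPES` and `character_types` are hypothetical; how the component actually detects alphabets is not documented here.

```python
# Subset of the table above: character type -> inclusive code point range
CHARACTER_TYPES = {
    "Latin numbers": (0x0030, 0x0039),
    "Latin lower-cased letters": (0x0061, 0x007A),
    "Latin upper-cased letters": (0x0041, 0x005A),
    "Hiragana": (0x3041, 0x3096),
    "Full-width Katakana": (0x30A1, 0x30FA),
    "Hangul": (0xAC00, 0xD7AF),
}


def character_types(value: str) -> set:
    """Return the character types that occur at least once in the value."""
    found = set()
    for ch in value:
        for name, (low, high) in CHARACTER_TYPES.items():
            if low <= ord(ch) <= high:
                found.add(name)
                break  # the ranges in this subset do not overlap
    return found
```

Characters outside every listed range, such as punctuation, are ignored by this sketch, in line with the rule that unrecognized characters are copied to the output as is.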