Setting sensitivity and obfuscation method on fields

From the source module, administrators set field sensitivity and masking rules. Mark sensitive fields as Sensitive in the General Information tab, which is opened by selecting the view details icon or by choosing View/Edit General Information from the More dropdown on each field row. Once a field is marked as Sensitive, a Select a masking rule dialog appears with the available masking rules in a dropdown, and a masking rule must be specified. The selected rule masks the field by applying that method upon data export.

Figure: Setting field sensitivity and masking rule. Available masking rules appear in the dropdown for fields marked Sensitive.

Obfuscation methods

Rules are created by selecting a primary obfuscation method (such as NumericObfuscator, CharClassObfuscator, DigestObfuscator, or DictionaryObfuscator) and adding parameters or a salt to the algorithm to generate a hash, which becomes an available rule in source. Users then select the rule from the Sensitive (Masking Rule) dropdown to apply that treatment to the field upon export.

Qlik Catalog supplies several obfuscation rules out of the box. The sections below describe the primary obfuscation methods that provide the foundation for the rules, their applicable parameters, and example rules. If a random salt has been built into a rule, the same inputs will produce different outcomes. Default rules include: Char Class Random, Digest MD5 Upper x2, Digest SHA256 Lower, Numeric Blur 50%, Numeric Floor 500 Ceiling 10000, Numeric Round to Nearest 100, Replace All With Null, Replace Not Null, and Replace Null.

Administrators are encouraged to create their own rules and to supply unique dictionaries for use with the DictionaryObfuscator. Instructions for uploading obfuscation and dictionary files can be found in the installation guide.

CharClassObfuscator

Description

The CharClassObfuscator applies obfuscation within a character class.

Digit chars ['0' - '9'] are converted into other digits.

Upper case US_ASCII chars ['A' - 'Z'] are converted into other upper case chars.

Lower case US_ASCII chars ['a' - 'z'] are converted into other lower case chars.

By default, upper case Latin-1 supplement chars are converted into other upper case Latin-1 supplement chars.

By default, lower case Latin-1 supplement chars are converted into other lower case Latin-1 supplement chars.

All other chars, including punctuation and non-Latin chars, remain unmodified.

The substitution of chars happens on a char-by-char basis, so that if you pass in a string of length n you will get a result of length n. The entire raw value string is combined with the supplied seed to generate the effective seed on a value-by-value basis. This means that values with the same prefix will not have the same prefix on the generated output. For example:

foo => etm
foot => pqyb
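
The behavior described above can be approximated with a short Python sketch. This is not the Qlik Catalog implementation: the function name char_class_obfuscate, the use of SHA-256 to combine the seed with the raw value, and the choice of random number generator are all assumptions made for illustration, and Latin-1 supplement handling is left out. It only shows why the output length equals the input length and why shared prefixes are not preserved.

```python
import hashlib
import random
import string


def char_class_obfuscate(value: str, seed: str) -> str:
    """Replace each char with a pseudo-random char from the same class."""
    # Assumption: the effective seed is derived from the seed plus the
    # ENTIRE raw value, so "foo" and "foot" get unrelated substitutions.
    effective_seed = hashlib.sha256((seed + value).encode("utf-8")).digest()
    rng = random.Random(effective_seed)

    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            # Punctuation and all other chars pass through unmodified.
            out.append(ch)
    return "".join(out)


masked = char_class_obfuscate("a.c.e.", "example-seed")
print(masked)  # six chars, with the dots in their original positions
```

Calling the function twice with the same seed and value returns the same result, which is what keeps masked values stable across exports in this sketch.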

When a CharClassObfuscator rule is applied to Boolean values, all values are converted to true.

Parameters

(predefined upon rule creation)

Behavior of alphabetic Latin-1 supplement chars can be controlled through the option:

--no-latin-1-supplements

Suppresses the output of upper and lower case Latin-1 supplement chars by mapping them to random chars in the unaccented ['A' - 'Z'] and ['a' - 'z'] ranges, respectively.

Examples

Char Class Random

Random obfuscation within character class

raw	obfuscated
NONE	YGWH
a.c.e.	w.p.w.
AEROmotors	ZQUFluxhip
AEROMOTORS	BWSKUWZVDD
AEROMOTORS	KRTFPPOJSAQ

DigestObfuscator

Description

The DigestObfuscator converts raw value STRINGs into a message digest hash. The default hash is SHA-256, although MD5 or other message digest hash functions can be applied.

By default, the DigestObfuscator converts the binary message digest value into a Base64-encoded text representation. One can optionally choose to convert to a hex value with upper or lower case chars.

An optional iteration-count parameter can be used to run the data through the digest multiple times.

Digest hashes always generate binary results of the same length, regardless of the length of the data that is passed in. SHA-256, for example, always generates a 32 byte binary digest value. If one specifies a hex option then all returned string values will be 64 hexadecimal chars in length.
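
The fixed output length is easy to check with Python's standard hashlib module. The sketch below is illustrative only: the function name digest_obfuscate and its keyword arguments are hypothetical stand-ins for the rule options, and how Qlik Catalog chains repeated iterations is not documented here, so re-hashing the previous binary digest is an assumption.

```python
import base64
import hashlib


def digest_obfuscate(value, digest_name="sha256", iterations=1, hex_case=None):
    """Hash a string and return Base64 text (default) or hex text."""
    data = value.encode("utf-8")
    for _ in range(iterations):
        # Assumption: each extra pass re-hashes the previous binary digest.
        data = hashlib.new(digest_name, data).digest()
    if hex_case == "upper":
        return data.hex().upper()
    if hex_case == "lower":
        return data.hex()
    return base64.b64encode(data).decode("ascii")


# SHA-256 yields 32 bytes: 64 hex chars, or 44 Base64 chars with padding.
print(len(digest_obfuscate("AEROMAX", hex_case="lower")))  # 64
print(len(digest_obfuscate("AEROMAX")))                    # 44
# MD5 yields 16 bytes: 32 hex chars, whatever the input length.
print(len(digest_obfuscate("AEROMAX", "md5", 2, "upper")))  # 32
```

Because the iteration scheme and text encoding are assumed, the hash values produced by this sketch are not expected to match values produced by the product.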

Parameters

--digest-name
Defaults to SHA-256. A common alternative is MD5. Any message digest supported by the Java Cryptography Architecture (JCA) API can be used.

--hex-upper-case
Returns the digest as upper case hex instead of the default Base64 encoding.

--hex-lower-case
Returns the digest as lower case hex instead of the default Base64 encoding.

--iteration-count
Specifies an iteration count of how many times the data value should be run through the digest.

Default=1

Examples

Digest MD5 Upper x2

Converts raw data according to MD5 Hash algorithm, to uppercase hexadecimal, run through the digest two times

raw	obfuscated
NONE	8B7B9B61249D6759736B233B1CACD1CD
AEROMAX	2F73AC010EBB536FFCED6D04FB92DA6B
AEROmotorâ	B68E201EB2D116E28715EF52279B6EDD

Digest SHA256 Lower

Converts raw data according to SHA256 Hash algorithm, to lowercase hexadecimal

raw	obfuscated
NONE	ae2129ec96ca4cc7ade60d8b3460e7fc
AEROMAX	000a85730b9042275870820bde185a2e
AEROmotorâ	c1c43a59341aa3b531bf2150bae7fe7c

NumericObfuscator

Description

Masks numeric values. The NumericObfuscator supports several types of numeric/mathematical functions which can be used individually or combined. The argument string given to the NumericObfuscator allows one to specify a masking rule that meets requirements for a specific data value.

NULL values which are passed in are always returned as NULL.

ZERO values which are passed in are always returned as ZERO.

Other values are modified according to user-specified options and parameters.

Parameters

--blur-percentage
Takes a percentage from 0 to 100, inclusive.
The numeric value is "blurred" by adding or subtracting a random percentage of the value, up to the specified percentage.
As with all obfuscators, the "randomness" can be controlled and stabilized by the specified salt or seed value (if a seed is employed).
Values are returned with the same scale that was passed in, including trailing zeros.
For example, the number 123.45 with --blur-percentage 10 might return 130.00. Note the two zeros to the right of the decimal point: the original value had two digits to the right of the decimal point, so the result does too.
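
A minimal Python sketch of blurring with a preserved scale follows. It is not the product's algorithm: the function name blur, the way the seed is combined with the value, and the uniform distribution of the random percentage are assumptions. The part it is meant to show is that a result can keep the scale of the raw value, which here is done with Decimal.quantize.

```python
import hashlib
import random
from decimal import Decimal


def blur(value: str, percentage: float, seed: str) -> str:
    """Add or subtract a seeded random percentage, keeping the input scale."""
    raw = Decimal(value)
    if raw == 0:
        return value  # zero values are always returned as zero

    # Assumption: seed + raw value drive the random generator, so the same
    # input always blurs to the same output.
    digest = hashlib.sha256((seed + value).encode("utf-8")).digest()
    rng = random.Random(digest)

    # Assumption: the random percentage is uniform in [-percentage, +percentage].
    fraction = Decimal(str(rng.uniform(-percentage, percentage))) / Decimal(100)
    blurred = raw + raw * fraction

    # quantize(raw) rounds to the same number of decimal places as the input.
    return str(blurred.quantize(raw))


print(blur("123.45", 10, "example-seed"))  # two decimal places, within 10% of 123.45
```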

--round-to-nearest
Takes a number, possibly with digits to the right of a decimal point to specify scale.
Raw values are rounded according to the rounding rule, which defaults to HALF_EVEN.
Rounding can be to any value, not just a power of 10.
Example: "--round-to-nearest 250.00" rounds to the nearest 250.00.
The two zeros to the right of the decimal point are retained, and every returned value is a multiple of 250.00.
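
Rounding to an arbitrary step can be sketched in Python with the decimal module, whose ROUND_HALF_EVEN mode corresponds to HALF_EVEN rounding. The function name round_to_nearest is hypothetical and the code illustrates the arithmetic only, not the product's implementation.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to_nearest(value: str, step: str, rounding=ROUND_HALF_EVEN) -> str:
    """Round value to the nearest multiple of step, keeping step's scale."""
    raw, unit = Decimal(value), Decimal(step)
    # Count how many whole steps fit, rounding the count with the given mode.
    multiples = (raw / unit).quantize(Decimal(1), rounding=rounding)
    # Multiply back and keep the scale of the step (e.g. two decimals).
    return str((multiples * unit).quantize(unit))


print(round_to_nearest("380", "250.00"))  # 500.00
print(round_to_nearest("981", "100"))     # 1000
print(round_to_nearest("250", "100"))     # 200 (tie goes to the even multiple)
print(round_to_nearest("250", "100", ROUND_HALF_UP))  # 300
```

The last two calls show why the rounding mode matters: a value exactly halfway between two multiples goes to the even multiple under HALF_EVEN and upward under HALF_UP.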

--rounding-mode
Allows one to specify a java.math.RoundingMode in combination with --round-to-nearest
Defaults to HALF_EVEN, also known as "bankers rounding"

--ceiling
Takes a number, possibly with digits to the right of a decimal point to specify desired scale.
Allows one to specify an upper limit for allowable masked values.
Any raw value greater than the specified ceiling will be converted to the ceiling value during masking.
For example, one could specify "--ceiling 1000.00" and any value greater than 1000 will be capped and converted to 1000.00
It is generally recommended that the scale of the specified ceiling be the same as the scale of the underlying data values to be masked.

--floor
Takes a number, possibly with digits to the right of a decimal point to specify desired scale.
Allows one to specify a lower limit for allowable masked values.
Any raw value less than the specified floor will be converted to the floor value during masking.
It is generally recommended that the scale of the specified floor be the same as the scale of the underlying data values to be masked.

--limit-ceiling-floor-first
When multiple rules are combined, ceiling and floor limits are normally applied last.
When this option is provided then the ceiling and floor limits are applied before other operations.
For example, when combining a --blur-percentage with a --ceiling, applying the ceiling limit before the blur will give different results than applying the ceiling limit after the blur.
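
The effect of the ordering can be seen in a small Python sketch. The names clamp and mask are hypothetical, and a fixed 10% increase stands in for the random blur so that the output is predictable; the real blur is random.

```python
from decimal import Decimal


def clamp(value, floor=None, ceiling=None):
    """Apply optional floor and ceiling limits to a value."""
    if floor is not None and value < floor:
        return floor
    if ceiling is not None and value > ceiling:
        return ceiling
    return value


def mask(value, transform, floor=None, ceiling=None, limits_first=False):
    """Combine a transform with limits, in either order."""
    if limits_first:
        # Limits applied before the other operation.
        return transform(clamp(value, floor, ceiling))
    # Default order: limits applied last.
    return clamp(transform(value), floor, ceiling)


def plus_ten_percent(v):
    # Deterministic stand-in for a blur, used only for illustration.
    return v * Decimal("1.1")


raw = Decimal(2000)
print(mask(raw, plus_ten_percent, ceiling=Decimal(1000)))                     # 1000
print(mask(raw, plus_ten_percent, ceiling=Decimal(1000), limits_first=True))  # 1100.0
```

With limits applied last, the result never exceeds the ceiling. With limits applied first, the later operation can move the value past the ceiling again.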

--return-null-for-non-numeric
All Obfuscators, including the NumericObfuscator, take a text STRING as the raw value to be obfuscated.
By default, passing a non-numeric value (except for NULL) to the NumericObfuscator will throw an exception.
This option allows one to say that non-numeric values should be quietly converted to NULL.
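
The two behaviors can be sketched in Python as follows. The function name parse_numeric and its keyword argument are hypothetical, ValueError stands in for whatever exception the product throws, and treating NaN and infinity as non-numeric is an assumption.

```python
from decimal import Decimal, InvalidOperation


def parse_numeric(value, return_null_for_non_numeric=False):
    """Parse a raw text value, optionally mapping non-numeric text to None."""
    if value is None:
        return None  # NULL in, NULL out
    try:
        number = Decimal(value)
        if not number.is_finite():
            # Assumption: NaN and infinity count as non-numeric.
            raise InvalidOperation
        return number
    except InvalidOperation:
        if return_null_for_non_numeric:
            return None  # quietly convert to NULL
        raise ValueError(f"non-numeric value: {value!r}")


print(parse_numeric("12.50"))                                     # 12.50
print(parse_numeric("n/a", return_null_for_non_numeric=True))     # None
```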

Examples

Numeric Blur 30%

Converts a numeric value by adding or subtracting a random percentage of the value by up to 30%

raw	obfuscated
320	379
18900	15744
981	689
520	397

Numeric Floor 500 Ceiling 10000

Applies a floor of 500 and a ceiling of 10000 upon export: any value less than 500 is converted to 500, and any value over 10000 is converted to 10000. Values between 500 and 10000 are not modified.

raw	obfuscated
320	500
18900	10000
981	981
520	520

Numeric Round to Nearest 100

Rounds values to the nearest 100 according to the default rounding rule (HALF_EVEN) or the rule specified when the obfuscation rule was created.

raw	obfuscated
320	300
18900	18900
981	1000
520	500

ConstantValueObfuscator

Description

ConstantValueObfuscator applies obfuscation by returning a constant value. It enables separate control for non-null and null values. It can be used to set all values to null, effectively preventing all values from being exported.

Parameters

--always-null

All inbound values are converted to null

--replacement-when-null <constant1>

Inbound null values are converted to <constant1>

--replacement-when-not-null <constant2>

Inbound non-null values are converted to <constant2>

Limitations: Constant values containing SPACE characters are not supported.
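
The three options can be summarized in a short Python sketch. The function name constant_value and its keyword arguments are hypothetical stand-ins for the rule parameters, and None represents null.

```python
def constant_value(value, always_null=False, when_null=None, when_not_null=None):
    """Return a constant depending on whether the inbound value is null."""
    if always_null:
        return None  # every inbound value becomes null
    if value is None:
        # Null stays null unless a replacement for nulls was configured.
        return when_null
    # Non-null values pass through unless a replacement was configured.
    return when_not_null if when_not_null is not None else value


print(constant_value("Boston", always_null=True))        # None
print(constant_value("899", when_not_null="bizbaz"))     # bizbaz
print(constant_value(None, when_not_null="bizbaz"))      # None
print(constant_value(None, when_null="bizbaz"))          # bizbaz
print(constant_value("6785.44", when_null="bizbaz"))     # 6785.44
```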

Examples

Replace All With Null

raw	obfuscated
St. Louis	null
Boston	null
London	null
Tokyo	null

Replace Not Null

Replaces Not Null values with a predefined constant value

raw	obfuscated
899	bizbaz
899.01	bizbaz
null	null
56099	bizbaz

Replace Null

Replaces Null values with a predefined constant value

raw	obfuscated
null	bizbaz
null	bizbaz
6785.44	6785.44
null	bizbaz

DictionaryObfuscator

Description

DictionaryObfuscator masks data by mapping to a value that is looked up in a dictionary. Qlik Catalog can supply sample dictionaries from US Census and geographic sources. Customers are encouraged to create their own dictionaries to meet specific data obfuscation needs.

As with all Qlik Catalog obfuscation methods, the DictionaryObfuscator will generate stable/consistent/synchronized results. That is, for a given (seed + dictionary + value) the results will always be the same.

For example, imagine we construct a masking rule using the DictionaryObfuscator with the US_Cities.txt dictionary and <mySeed> as the seed. If the value "New York" maps to "Kalamazoo" then all instances of "New York" will map to "Kalamazoo".

Unless otherwise specified, the dictionary will be applied using a uniform random distribution. If the dictionary contains numeric weights, then the masking rule can specify that a weighted distribution should be applied to the masked dictionary values. Be advised that in many cases the underlying data already has a weighted distribution of values, so applying a second level of dictionary weights might skew the results in unanticipated ways.
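
A stable dictionary lookup of this kind can be sketched in Python. This is an illustration under assumptions, not the product's algorithm: the function name dictionary_obfuscate, the SHA-256 combination of seed and value, and the sample entries and weights are all invented for the example.

```python
import hashlib
import random


def dictionary_obfuscate(value, seed, entries, weights=None):
    """Map a value to a dictionary entry, stably for a given seed."""
    # Assumption: seed + value determine the pick, so the same value always
    # maps to the same entry for a given seed and dictionary.
    digest = hashlib.sha256((seed + value).encode("utf-8")).digest()
    rng = random.Random(digest)
    # weights=None gives a uniform distribution; otherwise entries with
    # larger weights are chosen more often.
    return rng.choices(entries, weights=weights, k=1)[0]


cities = ["Kalamazoo", "Springfield", "Dayton"]  # hypothetical dictionary
first = dictionary_obfuscate("New York", "mySeed", cities)
again = dictionary_obfuscate("New York", "mySeed", cities)
print(first == again)  # True: every "New York" maps to the same city

names = ["Udi", "Stephanie", "Jocelyn"]  # hypothetical weighted dictionary
print(dictionary_obfuscate("Georgi", "mySeed", names, weights=[100, 10, 10]))
```

With weights, a heavily weighted entry is returned for a larger share of distinct raw values, which is the skew the paragraph above warns about when the raw data is already unevenly distributed.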

Parameters

--dictionary
A required parameter which specifies the file name (not directory path name) of the dictionary.

--weighted-distribution
Specifies the desire to use weights in the dictionary to apply a weighted distribution to the mapping.

Examples

Female First Name Weighted Rule

The first name is replaced with a dictionary value that stays synchronized across multiple datasets. In the example below, the name 'Udi' has a high probability value (100), so it occurs more often (10 times) than other names.

raw	obfuscated
Georgi	Udi
Bezalel	Stephanie
Parto	Udi
Kyoichi	Jocelyn
