Edit a Data-detected Data Class
Steps
- Manage data classes.
- If the data class does not yet exist, add the data class.
- You can edit all the properties that are common to data classes.
You cannot change the Type once it has been set. To use a different Type, create a new data class instead.
- Set the MATCHING THRESHOLD to the minimum percentage of values in the field (column) that must match one of the enumeration values, patterns, or regular expressions.
- Set the UNIQUENESS THRESHOLD to the minimum number of distinct values that the field (column) must contain. This ensures that the data set is diverse enough.
By default, the UNIQUENESS THRESHOLD is 1 for enumerations (and is limited to the number of enumeration values) and 6 otherwise.
- Enter the DATA PATTERN, which can be one of the following:
  - Enumeration: a list of values that the data must match.
  - Pattern: one or more patterns that the data must match.
  - Regular Expression: a regular expression that the data must match.
- Click SAVE.
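The way the two thresholds work together can be sketched as a column check in Python. This is a minimal sketch, not the application's implementation: the names `enumeration`, `regular_expression`, and `column_matches` are hypothetical, the sketch assumes a column matches only when both thresholds are met, and it leaves out the Pattern type because its syntax is not described here.

```python
import re
from typing import Callable, Iterable, List


def enumeration(values: Iterable[str]) -> Callable[[str], bool]:
    """Hypothetical data pattern: a value matches if it is one of the listed values."""
    allowed = set(values)
    return lambda value: value in allowed


def regular_expression(*expressions: str) -> Callable[[str], bool]:
    """Hypothetical data pattern: a value matches if any of the expressions matches it."""
    compiled = [re.compile(expression) for expression in expressions]
    return lambda value: any(regex.search(value) for regex in compiled)


def column_matches(
    column: List[str],
    value_matches: Callable[[str], bool],
    matching_threshold_pct: float,
    uniqueness_threshold: int,
) -> bool:
    """Hypothetical check of a column against a data class's two thresholds."""
    if not column:
        return False
    # MATCHING THRESHOLD: percentage of all values that match the data pattern.
    matched = sum(1 for value in column if value_matches(value))
    matching_pct = 100.0 * matched / len(column)
    # UNIQUENESS THRESHOLD: number of distinct values in the column.
    distinct = len(set(column))
    return matching_pct >= matching_threshold_pct and distinct >= uniqueness_threshold


gender = enumeration(["Male", "Female"])
# 3 of 4 values match (75%) and the column holds 3 distinct values.
print(column_matches(["Male", "Female", "Female", "N/A"], gender, 70, 1))  # True
```

Under these assumptions, a column of 1000 "Female" values fails a UNIQUENESS THRESHOLD of 6 but passes a threshold of 1, which is the situation described under Usage.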
Usage
To illustrate these settings, consider the student database of an all-women’s college: it can have thousands of rows that all contain Female in the Gender column. Because such a column holds only one distinct value, set the UNIQUENESS THRESHOLD to 1 so that it matches the Gender data class.
The International Gender enumeration data class contains Male and Female values in several languages. When a column uses Male and Female values in only one language, the application matches it with a confidence below 100% because of the values in the other languages. Use “International” data classes with care, and only for truly multilingual columns. Otherwise, define one data class per language and group them in an “International” compound data class. For example:
- English Gender (enumeration): Male, Female
- French Gender (enumeration): Mâle, Femelle
- International Gender (compound): English Gender, French Gender
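The compound data class above can be sketched as a mapping from member names to their enumerations. This section does not say how a compound data class evaluates its members, so the sketch assumes it reports which member classes meet the MATCHING THRESHOLD; `matching_members` and `INTERNATIONAL_GENDER` are hypothetical names.

```python
from typing import Dict, List, Set

# Hypothetical representation of the compound data class and its members.
INTERNATIONAL_GENDER: Dict[str, Set[str]] = {
    "English Gender": {"Male", "Female"},
    "French Gender": {"Mâle", "Femelle"},
}


def matching_members(
    column: List[str], members: Dict[str, Set[str]], matching_threshold_pct: float
) -> List[str]:
    """Return the names of member classes whose enumeration meets the threshold."""
    if not column:
        return []
    result = []
    for name, allowed in members.items():
        pct = 100.0 * sum(1 for value in column if value in allowed) / len(column)
        if pct >= matching_threshold_pct:
            result.append(name)
    return result


print(matching_members(["Female", "Male", "Female"], INTERNATIONAL_GENDER, 90))
# ['English Gender']
```

Keeping one enumeration per language means a single-language column is matched entirely by one member class, rather than by a fraction of one multilingual list.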
When the data pattern is an Enumeration with fewer possible values than the UNIQUENESS THRESHOLD, the application uses the number of possible values as the UNIQUENESS THRESHOLD.
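The rule above amounts to taking the smaller of the two numbers. A minimal sketch, where `effective_uniqueness_threshold` is a hypothetical helper rather than part of the application:

```python
from typing import Optional


def effective_uniqueness_threshold(configured: int, enumeration_size: Optional[int] = None) -> int:
    """Return the UNIQUENESS THRESHOLD actually applied (hypothetical helper).

    enumeration_size is the number of possible values when the data pattern
    is an Enumeration, or None for other data patterns.
    """
    if enumeration_size is None:
        return configured
    # An enumeration cannot produce more distinct matching values than it lists.
    return min(configured, enumeration_size)


print(effective_uniqueness_threshold(6, enumeration_size=2))  # 2
print(effective_uniqueness_threshold(6))                      # 6
```

For example, a two-value English Gender enumeration with a threshold of 6 is checked against 2 distinct values.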
Example
Sign in as Administrator and go to MANAGE > Data Classes.
Enter “Product” in the Search box.
Click the line for the Product Number RegEx class.
Click the Regular Expression radio button and enter “^\D{2}-\d{4}$” as the first line in the DATA PATTERN box. Select “20” in the MATCHING THRESHOLD (%) box. Click SAVE.
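The effect of this regular expression and the 20% MATCHING THRESHOLD can be checked with Python's `re` module. The sample values and the `matching_percentage` helper are hypothetical, and the sketch assumes the percentage is computed over all values in the column.

```python
import re
from typing import List, Pattern

# The expression entered in the DATA PATTERN box above.
PRODUCT_NUMBER = re.compile(r"^\D{2}-\d{4}$")


def matching_percentage(column: List[str], pattern: Pattern[str]) -> float:
    """Percentage of values in the column that match the pattern."""
    if not column:
        return 0.0
    return 100.0 * sum(1 for value in column if pattern.search(value)) / len(column)


# Hypothetical column values: the first two match, the last three do not.
sample = ["AB-1234", "XY-0042", "12-3456", "AB1234", "unknown"]
pct = matching_percentage(sample, PRODUCT_NUMBER)
print(pct)         # 40.0
print(pct >= 20)   # True: the column meets the 20% MATCHING THRESHOLD
```

Note that `\D` matches any non-digit character, not only letters, so a value such as "A -1234" also matches; an expression such as `^[A-Z]{2}-\d{4}$` would be stricter.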