Managing semantic types
The semantic type of a field describes the kind of data the field contains, such as names, ZIP codes, or phone numbers.
Semantic types help you enhance the data quality of your datasets. The more values are valid against the assigned semantic type, the better the quality.
To manage semantic types, you need:
- A Qlik Talend Cloud Enterprise subscription.
- The Steward role. Make sure the role is turned on. For more information, see Assigning security roles and custom roles to everyone in the tenant.
A semantic type is automatically assigned to each field of your datasets. Some semantic types are predefined and available by default. For more information, see Predefined semantic types.
If the predefined types do not fulfill your requirements, you can create your own semantic types:
- Dictionary: The data must conform to the values from the dictionary. For more information, see Creating a dictionary-based semantic type.
- Pattern: The data must conform to a regular expression. For more information, see Creating a pattern-based semantic type.
- Compound: The data must conform to the values of at least one of the semantic types that form the compound semantic type. For more information, see Creating a compound semantic type.
To enhance the data quality of your datasets, the semantic types can be included in the data quality computation. For more information, see Using the semantic types for validation.
Semantic types discovery
The data discovery calculates how many values in a field match each semantic type and, if the result is greater than 40%, it assigns the semantic type to the field.
The score is displayed in the right panel when assigning a semantic type. For more information, see Assigning another semantic type.
How is the percentage calculated?
This percentage is the sum of two percentages:
- One percentage represents the number of values matching the semantic type; up to 100% allocated. How the data discovery determines whether a value matches depends on the kind of semantic type:
  - Dictionary: Does the value match a value from the dictionary? Punctuation, case, spaces, and accents are ignored.
  - Pattern: Does the value match the regular expression? The case is ignored but the accents are not.
  - Compound: Is the value discovered in at least one child? A compound type is a group of existing semantic types, called children.
  If the answer is positive, the value is considered valid.
- The other percentage represents the similarity between the field name and the name of the semantic type; up to 10% allocated. To compare the names:
  - The Levenshtein algorithm is used. It calculates the minimum number of edits (insertion, deletion, or substitution) required to transform one string into another.
  - The case and accents are ignored.
  - If the strings contain spaces, the word order is ignored. For example, US Phone and Phone US are considered identical.
The maximum percentage is 100%. If all values match a semantic type and the field name is identical to the name of the semantic type, the result is still 100%.
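The calculation above can be sketched in Python. This is an illustration only, not the product's implementation: the document does not say how the Levenshtein distance is converted into the bonus of up to 10%, so the linear scaling in `name_bonus` is an assumption, as are all function names and the choice to count every value in the denominator.

```python
import unicodedata

ASSIGNMENT_THRESHOLD = 40.0  # a type is assigned when the score is greater than 40%


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and sort words so that word order is ignored."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(sorted(no_accents.lower().split()))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, or substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_bonus(field_name: str, type_name: str, max_bonus: float = 10.0) -> float:
    """ASSUMPTION: the bonus scales linearly with 1 - distance / longest name."""
    a, b = normalize_name(field_name), normalize_name(type_name)
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return max_bonus * (1 - levenshtein(a, b) / longest)


def discovery_score(values, matches, field_name, type_name) -> float:
    """Share of matching values plus the name bonus, capped at 100%."""
    if not values:
        return 0.0
    match_pct = 100.0 * sum(1 for v in values if matches(v)) / len(values)
    return min(100.0, match_pct + name_bonus(field_name, type_name))


def is_assigned(score: float) -> bool:
    """The semantic type is assigned only when the score exceeds the threshold."""
    return score > ASSIGNMENT_THRESHOLD
```

Under these assumptions, a field named Phone US whose values all match US Phone receives the full name bonus, and the score is still capped at 100%.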
Data types discovery
Instead of semantic types, native data types can also be assigned. If no semantic type obtains more than 40%, the data discovery automatically assigns a data type.
To determine the type of a value, the data discovery runs the following checks in order:
- Is the value empty?
- Is the value of type Boolean? true and false are the only values considered of type Boolean.
- Is the value of type integer?
- Is the value of type decimal?
- Is the value of type date?
- If the value is not of one of the above types, it is considered a text value.
Because the checks run in sequence, a value has exactly one type. For example, the value 5 is of type integer. It is not considered to be of type text.
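The ordered checks can be sketched as a chain that returns on the first match. This is a minimal sketch under stated assumptions: the accepted date formats and the exact number syntax are not given in this document, so `DATE_FORMATS` and the regular expressions below are placeholders, and Boolean values are compared case-sensitively.

```python
import re
from datetime import datetime

# ASSUMPTION: example formats only; the formats actually recognized are not documented here.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

INTEGER_RE = re.compile(r"[+-]?[0-9]+")
DECIMAL_RE = re.compile(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+)")


def is_date(text: str) -> bool:
    """Return True if the text parses with at least one of the assumed formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return True
        except ValueError:
            continue
    return False


def detect_data_type(value: str) -> str:
    """Run the checks in the documented order and stop at the first match."""
    text = value.strip()
    if text == "":
        return "empty"
    if text in ("true", "false"):  # the only values considered Boolean
        return "boolean"
    if INTEGER_RE.fullmatch(text):
        return "integer"
    if DECIMAL_RE.fullmatch(text):
        return "decimal"
    if is_date(text):
        return "date"
    return "text"
```

Because each check returns immediately, `detect_data_type("5")` yields `integer` and never reaches the text fallback.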
Assessing the values
You can see the percentage of invalid, empty, and valid values in the quality bar of each field. The percentage is calculated on all data of the field, not on the sample only.
- Open the dataset.
- Select the Data Preview tab.
- To display the percentage, hover over a color in the quality bar.
- You can see up to three colors from left to right:
  - Red: Invalid values.
  - Black: Empty values.
  - Green: Valid values.
- For more details about each color, click it. The right panel opens and you can see the semantic type and the percentage for the validation rules.
The invalid values are marked with a red bar on the left.
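As an illustration of what the quality bar reports, the following sketch computes the three percentages over all values of a field. The function name, the `is_valid` callback, and the rule that whitespace-only values count as empty are assumptions made for this example.

```python
def quality_percentages(values, is_valid):
    """Return the share of invalid, empty, and valid values, in percent."""
    counts = {"invalid": 0, "empty": 0, "valid": 0}
    for value in values:
        if value is None or str(value).strip() == "":
            counts["empty"] += 1      # black segment
        elif is_valid(value):
            counts["valid"] += 1      # green segment
        else:
            counts["invalid"] += 1    # red segment
    total = len(values)
    if total == 0:
        return {status: 0.0 for status in counts}
    return {status: 100.0 * count / total for status, count in counts.items()}
```

For example, `quality_percentages(["a", "", "b", "1"], str.isalpha)` reports 25% invalid, 25% empty, and 50% valid.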

Activating and deactivating a semantic type
When you do not want a semantic type to be assigned to any field, you can deactivate it. Some semantic types are deactivated by default.
You can activate and deactivate semantic types at any time.
- Open Data quality. You are in the Semantic types tab.
- From the list, click the semantic type or the corresponding icon.
- Click Edit. The Edit semantic type window opens.
- Either:
  - Click Activate at the top of the window.
  - Click Deactivate in the bottom-left corner of the window.
When you deactivate a semantic type, it is marked as deactivated in the Semantic types tab.
If the semantic type is used in datasets, a message is displayed in the field.
To assign another semantic type, see Assigning another semantic type.
Assigning another semantic type
Semantic types are automatically assigned to fields, but you can assign a different one.
- Open the dataset.
- Select the Data Preview tab.
- Click the field you want to update.
- In the right panel, click the Edit icon of the Data type section.
- Search for the semantic type you want to assign to the field and click it.
To help you choose the best semantic type for the field, each semantic type shows a score as a percentage. The score represents the share of sample values that are valid against the semantic type. If you have the update permission, you can update the scores by clicking Refresh in the right panel.
To create a semantic type and automatically assign it to the field, click Create from field. For more information, see Creating a semantic type from a field.
- Click Refresh above the right panel to apply your changes.
Creating a dictionary-based semantic type
You can create a semantic type based on a closed dictionary. You define each value to which your data must correspond.
- Open Data quality. You are in the Semantic types tab.
- If you have no semantic types, click Add. Otherwise, click Create semantic type.
- Enter a name.
- Enter a description. This is optional but recommended to describe the purpose of the semantic type.
- If you want the semantic type to be included in the quality computation, turn on the toggle Use for validation.
- Select Dictionary.
- Enter the values or import a text file by clicking the import icon.
- To automatically open the semantic type when it has been created, select the Open semantic type check box.
- Click Create. The semantic type is created.
- To assign the new semantic type to fields that match it, go to Catalog, open a dataset and click Compute or Refresh.
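Dictionary matching ignores punctuation, case, spaces, and accents, as described in Semantic types discovery. The sketch below shows one way to express that comparison. It is not the product's implementation, and the function names are hypothetical.

```python
import unicodedata


def normalize(value: str) -> str:
    """Drop accents, punctuation, and spaces, then fold case."""
    decomposed = unicodedata.normalize("NFKD", value)
    # After decomposition, accents are separate combining marks, which are
    # not alphanumeric, so this filter removes them along with punctuation.
    return "".join(c for c in decomposed if c.isalnum()).casefold()


def build_dictionary(entries):
    """Normalize every dictionary entry once; entries that normalize to nothing are skipped."""
    normalized = {normalize(entry) for entry in entries}
    normalized.discard("")
    return normalized


def matches_dictionary(value: str, dictionary) -> bool:
    """A value is valid if its normalized form is a normalized dictionary entry."""
    return normalize(value) in dictionary
```

With a dictionary built from `["São Paulo", "New York"]`, the values `sao-paulo` and `NEW  YORK.` both match, while `Paris` does not.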
Creating a pattern-based semantic type
You can create a semantic type based on a regular expression.
- Open Data quality. You are in the Semantic types tab.
- If you have no semantic types, click Add. Otherwise, click Create semantic type.
- Enter a name.
- Enter a description. This is optional but recommended to describe the purpose of the semantic type.
- If you want the semantic type to be included in the quality computation, turn on the toggle Use for validation.
- Select Pattern.
- Enter or paste the regular expression. The maximum length is 350 characters.
For security reasons, some regular expression features cannot be used, in particular backreferences. For more information, see the RE2/J documentation.
- To automatically open the semantic type when it has been created, select the Open semantic type check box.
- Click Create. The semantic type is created.
- To assign the new semantic type to fields that match it, go to Catalog, open a dataset and click Compute or Refresh.
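Before pasting a regular expression, you can check it against the two constraints mentioned above: the 350-character limit and the absence of backreferences. The sketch below uses Python's `re` module as a stand-in, which is an assumption: the product refers to RE2/J, whose syntax differs from Python's in places, so a pattern that passes here is not guaranteed to be accepted. The backreference scan is a heuristic and the function names are hypothetical. Matching uses a full match, which is also an assumption.

```python
import re

MAX_PATTERN_LENGTH = 350  # limit stated for pattern-based semantic types


def has_backreference(pattern: str) -> bool:
    """Heuristic scan for numbered or named backreferences."""
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            if pattern[i + 1] in "123456789" or pattern.startswith("\\k<", i):
                return True
            i += 2  # skip the escaped character
            continue
        if pattern.startswith("(?P=", i):
            return True
        i += 1
    return False


def check_pattern(pattern: str) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if len(pattern) > MAX_PATTERN_LENGTH:
        problems.append(f"pattern has {len(pattern)} characters; the maximum is {MAX_PATTERN_LENGTH}")
    if has_backreference(pattern):
        problems.append("backreferences are not supported")
    try:
        re.compile(pattern)
    except re.error as exc:
        problems.append(f"invalid regular expression: {exc}")
    return problems


def matches_pattern(value: str, pattern: str) -> bool:
    """Case is ignored; accents are not, so no accent stripping is applied."""
    return re.fullmatch(pattern, value, flags=re.IGNORECASE) is not None
```

For instance, `check_pattern(r"(a)\1")` reports the backreference, while `check_pattern(r"[A-Z]{2}\d{4}")` returns an empty list.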
Creating a compound semantic type
You can create a group of semantic types. For example, if your datasets contain customers from all around the world, you can create a compound semantic type to group all Postal code semantic types.
- Open Data quality. You are in the Semantic types tab.
- If you have no semantic types, click Add. Otherwise, click Create semantic type.
- Enter a name.
- Enter a description. This is optional but recommended to describe the purpose of the semantic type.
- If you want the semantic type to be included in the quality computation, turn on the toggle Use for validation.
- Select Compound.
- Select the semantic types to be added from the drop-down list.
When Use for validation is turned on for the compound type, you can only select semantic types that are also used for validation. If you cannot find a semantic type in the list, make sure that Use for validation is turned on for that semantic type.
- To automatically open the semantic type when it has been created, select the Open semantic type check box.
- Click Create. The semantic type is created.
- To assign the new semantic type to fields that match it, go to Catalog, open a dataset and click Compute or Refresh.
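A compound type treats a value as valid when at least one of its children accepts it. The following sketch illustrates that rule. The two child validators use deliberately simplified patterns written for this example; they are not the definitions of the predefined US Postal Code and UK Postal Code types.

```python
import re
from typing import Callable, Iterable

Validator = Callable[[str], bool]


def compound(children: Iterable[Validator]) -> Validator:
    """Build a validator that accepts a value if any child accepts it."""
    child_list = list(children)

    def is_valid(value: str) -> bool:
        return any(child(value) for child in child_list)

    return is_valid


# ASSUMPTION: simplified, illustrative patterns only.
def us_postal_code(value: str) -> bool:
    return re.fullmatch(r"[0-9]{5}(-[0-9]{4})?", value) is not None


def uk_postal_code(value: str) -> bool:
    pattern = r"[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}"
    return re.fullmatch(pattern, value, flags=re.IGNORECASE) is not None


postal_code = compound([us_postal_code, uk_postal_code])
```

Here `postal_code("90210")` and `postal_code("SW1A 1AA")` are both accepted because one child matches each value, while `postal_code("hello")` is rejected by every child.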
Creating a semantic type from a field
You can create a semantic type directly from a dataset. You do not need to go to the Semantic types tab.
- Open the dataset.
- Select the Data Preview tab.
- Click the field you want to update.
- In the right panel, click the Edit icon of the Data type section.
- Click Create from field. The Create semantic type window opens. The type Dictionary is already selected.
- Enter the values or import a text file by clicking the import icon.
- To automatically open the semantic type when it has been created, select the Open semantic type check box.
- Click Create. The semantic type is created and you are returned to the dataset.
- Click Refresh above the right panel to apply your changes.
Editing a semantic type
You can rename a semantic type and add, edit, or remove its values.
You cannot change the type of an existing semantic type. To use a different type, create a new semantic type.
- Open Data quality. You are in the Semantic types tab.
- From the list, click the semantic type or the corresponding icon.
- Click Edit. The Edit semantic type window opens.
- Edit as needed.
- Click Save.
When the semantic type is used in datasets, a message is displayed above the dataset prompting you to refresh it.
Deleting a semantic type
You can delete a semantic type even if it is assigned to fields in one or more datasets.
- Open Data quality. You are in the Semantic types tab.
- From the list, click the semantic type or the corresponding icon.
- Click Delete.
- Confirm the deletion.
If the semantic type is used in datasets, a message is displayed in the field.
- To assign another semantic type, see Assigning another semantic type.
Using the semantic types for validation
Semantic types are part of the dataset quality and are therefore included in the data quality computation, but you can turn this off for a given semantic type.
- In Data quality, create or update a semantic type.
- In the configuration window, turn off the toggle Use for validation.
- Click Save.
Turning this toggle on or off does not affect how the semantic type is used. When a semantic type is assigned to a field, the values of the field are always validated against the semantic type.
For example, 89% of the values of a field are valid against a semantic type:
- If the toggle is turned off, the dataset quality is not impacted by this good percentage.
- If the toggle is turned on, the dataset quality is better.
Predefined semantic types
The predefined semantic types are split into two categories:
- Data types: The standard types (text, integer, etc.).
- Semantic types: Groups of defined values.
The tables below list and describe each data type and semantic type.
Data type | Description |
---|---|
Text | String text |
Integer | Numeric value |
Decimal | Decimal numeric value |
Date | Date including day, month, and year |
Time | Time of the day |
Timestamps | Date and time |
Boolean | Answers with the value True or False |
Semantic type | Description | Origin of data |
---|---|---|
Academic and professional suffix | Academic and professional suffix | Qlik |
Academic and professional title | Academic and professional title | Qlik |
Address Line | Street number and name | Qlik |
Airport | Airport | Qlik |
Amex Card | American Express card | Qlik |
Animal | Animal | Qlik |
Answer | Answers with the value | Qlik |
AT VAT Number | Austrian VAT number | Qlik |
Bank Routing Transit Number | Bank routing transit number | Qlik |
BE Postal Code | Belgian postal code | Qlik |
Beverage | Type of beverage | YAGO |
BG VAT Number | Bulgarian VAT number | Qlik |
CA Province Territory | Canadian province | Statoids |
CA Province Territory Code | Canadian province code | Statoids |
Christian religious title | Christian religious title | Qlik |
City | City name | Qlik |
Civility | Civility | Qlik |
Color Hex Code | Color hexadecimal code | Qlik |
Company | Company name | YAGO |
Continent | Continent name | Qlik |
Continent Code | Continent code | Qlik |
Country | Country name | ISO |
Country Code ISO2 | 2-letter country code | ISO |
Country Code ISO3 | 3-letter country code | ISO |
Currency Code | Currency code | ISO |
Currency Name | Currency name | ISO |
Data URL | URL starting with the word data | Qlik |
DE Bundesländer | German federal states | Qlik |
DE Postal Code | German postal code | Qlik |
DE Phone | German phone number | Qlik |
Email | Email address | Qlik |
EN Month | Month in English | Qlik |
EN Month Abbrev | English month abbreviation | Qlik |
EN Weekday | Week day or their abbreviation | Qlik |
First Name | First name | Qlik |
File URL | File URL | Qlik |
Formal title | Formal title | Qlik |
FR Commune | French municipality | Insee |
FR Departement | French department | Insee |
FR Insee Code | French Insee code of cities with Corsica and colonies | Insee |
FR Postal Code | French postal code | Qlik |
FR Phone | French phone number | Qlik |
FR Region | French region | Insee |
FR Region Legacy | Former French regions, prior to the 2016 territorial reform. | Insee |
FR Social Security Number | French social security number | Qlik |
FR VAT Number | French VAT number | Qlik |
Gender | Gender | Qlik |
Generic URL | Generic URL (Web, HDFS, MailTo, data, file) | Qlik |
Geographic Coordinate | Geographic coordinate, longitude, or latitude coordinates with at least meter precision | Qlik |
Geographic Coordinates | Geographic coordinates, Google Maps style GPS Decimal format | Qlik |
Geographic Coordinates (degree) | Geographic coordinates (degrees), Latitude, and longitude coordinates separated by a comma in the form: N 0:59:59.99,E 0:59:59.99 | Qlik |
HDFS URL | HDFS URL | Qlik |
HR Department | HR department | Qlik |
IBAN | International Bank Account Number | Qlik |
Industry | Industry name | Qlik |
Industry Group | Industry group | Qlik |
Inherited suffix | Inherited suffix | Qlik |
IPv4 Address | IPv4 address | Qlik |
IPv6 Address | IPv6 address | Qlik |
ISBN-10 | International standard book number 10 digits | Qlik |
ISBN-13 | International standard book number 13 digits | Qlik |
Islamic religious title | Islamic religious title | Qlik |
Job Title | Job title | Qlik |
Judaic religious title | Judaic religious title | Qlik |
Language | Language | Wikipedia |
Language Code ISO2 | 2-letter language code | Wikipedia |
Language Code ISO3 | 3-letter language code | Wikipedia |
Last Name | Last name | United States Census Bureau |
MAC Address | MAC address | Qlik |
MailTo URL | MailTo URL | Qlik |
MasterCard | Mastercard credit card | Qlik |
Measure Unit | Measure unit | Qlik |
Military suffix | Military suffix | Qlik |
Money Amount (EN) | Amount of money in English format | Qlik |
Money Amount (FR) | Amount of money in French format | Qlik |
Month | Month | Qlik |
Museum | Museum name | YAGO |
MX Estado | Mexican state | Statoids |
MX Estado Code | Mexican state code | Statoids |
Name Suffix | Name suffix (Inherited, academic, professional, or military) | Qlik |
North American state | US state and Canadian province | Qlik |
North American state code | US state code and Canadian province code | Qlik |
Organization | Organization | YAGO |
Passport | Passport number | Qlik |
Phone number | Phone number (DE, FR, UK, US) | Qlik |
Religious title | Religious title (Christian, Islamic, Judaic) | Qlik |
SE Social Security Number | Swedish person number | Qlik |
Sector | Sector | Qlik |
SEDOL | Stock exchange daily official list | Qlik |
Street Type | Street type | Qlik |
Title | Title (Civility, academic, professional, formal, religious) | Qlik |
UK Phone | UK phone number | Qlik |
UK Postal Code | UK postal code | Qlik |
UK Social Security Number | National identification number, national identity number, or national insurance number generally called NI number | Qlik |
US Phone | US phone number | Qlik |
US Postal Code | US postal code | Qlik |
US Social Security Number | US social security number | Qlik |
URL | Web site URL | Qlik |
US County | US county name | Wikipedia |
US State | US states | Qlik |
US State Code | US state code | Qlik |
Visa Card | Visa credit card | Qlik |
Web Domain | Web site domain | Qlik |
Weekday | Day of the week | Qlik |