List of the indexes and regex categories used in the Semantic-aware analysis
The Semantic-aware approach analyzes column content using three methods: regular expressions (regex), data dictionaries, and keyword dictionaries.
The dictionary indexes and regex categories are embedded in Talend Studio and used
in the Semantic-aware analysis to:
- Help explore semantic categories of data.
- Decide which category the data falls into.
Restriction: If you are not using the latest version of Talend Studio, some
of the listed regex categories and data dictionary indexes might not be
available.
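The three detection methods described above can be illustrated with a minimal Python sketch. It is not Talend Studio's implementation: the patterns, dictionary entries, and the function names `categories_for` and `column_category` are hypothetical stand-ins chosen for illustration, and the "most frequent category wins" rule is an assumption rather than the documented behavior.

```python
import re
from collections import Counter

# Hypothetical, heavily reduced stand-ins for the embedded resources.
# These are NOT the actual patterns or indexes shipped with Talend Studio.
REGEX_CATEGORIES = {
    "Color Hex Code": re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})"),
    "IPv4 Address": re.compile(
        r"(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)"
    ),
}
DATA_DICTIONARIES = {
    "Continent": {"africa", "antarctica", "asia", "europe",
                  "north america", "oceania", "south america"},
}
KEYWORD_DICTIONARIES = {
    "Address Line": {"street", "avenue", "road", "boulevard"},
}


def categories_for(value):
    """Return the set of category names that a single value matches."""
    text = value.strip()
    lowered = text.lower()
    found = set()

    # Regex: the whole value must match the pattern.
    for name, pattern in REGEX_CATEGORIES.items():
        if pattern.fullmatch(text):
            found.add(name)

    # Data dictionary: the whole value must be a known entry.
    for name, entries in DATA_DICTIONARIES.items():
        if lowered in entries:
            found.add(name)

    # Keyword dictionary: at least one token must be a known keyword.
    tokens = set(re.findall(r"[a-z]+", lowered))
    for name, keywords in KEYWORD_DICTIONARIES.items():
        if tokens & keywords:
            found.add(name)

    return found


def column_category(values):
    """Pick the most frequent category in a column (assumed decision rule)."""
    counts = Counter()
    for value in values:
        counts.update(categories_for(value))
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

In this sketch, a regex or data dictionary category requires the entire value to match, whereas a keyword dictionary category only requires one recognized word inside the value, which is why a free-form value such as a street address can still be categorized.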
Regex categories
| Regex categories | Description | Origin of data |
|---|---|---|
| Amex Card | American Express card | Talend |
| AT VAT Number | Austrian VAT number | Talend |
| Bank Routing Transit Number | Bank routing transit number | Talend |
| BE Postal Code | Belgian postal code | Talend |
| BG VAT Number | Bulgarian VAT number | Talend |
| Color Hex Code | Color hexadecimal code | Talend |
| Data URL | URL starting with the word data | Talend |
| DE Phone | German phone number | Talend |
| DE Postal Code | German postal code | Talend |
| EN Month | Month in English | Talend |
| EN Month Abbrev | English month abbreviation | Talend |
| EN Weekday | Weekday or its abbreviation | Talend |
| Email | Email address | Talend |
| File URL | File URL | Talend |
| FR Insee Code | French Insee code of cities with Corsica and colonies | Insee |
| FR Phone | French phone number | Talend |
| FR Postal Code | French postal code | Talend |
| FR Social Security Number | French social security number | Talend |
| FR VAT Number | French VAT number | Talend |
| Geographic Coordinate | Geographic coordinate: longitude and latitude with at least meter precision | Talend |
| Geographic Coordinates | Geographic coordinates in Google Maps-style GPS decimal format | Talend |
| Geographic Coordinates (degree) | Geographic coordinates in degrees: latitude and longitude separated by a comma, in the form N 0:59:59.99,E 0:59:59.99 | Talend |
| HDFS URL | HDFS URL | Talend |
| IBAN | International Bank Account Number | Talend |
| IPv4 Address | IPv4 address | Talend |
| IPv6 Address | IPv6 address | Talend |
| ISBN-10 | International Standard Book Number, 10 digits | Talend |
| ISBN-13 | International Standard Book Number, 13 digits | Talend |
| MAC Address | MAC address | Talend |
| MailTo URL | MailTo URL | Talend |
| MasterCard | Mastercard credit card | Talend |
| Money Amount (EN) | Amount of money in English format | Talend |
| Money Amount (FR) | Amount of money in French format | Talend |
| Passport | Passport number | Talend |
| SE Social Security Number | Swedish person number | Talend |
| SEDOL | Stock exchange daily official list | Talend |
| UK Phone | UK phone number | Talend |
| UK Postal Code | UK postal code | Talend |
| UK Social Security Number | National identification number, national identity number, or national insurance number generally called NI number | Talend |
| URL | Web site URL | Talend |
| US Phone | US phone number | Talend |
| US Postal Code | US postal code | Talend |
| US Social Security Number | US social security number | Talend |
| US State | US states | Talend |
| US State Code | US state code | Talend |
| Visa Card | Visa credit card | Talend |
| Web Domain | Web site domain | Talend |
Data dictionary indexes
| Data dictionary indexes | Description | Origin of data |
|---|---|---|
| Airport | Airport | Talend |
| Airport Code | Airport code | Talend |
| Animal | Animal | Talend |
| Answer | Answers with the value True or False | Talend |
| Beverage | Type of beverage | YAGO |
| CA Province Territory | Canadian province | Statoids |
| CA Province Territory Code | Canadian province code | Statoids |
| City | City name | Talend |
| Civility | Civility | Talend |
| Company | Company name | YAGO |
| Continent | Continent name | Talend |
| Continent Code | Continent code | Talend |
| Country | Country name | Open Knowledge (Public Domain Dedication and License) |
| Country Code ISO2 | 2-letter country code | Open Knowledge (Public Domain Dedication and License) |
| Country Code ISO3 | 3-letter country code | Open Knowledge (Public Domain Dedication and License) |
| Currency Code | Currency code | Open Knowledge (Public Domain Dedication and License) |
| Currency Name | Currency name | Open Knowledge (Public Domain Dedication and License) |
| FR Commune | French municipality | Insee |
| FR Departement | French department | Insee |
| FR Region | French region | Insee |
| FR Region Legacy | Former French regions, prior to the 2016 territorial reform | Insee |
| Gender | Gender | Talend |
| HR Department | HR department | Talend |
| Industry | Industry name | Talend |
| Industry Group | Industry group | Talend |
| Job Title | Job title | Talend |
| Language | Language | Wikipedia |
| Language Code ISO2 | 2-letter language code | Wikipedia |
| Language Code ISO3 | 3-letter language code | Wikipedia |
| Last Name | Last name | United States Census Bureau |
| Measure Unit | Measure unit | Talend |
| Month | Month | Talend |
| Museum | Museum name | YAGO |
| MX Estado | Mexican state | Statoids |
| MX Estado Code | Mexican state code | Statoids |
| Organization | Organization | YAGO |
| Sector | Sector | Talend |
| Street Type | Street type | Talend |
| US County | US county name | Wikipedia |
| US State | US states | Talend |
| US State Code | US state code | Talend |
| Weekday | Day of the week | Talend |
Keyword dictionary indexes
| Keyword dictionary indexes | Description | Origin of data |
|---|---|---|
| Address Line | Street number and name | Talend |
| Full Name | Full name | Talend |