Adding a new regular expression-based semantic type
You can create a semantic type based on a regular expression in Talend Dictionary Service and add it to the list of recognized data types in Talend Data Preparation.
In Talend Data Preparation, not every type of data can be matched with one of the predefined semantic types. For example, Italian social security numbers, also known as codice fiscale, are currently not recognized.
Let's say that you work for an Italian company that only deals with Italian customers. In this example, you need to clean some customer data, such as names, email addresses, and social security numbers. By default, the semantic type of the column containing the social security numbers is set to text. This is not specific enough, so you want to create a new category that matches this type of data: a codice fiscale semantic type in this case.
You will create this new semantic type in Talend Dictionary Service. It then becomes automatically available in Talend Data Preparation, so that your data can be matched with the proper type.
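To give an idea of what a regular expression for this kind of semantic type could look like, here is a minimal sketch in Python. The pattern is an assumption and not taken from the Talend documentation: it follows the standard 16-character layout of a codice fiscale (six letters, two digits for the year, one month letter, two digits for the day, a four-character municipality code, and a final check letter). The names CODICE_FISCALE_PATTERN and looks_like_codice_fiscale are hypothetical. The sketch checks the format only: it does not verify the check letter and does not cover the variants in which digits are replaced by letters.

```python
import re

# Hypothetical pattern for the standard codice fiscale layout (assumption,
# not from the Talend documentation):
#   [A-Z]{6}        surname and first name letters
#   [0-9]{2}        last two digits of the birth year
#   [ABCDEHLMPRST]  birth month letter
#   [0-9]{2}        birth day (plus 40 for women)
#   [A-Z][0-9]{3}   municipality code
#   [A-Z]           check letter (not verified here)
CODICE_FISCALE_PATTERN = re.compile(
    r"^[A-Z]{6}[0-9]{2}[ABCDEHLMPRST][0-9]{2}[A-Z][0-9]{3}[A-Z]$"
)


def looks_like_codice_fiscale(value: str) -> bool:
    """Return True if value has the standard codice fiscale format."""
    return CODICE_FISCALE_PATTERN.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for sample in ["RSSMRA85T10A562S", "RSSMRA85T10A562", "mario.rossi@example.com"]:
        print(sample, looks_like_codice_fiscale(sample))
```

Testing a candidate pattern against a few sample values like this, before entering it as the validation rule of the new semantic type, helps catch patterns that are too strict or too permissive.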
Procedure
Results
Your data is now matched with the codice_fiscale semantic type that you manually created in Talend Dictionary Service. From now on, when you import new datasets containing Italian social security numbers, those values are automatically matched with the proper type.