Extracting parts of a field based on semantic types
About this task
The Extract values by semantic type function allows you to select up to five different semantic types that correspond to the type of information you want to extract from a given field. It works with semantic types based on regular expressions or dictionaries, as well as compound semantic types.
For this example, imagine that you are working for the Ministry of Culture, and you need to prepare data from a survey issued to museum visitors. The survey gathered basic demographic information on the visitors, such as their age or gender, as well as comments that they could enter in a dedicated field. Visitors could use this comments field to share their experience, leave additional contact information, or recommend museums they might have visited in other countries. This information could be used to build future partnerships, for example.
However, after a simple parsing operation, all the information gathered in the comments field ended up in a single field in the resulting dataset. You, on the other hand, would like to extract the different types of information and sort them into specific columns. To accomplish that, you will use the Extract values by semantic type function, together with the predefined or custom semantic types available in Talend Cloud Data Preparation, to identify the different categories of information left in the comments and extract them to individual columns.
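To make the idea of extracting values by semantic type more concrete, the following Python sketch mimics the behavior outside of the product. It is only an illustration, not how Talend Cloud Data Preparation implements the function: the `extract_by_semantic_type` helper, the `email` pattern, the `country` word list, and the sample comment are all hypothetical. It assumes one regular-expression type and one dictionary type, and enforces the limit of five selected types described above.

```python
import re

# Hypothetical semantic types: these patterns and word lists are illustrative
# assumptions, not the definitions shipped with Talend Cloud Data Preparation.
REGEX_TYPES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}
DICTIONARY_TYPES = {
    "country": {"france", "italy", "spain", "japan"},
}


def extract_by_semantic_type(text, type_names):
    """Return a {type name: list of matching values} mapping for one field value."""
    if len(type_names) > 5:
        raise ValueError("select at most five semantic types")
    result = {}
    for name in type_names:
        if name in REGEX_TYPES:
            # Regex-based type: keep every substring matching the pattern.
            result[name] = REGEX_TYPES[name].findall(text)
        elif name in DICTIONARY_TYPES:
            # Dictionary-based type: keep every word found in the word list.
            words = re.findall(r"[A-Za-z]+", text)
            result[name] = [w for w in words if w.lower() in DICTIONARY_TYPES[name]]
        else:
            raise KeyError(f"unknown semantic type: {name}")
    return result


# Sample comment invented for this illustration.
comment = (
    "Loved the visit! Reach me at ana.lopez@example.com. "
    "The Prado in Spain is also worth seeing."
)
columns = extract_by_semantic_type(comment, ["email", "country"])
print(columns)
```

Each key of the returned mapping plays the role of one new column, holding the values of that semantic type found in the comment.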