Ontologies used in Talend Studio
Using the ontology repository stored on the Elasticsearch server with Talend Studio enables knowledge sharing: you can reuse indicators and patterns that have already been analyzed and found to best suit the type of data you are analyzing.
Talend Studio analyzes column content using a set of methods (regex, data dictionary, and keyword dictionary) and then decides which category the data falls into. For example:
- user@talend.com: Talend Studio analyzes it against a regex and finds it to be an EMAILADDRESS.
- John: Talend Studio analyzes it against the data dictionary and finds it to be a FIRSTNAME.
- 43 Chester Road: Talend Studio analyzes the tokens in the data string against the keywords in the keyword dictionary and finds that Road is an ADDRESSLINE keyword, so the value is classified as an ADDRESSLINE.
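The three methods above can be illustrated with a minimal Python sketch. It is not Talend Studio's implementation: the regex, the FIRST_NAME_DICTIONARY and ADDRESS_KEYWORDS sets, and the classify and classify_column functions are hypothetical stand-ins. The column-level majority vote is also an assumption about how per-value results might be combined into one category for a column.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical, simplified rule sets; the real regexes and dictionaries
# used by Talend Studio are not reproduced here.
EMAIL_REGEX = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
FIRST_NAME_DICTIONARY = {"john", "mary", "pierre"}
ADDRESS_KEYWORDS = {"road", "street", "avenue"}


def classify(value: str) -> Optional[str]:
    """Return a category for one value: regex, then dictionary, then keywords."""
    # 1. Regex: the whole value must match the pattern.
    if EMAIL_REGEX.match(value):
        return "EMAILADDRESS"
    # 2. Data dictionary: the whole (normalized) value must be a known entry.
    if value.strip().lower() in FIRST_NAME_DICTIONARY:
        return "FIRSTNAME"
    # 3. Keyword dictionary: a single token of the value is enough to match.
    tokens = re.findall(r"\w+", value.lower())
    if any(token in ADDRESS_KEYWORDS for token in tokens):
        return "ADDRESSLINE"
    return None


def classify_column(values: Iterable[str]) -> Optional[str]:
    """Hypothetical column-level decision: the most frequent value category."""
    counts = Counter(c for c in map(classify, values) if c is not None)
    return counts.most_common(1)[0][0] if counts else None


print(classify("user@talend.com"))  # EMAILADDRESS
print(classify("John"))             # FIRSTNAME
print(classify("43 Chester Road"))  # ADDRESSLINE
```

The order of checks is a simplification chosen for the sketch. Note that the data dictionary matches the whole value, whereas the keyword dictionary matches individual tokens, which is why 43 Chester Road is recognized through the single keyword Road.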
An ontology has been built on the Elasticsearch server by merging two business standards, UBL and OAGI:
- Universal Business Language (UBL): an OASIS effort to synthesize existing XML business document libraries into one universal business language.
- Open Applications Group (OAGI): OAGI defines a common content model and common messages for communication between business applications.
The merge results in 412 concepts that apply to several domains, including customer, company, geography, product, and finance.