Ontologies used in Talend Studio
What is an ontology?
An ontology is a description of the concepts, attributes, and relationships that can exist for data across multiple columns. For example, customer is a concept, and name and date of birth are attributes of that concept. An ontology lists concepts, their attributes, and synonyms for those attributes.
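As an illustration of the structure described above, the following minimal sketch models concepts, attributes, and attribute synonyms, and resolves a column header to a concept and attribute through those synonyms. The class names, the `resolve_column` helper, and the sample Customer entries are hypothetical; this is not how Talend Studio or the Elasticsearch repository actually stores ontologies.

```python
from __future__ import annotations  # allows modern type hints on older Python 3 versions

from dataclasses import dataclass, field


@dataclass
class Attribute:
    """An attribute of a concept, plus alternative names (synonyms) for it."""
    name: str
    synonyms: set[str] = field(default_factory=set)


@dataclass
class Concept:
    """A concept such as Customer, described by its attributes."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)


def resolve_column(concepts: list[Concept], header: str) -> tuple[str, str] | None:
    """Hypothetical helper: return (concept, attribute) whose name or synonym matches header."""
    key = header.strip().lower()
    for concept in concepts:
        for attribute in concept.attributes:
            # Compare case-insensitively against the attribute name and all its synonyms.
            names = {attribute.name.lower()} | {s.lower() for s in attribute.synonyms}
            if key in names:
                return concept.name, attribute.name
    return None  # no concept in this ontology describes the column


# Hypothetical example ontology with a single Customer concept.
customer = Concept("Customer", [
    Attribute("Name", {"full name", "customer name"}),
    Attribute("Date of birth", {"dob", "birth date"}),
])

print(resolve_column([customer], "DOB"))  # ('Customer', 'Date of birth')
```

Keeping synonyms on the attribute, rather than on the concept, lets differently named columns such as "DOB" and "Birth date" map to the same attribute.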
What is an ontology used for in Talend Studio?
Using Talend Studio with the ontology repository stored on the Elasticsearch server enables knowledge sharing: you can reuse indicators and patterns that have already been analyzed and found to best suit the type of data you are analyzing.
Talend Studio analyzes column content using a set of methods (regular expressions, a data dictionary, and a keyword dictionary) and then determines which category the data falls into. For example:
- For user@talend.com, Talend Studio matches the value against a regular expression and identifies it as EMAILADDRESS.
- For John, Talend Studio looks the value up in the data dictionary and identifies it as FIRSTNAME.
- For 43 Chester Road, Talend Studio compares the tokens in the string against the keywords in the keyword dictionary and finds that Road indicates an ADDRESSLINE.
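The three detection methods in the examples above can be sketched as follows. This is a simplified illustration, not Talend Studio's implementation: the regular expression, the dictionary contents, and the order in which the methods are tried (regex, then data dictionary, then keyword dictionary) are all assumptions; only the category names EMAILADDRESS, FIRSTNAME, and ADDRESSLINE come from the text.

```python
import re
from typing import Optional

# Hypothetical, heavily simplified indicators for illustration only.
REGEX_INDICATORS = {
    "EMAILADDRESS": re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
}
DATA_DICTIONARY = {
    "FIRSTNAME": {"john", "mary", "pierre"},
}
KEYWORD_DICTIONARY = {
    "ADDRESSLINE": {"road", "street", "avenue"},
}


def categorize(value: str) -> Optional[str]:
    """Return the first category whose method matches the value, or None."""
    text = value.strip()
    # 1. Regex: the whole value must match the pattern.
    for category, pattern in REGEX_INDICATORS.items():
        if pattern.fullmatch(text):
            return category
    # 2. Data dictionary: the whole value must be a known entry.
    for category, entries in DATA_DICTIONARY.items():
        if text.lower() in entries:
            return category
    # 3. Keyword dictionary: a single token of the value is enough.
    tokens = re.findall(r"\w+", text.lower())
    for category, keywords in KEYWORD_DICTIONARY.items():
        if any(token in keywords for token in tokens):
            return category
    return None


for sample in ["user@talend.com", "John", "43 Chester Road"]:
    print(sample, "->", categorize(sample))
# user@talend.com -> EMAILADDRESS
# John -> FIRSTNAME
# 43 Chester Road -> ADDRESSLINE
```

The key difference between the two dictionary methods is granularity: the data dictionary matches the whole value, while the keyword dictionary matches individual tokens, which is why "Road" alone is enough to classify "43 Chester Road".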
What ontologies are used in Talend Studio?
An ontology has been built on the Elasticsearch server by merging two business standards, UBL and OAGI:
- Universal Business Language (UBL): An OASIS initiative that synthesizes existing XML business document libraries into a single universal business language.
- Open Applications Group (OAGI): OAGI defines a common content model and common messages for communication between business applications.
The merge produced 412 concepts that apply to several domains, including customer, company, geography, product, and finance.