Semantic Definition Lookup
In this use case, one has found a data element (for example, a column in a database table or a field in a report) and wants to understand what it means. When the semantic links are defined properly, Talend Data Catalog can trace back through the physical data flow (as long as there is no transformation that would change the meaning) to an element that is mapped to a term in the glossary, and thus find a useful definition.
The caveat that this only works “as long as there is no transformation which would change the meaning” implies that some of the fields in your reports will not have a semantic definition. The trace simply stops at the transformation and never reaches a model (again, likely the data warehouse) that has semantic lineage.
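To make the stopping rule concrete, here is a minimal sketch of such a trace, assuming a simplified lineage model in which each element has at most one upstream source and a flag marking whether the flow into it is a meaning-changing transformation. The Element class, its fields, trace_definition, and the sample names are hypothetical illustrations, not the Talend Data Catalog API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """A data element such as a column or report field (hypothetical model)."""
    name: str
    term: Optional[str] = None            # glossary term mapped to this element, if any
    upstream: Optional["Element"] = None  # source element in the physical data flow
    transformed: bool = False             # True if the flow into this element changes its meaning


def trace_definition(element: Element) -> Optional[str]:
    """Walk upstream until an element mapped to a glossary term is found.

    Returns the term name, or None when a meaning-changing transformation
    (or the end of the lineage) is reached first.
    """
    current = element
    while current is not None:
        if current.term is not None:
            return current.term
        if current.transformed:
            return None  # the trace stops here: the meaning may have changed
        current = current.upstream
    return None


# Hypothetical example: a warehouse column mapped to a term, a report field
# copied from it unchanged, and a report field calculated from it.
dw_column = Element("AccountAmountAvailable", term="Account Amount Available")
report_field = Element("Amount Available", upstream=dw_column)
calculated_field = Element("Net Account Amount", upstream=dw_column, transformed=True)
```

Here trace_definition(report_field) reaches the warehouse column and returns its term, while trace_definition(calculated_field) stops at the transformation and returns None.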
So, in addition to tracing through the data flow “as long as there is no transformation which would change the meaning”, there is a second method based on search, or name matching. For example, if a report has a field named “Net Account Amount” that has no transformation-free data flow trace, one can still create a term in the glossary named “Net Account Amount”. When a definition lookup is requested for that field, Talend Data Catalog searches for that term and reports its definition, even without a clean lineage trace. In most cases, you will need to fill in some of these gaps by adding terms to the glossary.
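The name-matching idea can be sketched as a lookup of the field name against glossary term names. This is a minimal illustration, assuming case-insensitive exact matching on trimmed names; the product's actual search and matching rules may differ, and lookup_by_name and the sample glossary are hypothetical. Returning a list makes the "no match" and "multiple matches" outcomes visible.

```python
from typing import Dict, List


def lookup_by_name(field_name: str, glossary: Dict[str, str]) -> List[str]:
    """Return definitions of glossary terms whose name matches the field name.

    Assumption: matching is case-insensitive and ignores surrounding whitespace.
    An empty list means no term matched; more than one entry means the match
    is ambiguous.
    """
    key = field_name.strip().casefold()
    return [
        definition
        for term, definition in glossary.items()
        if term.strip().casefold() == key
    ]


# Hypothetical glossary: term name -> definition.
glossary = {
    "Net Account Amount": "Hypothetical definition text for the net amount.",
    "Gross Account Amount": "Hypothetical definition text for the gross amount.",
}
```

For instance, lookup_by_name("net account amount", glossary) finds the single matching term, while a field with no same-named term yields an empty list.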
Of course, it is quite possible that no term directly matches the report field by name. In this case, one may define a direct object relationship, such as an Is Defined By relationship from a term in the glossary to the field in the report. Compared with name matching, this approach lets you control precisely what the preferred definition will be. It also provides a definition even when no transformation-free data flow trace exists. Hence, it is the preferred method for fields that have no equivalent in the warehouse or lake (i.e., fields calculated in the report) and for which either no term or multiple terms match by name.
All these types of semantic definitions can be turned on or off in the semantic usage widget of the customized presentation UI. This means users can select which kinds of semantic definition they see on the Overview page, once the page has been customized to show the widget. These settings do not change what is used for the Documentation (Name and Business Definition).
To summarize, several methods are used to answer a definition lookup.
The choice of which result is used follows a ranking system, in descending order of the list above. Thus, a DOCUMENTED result takes precedence over CLASSIFIED, etc., for the Name and Business Definition.
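The ranking rule amounts to picking the candidate from the highest-ranked result type that produced one. The sketch below assumes each lookup method returns at most one candidate definition keyed by its result type; pick_definition and RANKING are hypothetical names. Only the relative order DOCUMENTED above CLASSIFIED is taken from the text; a complete ranking would list every result type in the product's documented order.

```python
from typing import Dict, Optional, Sequence


def pick_definition(candidates: Dict[str, str], ranking: Sequence[str]) -> Optional[str]:
    """Return the definition from the highest-ranked result type present.

    candidates maps a result type (e.g. "DOCUMENTED") to the definition it found;
    ranking lists result types from highest to lowest preference.
    """
    for result_type in ranking:
        if result_type in candidates:
            return candidates[result_type]
    return None  # no method produced a definition


# Partial, illustrative ranking: only DOCUMENTED > CLASSIFIED comes from the text.
RANKING = ["DOCUMENTED", "CLASSIFIED"]
```

Passing the ranking as a parameter keeps the selection logic independent of the exact list of result types.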
Example
Navigate to AccountAmountAvailable, which is a column in the GLAccount table in the Dimensional DW in the demo.
The Documentation, including the Name and Business Definition, for the view column is already populated. It was derived from a term definition.
Click the Semantic Flow tab, then the List tab on the left, to see the semantic definition inferred from a term in the Glossary.
Then click the Diagram tab on the left to see the actual semantic lineage trace that led to the term.
Click Columns and select the specific column to show in the trace.
The result actually used for the Name and Business Definition is the term associated through the Is Defined By relationship. This choice follows the ranking system, in descending order of the list of result types. Thus, a Documented result takes precedence over Term Defined, etc., for the Name and Business Definition.
You may do the same with classifiers (e.g., tables, flat files, views), not only with feature-type objects (e.g., columns, fields, attributes), when a known replication or bulk data mapping defines the lineage from one table to the next. In this case, the semantic meaning is known to be identical, so the Business Name and Definition are inferred through the lineage at the classifier (table) level.
For example, the Dimensional DW to Vendor Mart data mapping is a replication mapping that simulates the data flow (bulk replication) of identical tables from the Dimensional DW database to the Vendor Mart EAI. The tables in the Vendor Mart then infer their Business Name and Definition from the upstream tables in the Dimensional DW database.
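Table-level inference across a replication mapping can be sketched as copying documentation from the source table to each target table, following chains of replications upstream. This is a minimal illustration under assumed inputs: a mapping from target table to source table and a set of already documented tables. Documentation, infer_table_docs, and the table names are hypothetical; tables that already have their own documentation are left alone, since a documented result would take precedence.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Documentation:
    business_name: str
    definition: str


def infer_table_docs(
    replication: Dict[str, str],
    documented: Dict[str, Documentation],
) -> Dict[str, Documentation]:
    """Infer table-level documentation through replication (bulk copy) mappings.

    replication maps each target table to its source table. Because a
    replication does not change meaning, a target inherits the Business Name
    and Definition of the nearest documented table upstream.
    """
    inferred: Dict[str, Documentation] = {}
    for target, source in replication.items():
        if target in documented:
            continue  # the table has its own documentation; nothing to infer
        seen = {target}
        current: Optional[str] = source
        while current is not None and current not in seen:  # guard against cycles
            if current in documented:
                inferred[target] = documented[current]
                break
            seen.add(current)
            current = replication.get(current)  # keep walking upstream
    return inferred


# Hypothetical tables: a documented warehouse table replicated into a mart,
# and a second copy replicated from the mart.
dw_docs = {"DimensionalDW.GLAccount": Documentation("GL Account", "Hypothetical definition text.")}
mapping = {
    "VendorMart.GLAccount": "DimensionalDW.GLAccount",
    "ArchiveMart.GLAccount": "VendorMart.GLAccount",
}
```

With these inputs, infer_table_docs(mapping, dw_docs) gives both mart tables the warehouse table's Business Name and Definition.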