Semantic Definition Lookup
In this use case, one has found a data element (for example, a column in a database table or a field in a report) and wants to understand what it means. When the semantic links are defined properly, Talend Data Catalog can trace back through the physical data flow (as long as there is no transformation which would change the meaning) to an element that is mapped to a term in the glossary, and thus find a useful definition.
The caveat that the above only works “as long as there is no transformation which would change the meaning” implies that some subset of the fields in your reports will not provide a semantic definition. The trace will simply stop at the transformation and never get to a model (again likely the data warehouse) that has semantic lineage.
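The stopping behavior described above can be illustrated with a minimal sketch. This is not Talend Data Catalog code or its API: the `Element` class, the `is_pass_through` flag, and the `trace_definition` function are hypothetical names invented for illustration, and the field name "Amount Available" is made up for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """A hypothetical data element (column, report field, ...)."""
    name: str
    glossary_term: Optional[str] = None  # term mapped directly to this element, if any
    # Upstream links as (source element, is_pass_through) pairs.
    upstream: list = field(default_factory=list)


def trace_definition(element: Element, seen: Optional[set] = None) -> Optional[str]:
    """Walk upstream through pass-through links until a mapped term is found."""
    seen = seen if seen is not None else set()
    if id(element) in seen:  # guard against cycles in the flow graph
        return None
    seen.add(id(element))
    if element.glossary_term is not None:
        return element.glossary_term
    for source, is_pass_through in element.upstream:
        if not is_pass_through:
            continue  # a transformation may change the meaning: the trace stops here
        found = trace_definition(source, seen)
        if found is not None:
            return found
    return None


# A warehouse column mapped to a glossary term.
dw_column = Element("AccountAmountAvailable", glossary_term="Account Amount Available")
# A report field copied straight from the column (no transformation).
copied_field = Element("Amount Available", upstream=[(dw_column, True)])
# A report field computed from the column (transformation).
computed_field = Element("Net Account Amount", upstream=[(dw_column, False)])

print(trace_definition(copied_field))
print(trace_definition(computed_field))
```

In this sketch the copied field inherits the term from the warehouse column, while the computed field gets no definition because its only upstream link is a transformation.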
So, in addition to this method of “trace through the dataflow as long as there is no transformation which would change the meaning”, there is another method based on search, or name matching. In this case, if there is a field in a report named “Net Account Amount” and it does not have a good data flow trace without transformation, one could still create a term in the glossary named “Net Account Amount”. When a data element definition lookup is requested in that case, Talend Data Catalog will perform a search for that term and report its definition, even without a clean lineage trace. In most environments, it will be necessary to fill in the blanks for some fields in this way by adding terms to the glossary.
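The name-matching fallback can be sketched as a simple glossary search. The exact matching rules used by Talend Data Catalog are not described here, so the normalization below (ignoring case, spaces, and punctuation) is an assumption; `normalize`, `lookup_by_name`, and the definition text are hypothetical.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Lower-case and keep only letters and digits (an assumed matching rule)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def lookup_by_name(field_name: str, glossary: dict) -> Optional[str]:
    """Return the definition of the first term whose normalized name matches."""
    key = normalize(field_name)
    for term, definition in glossary.items():
        if normalize(term) == key:
            return definition
    return None


# Illustrative glossary; the definition text is a placeholder, not from the product.
glossary = {"Net Account Amount": "Placeholder definition for Net Account Amount."}

print(lookup_by_name("Net_Account_Amount", glossary))
print(lookup_by_name("Gross Margin", glossary))
```

A field with no matching term yields no definition, which is the situation where a term has to be added to the glossary.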
Of course, it is quite possible that no term directly matches the report field by name. In this case, one may define a direct object relationship, such as an Is Defined by relationship, from a term in the glossary to the field in the report. The advantage of this approach over name matching is that one can control precisely what the preferred definition will be. It also provides a definition even when no transformation-free data flow trace exists. Hence, it is the preferred method for fields that have no equivalent in the warehouse or lake (i.e., fields calculated in the report) and for which either no term or multiple terms match by name.
All these types of semantic definitions can be turned on or off in the semantic usage widget of the customized presentation UI. Users can thus select which kinds of semantic definition they want to see on the Overview page (once it has been customized to show the widget), but not which kind is used for Documentation (Name and Business Definition).
To summarize, there are several methods used to provide an answer to a definition lookup.
Which result is used is determined by a ranking, given in descending order in the list above. Thus, a DOCUMENTED result takes preference over a CLASSIFIED one, etc., for the Name and Business Definition.
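The ranking rule amounts to picking the highest-ranked category that produced a result. The sketch below assumes the results are available as a mapping from category to definition. Only the two categories named in the text (DOCUMENTED and CLASSIFIED) are included, and `pick_preferred` is a hypothetical helper, not part of the product.

```python
from typing import Optional

# Descending order of preference; only the categories named in the text are listed.
RANKING = ["DOCUMENTED", "CLASSIFIED"]


def pick_preferred(results: dict, ranking: list = RANKING) -> Optional[tuple]:
    """Return (category, definition) for the highest-ranked category present."""
    for category in ranking:
        if category in results:
            return category, results[category]
    return None


results = {
    "CLASSIFIED": "Definition found by classification.",
    "DOCUMENTED": "Definition entered directly on the object.",
}
print(pick_preferred(results))
```

With both results available, the DOCUMENTED one is chosen; if it is removed, the CLASSIFIED result is used instead.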
Example
Navigate to AccountAmountAvailable, which is a column in the GLAccount table in the Dimensional DW in the demo.
The Documentation, including the Name and Business Definition, is already populated for the column. It was determined from a term definition.
Click the pencil icon next to the Definition.
Here you have the Edit Documentation dialog, showing that:
- There is no Local Documentation defined, so no name or definition is applied directly to this object
- There is one semantic mapping (Mapped Documentation) defined directly to this object from a glossary term, which is used to provide a name and definition for the object
- In addition, pass-through lineage and semantic mappings lead to at least one term that has the same meaning as this object (Inferred Documentation) and provides an alternative name and definition.
Click the OPEN SEMANTIC FLOW link to the right of the Inferred Documentation.
Then expand the Enterprise Glossary and Finance Glossary entries to see the terms inside.
Here we see where each definition comes from.
- Again, the first name and definition provided before are based upon the term named Account Amount Available, which is directly semantically mapped to the object in question.
- Tracing back up the semantic mapping then leads to a more generalized term named Amount Funded, which provides the alternate definition.
- Finally, there is a domain type term named Unified Dollar Amount which describes how such an object should be represented.
As you click objects in the diagram, the Process (bottom) Panel shows the processes (semantic mappings) which lead to (make) and are derived from (use) the selected object. E.g., click the Account Amount Available term: