Lineage
Once metadata is well managed, it is available for detailed analysis, and true business-level use cases can be addressed. Talend Data Catalog supports full business-level lineage and impact analysis down to the classifier (table/entity/dimension) and feature (column/attribute/measure) level.
Generally, there are two types of lineage: Data Flow and Semantic Flow. Talend Data Catalog allows users to display and analyze both types.
- You may invoke a lineage and/or impact trace by going to the Lineage tab or the context menu of a classifier (table, file, entity, etc.) or feature (column, field, attribute, etc.) and setting the Type in the upper left of the lineage display to DATA FLOW. This presents an end-to-end trace across all the models and mappings in your current configuration.
- You may invoke a lineage overview by going to the Lineage tab from the details page for a model, schema, ETL job, BI design, etc.
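To make the difference between a lineage trace and an impact trace concrete, here is a minimal conceptual sketch in Python. It is not Talend Data Catalog's API or implementation: the `EDGES` list, the column names, and the `trace` function are hypothetical and serve only to show that lineage follows data flow links upstream while impact follows them downstream.

```python
from collections import defaultdict, deque

# Hypothetical column-level data flow links as (source, destination) pairs.
# Illustrative names only; nothing here is read from Talend Data Catalog.
EDGES = [
    ("crm.customer.email", "staging.customer.email"),
    ("staging.customer.email", "dw.dim_customer.email"),
    ("dw.dim_customer.email", "report.churn.contact"),
    ("crm.customer.id", "staging.customer.id"),
]


def trace(start, edges, direction="downstream"):
    """Return every object reachable from `start` along data flow links.

    direction="downstream" is an impact trace (what depends on `start`);
    direction="upstream" is a lineage trace (where `start` comes from).
    """
    graph = defaultdict(list)
    for src, dst in edges:
        if direction == "downstream":
            graph[src].append(dst)
        else:
            graph[dst].append(src)

    # Breadth-first traversal; `seen` also guards against cycles.
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen
```

For example, `trace("staging.customer.email", EDGES, "upstream")` walks back to the CRM source column, while the default downstream direction walks forward to the warehouse and report columns.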
Either use case may be displayed from the high-level model / data store / schema perspective of the enterprise architecture, down to the table / file level, and finally all the way down to the column / field level. The level can be selected for the entire data lineage diagram, or individually on selected data store models / schemas, or selected tables / files.
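The relationship between these levels can be pictured as collapsing fine-grained links into coarser ones. The sketch below is an illustration only, under the assumption that each object is named `model.table.column`; the `roll_up` function and the sample data are hypothetical and do not reflect how Talend Data Catalog stores or computes its diagrams.

```python
# Hypothetical column-level links, named as "model.table.column".
COLUMN_EDGES = [
    ("crm.customer.email", "dw.dim_customer.email"),
    ("crm.customer.id", "dw.dim_customer.customer_key"),
    ("dw.dim_customer.email", "report.churn.contact"),
]


def roll_up(edges, level):
    """Collapse column-level links to a coarser level of detail.

    level=1 keeps only the model, level=2 keeps model.table,
    level=3 keeps the full model.table.column path.
    """
    rolled = set()
    for src, dst in edges:
        s = ".".join(src.split(".")[:level])
        d = ".".join(dst.split(".")[:level])
        if s != d:  # drop links that collapse onto a single object
            rolled.add((s, d))
    return sorted(rolled)
```

At `level=2` the two links from `crm.customer` merge into a single table-level link to `dw.dim_customer`, which is why a table-level view is less crowded than a column-level one.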
In the Data Lineage Diagram, all columns/fields of a given table/file are presented at once, which matches classic data modeling concepts. Selecting a given column/field highlights the data flow to it.
However, these diagrams can be overly crowded in today's data lake architectures, where it is common to find tables/files with over a hundred columns/fields. Furthermore, the large number of tables/files involved may produce too many objects for a readable graph, which can trigger a warning in the user interface.
You now have the option (selected by default) of using the data flow "interactive" Analysis Diagram, which displays only the columns/fields involved in the given data flow trace rather than all the columns. The user can then select the columns/fields to be displayed to better present the business use case of that data flow, and can interact within the diagram by selecting a column/field to display its lineage. Furthermore, the Analysis Diagrams allow you to display conditional labels such as PII or Confidential SensitivityLevel, which not only provides more critical information to the user but also better visualizes the propagation of that information (e.g. PII) through the data flow lineage trace.
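The idea of a label such as PII propagating along a data flow trace can be sketched as follows. This is a conceptual illustration under stated assumptions, not a description of how Talend Data Catalog assigns or displays labels: the `FLOW` links, the `LABELS` mapping, and the `propagate_labels` function are all hypothetical, and the sketch assumes that every downstream column inherits every label of its sources.

```python
from collections import defaultdict, deque

# Hypothetical data flow links and initial labels (illustrative only).
FLOW = [
    ("crm.customer.email", "staging.customer.email"),
    ("staging.customer.email", "dw.dim_customer.email"),
    ("crm.customer.id", "dw.dim_customer.customer_key"),
]
LABELS = {"crm.customer.email": {"PII"}}


def propagate_labels(edges, labels):
    """Copy each column's labels to everything downstream of it."""
    downstream = defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)

    result = {col: set(tags) for col, tags in labels.items()}
    queue = deque(labels)  # start from the columns labeled explicitly
    while queue:
        col = queue.popleft()
        for nxt in downstream[col]:
            before = len(result.setdefault(nxt, set()))
            result[nxt] |= result[col]
            # Revisit a column only when it gained a new label,
            # so the loop also terminates on cyclic flows.
            if len(result[nxt]) > before:
                queue.append(nxt)
    return result
```

In this sample, the PII label set on the CRM email column reaches the staging and warehouse email columns, while the unrelated `customer_key` column stays unlabeled.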
The Type in the upper left of the lineage display provides a selection between either DATA FLOW (based upon connection definitions to data stores and physical transformation rules which transform and move the data) or SEMANTIC FLOW (based upon the definition and usage type relationships from a term, concept or logical model to a physical representation). Both data flow and semantic flow may be present in a diagram.
Data flow links are represented as solid black or gray lines and semantic links as dashed blue lines. In most cases, diagrams are laid out so that data flow runs from left (source) to right (destination) and semantic flow from top to bottom (from the more abstract or defining object to its usage).