Classic Diagram
The Classic Data Lineage Diagram can be overly crowded in today's data lake architectures, where it is common to find tables/files with over a hundred columns/fields. Furthermore, the large number of tables/files involved may generate too many objects for a readable graph, which can trigger a warning in the user interface.
Please refer to the diagram visualization common features.
In addition to those general features, there are features specific to the classic diagram presentation.
Data Flow Classic Diagram
This method of analysis presents a graphical representation of the flow of data through connection definitions to data stores and through the physical transformation rules which transform and move the data. To see data flow lineage, one must:
- Define a configuration that contains all of the models potentially in the data flow
- Stitch the models together by resolving connection definitions and Build the configuration
Once the configuration is built, you can report on lineage.
End-to-end data flow lineage across models is only available at the classifier (e.g., table) and feature (e.g., column) level. If you instead go to the object page for a schema or model, which is neither a classifier nor a feature, the Data Flow tab shows the overview lineage within the scope of that model only.
This is an older methodology for presenting a lineage trace. You are highly encouraged to use the newer method, as the Classic diagram does not scale well to larger diagrams and larger numbers of objects.
You may disable this feature in the UI by setting the Show Lineage Classic Diagram group preference to false for the group Everyone.
Data Lineage (sources)
These are analysis use cases, generally posed as questions such as:
- Given an item on a report, what data entry system fields impact these results?
- Why are the numbers on this report the way that they are?
- How do I change the system data to get the correct results for this report?
This type of analysis, i.e., asking where the information comes from, poses a question “upstream” in the data flow. We refer to it as a reverse lineage question. When consumers of these reports ask these questions, a correct and responsive answer may be the most valuable information provided by a metadata management environment.
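A reverse lineage question can be pictured as walking a graph of data flow links backwards from the report item. The sketch below is only an illustration of that idea and is not how Talend Data Catalog computes lineage; the edge list, the object names, and the functions `trace_upstream` and `ultimate_sources` are all hypothetical.

```python
from collections import deque

# Hypothetical column-level data flow links, written as (source, target).
EDGES = [
    ("entry.orders.amount", "staging.orders.amount"),
    ("entry.orders.tax", "staging.orders.tax"),
    ("staging.orders.amount", "dw.sales.net_amount"),
    ("staging.orders.tax", "dw.sales.net_amount"),
    ("dw.sales.net_amount", "report.revenue.total"),
    ("entry.hr.badge_id", "dw.staff.badge_id"),
]


def trace_upstream(edges, start):
    """Return every object that feeds data, directly or indirectly, into start."""
    parents = {}
    for source, target in edges:
        parents.setdefault(target, set()).add(source)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def ultimate_sources(edges, start):
    """Upstream objects that have no sources of their own (e.g. data entry fields)."""
    targets = {target for _, target in edges}
    return {node for node in trace_upstream(edges, start) if node not in targets}


if __name__ == "__main__":
    print(sorted(ultimate_sources(EDGES, "report.revenue.total")))
```

In this toy graph the unrelated `entry.hr.badge_id` field is never visited, which mirrors the way a lineage trace only shows objects that actually contribute to the selected item.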
Steps
- Trace data flow lineage.
- Click the Diagram tab on the left.
- From here you may:
  - Pick the Type in the pull-down in the upper right:
    - Data Impact type
    - Data Lineage type
    - Full Data Lineage type for both data impact and lineage.
  - Click the More Options icon and:
    - Select Show/Hide Columns to show columns in the selected object, or all objects if none is selected.
    - Select Expand/Collapse All to expand down to the current display level (columns or tables) or collapse to the highest level.
  - Click Save an image to produce a downloadable file with a lineage image.
  - Click Edit Filters and specify lineage filter options.
  - Click Display Options and specify lineage display options.
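The difference between the column and table display levels can be pictured as rolling column-level links up to their parent tables. The sketch below is a rough illustration under that assumption, not the product's implementation; the dotted naming scheme and the functions `table_of` and `collapse_to_tables` are hypothetical.

```python
def table_of(column):
    """Drop the last dotted segment: 'db.table.column' -> 'db.table'."""
    return column.rsplit(".", 1)[0]


def collapse_to_tables(column_edges):
    """Roll column-level links up to distinct table-level links."""
    table_edges = set()
    for source, target in column_edges:
        source_table, target_table = table_of(source), table_of(target)
        # Links between columns of the same table disappear at table level.
        if source_table != target_table:
            table_edges.add((source_table, target_table))
    return sorted(table_edges)


if __name__ == "__main__":
    # Hypothetical column-level links, written as (source, target).
    column_edges = [
        ("staging.orders.amount", "dw.sales.net_amount"),
        ("staging.orders.tax", "dw.sales.net_amount"),
        ("dw.sales.net_amount", "dw.sales.rounded_amount"),
    ]
    print(collapse_to_tables(column_edges))
```

Several column-level links between the same pair of tables become a single table-level link, which is why the collapsed view is much less crowded than the expanded one.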
Example
Search for the Net Vendor CustomerInvoices Tableau worksheet and open it.
Go to the Data Flow tab.
This is a business intelligence report and thus is at the end of the lineage, so Talend Data Catalog automatically chooses Data Lineage as the lineage Type.
The End Objects tab on the left is selected in this case, so we see the textual tree-based report.
Click Collapse all to reduce the tree to the top five elements in the lineage.
Now, click the Diagram tab on the left. Click the Collapse Selected node completely icon.
The different lineage lines indicate different types of data flow processes.
Click the plus sign next to MITI-Finance-AP.dbo (Database) in Accounting (Model).
Click the plus sign next to Invoice (Table) in MITI-Finance-AP.dbo (Database).
You then see the exact column that is a source in the lineage trace.
Click in an empty space in the diagram to de-select Invoice, then select the To Column level expansion, which will now apply to all objects.
Select a column, then click Highlight to outline the paths through that object.
Click the black line between Adjustments.Adj.TransAmt and Staging DW.dbo.GLAccount.AccountAmountAvailable.
The transformation then appears at the bottom of the page.
You may also simply pass the pointer over a link and see summary information.
Data Impact
One may often ask forward lineage, or impact analysis, questions such as:
- If I make a change to this field, what reports will be impacted?
- How is this identity information merged with the personnel system information on these other reports?
A data flow impact report traces the manner in which data flows from source to destination.
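An impact question can be pictured as walking a graph of data flow links forwards from the changed field. The sketch below is a minimal illustration of that idea, not the product's algorithm; the edge list, the `report.` naming prefix, and the functions `trace_downstream` and `impacted_reports` are hypothetical.

```python
from collections import deque

# Hypothetical column-level data flow links, written as (source, target).
EDGES = [
    ("files.payments.amount", "staging.payments.amount"),
    ("staging.payments.amount", "dw.ledger.balance"),
    ("dw.ledger.balance", "report.cashflow.total"),
    ("dw.ledger.balance", "report.audit.balance"),
    ("files.vendors.name", "report.audit.vendor"),
]


def trace_downstream(edges, start):
    """Return every object that receives data, directly or indirectly, from start."""
    children = {}
    for source, target in edges:
        children.setdefault(source, set()).add(target)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def impacted_reports(edges, start, prefix="report."):
    """Downstream objects whose name marks them as report fields (assumed prefix)."""
    return sorted(n for n in trace_downstream(edges, start) if n.startswith(prefix))


if __name__ == "__main__":
    print(impacted_reports(EDGES, "files.payments.amount"))
```

Here a change to `files.payments.amount` reaches two report fields, while `report.audit.vendor` is untouched because no data flows to it from the changed field.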
Steps
- Trace data flow lineage.
- Click Data Impact in the Type pull-down in the upper right.
- From here you may:
  - Pick the Type in the pull-down in the upper right:
    - Data Impact type
    - Data Lineage type
    - Full Data Lineage type for both data impact and lineage.
  - Click the More Options icon and:
    - Select Show/Hide Columns to show columns in the selected object, or all objects if none is selected.
    - Select Expand/Collapse All to expand down to the current display level (columns or tables) or collapse to the highest level.
  - Click Save an image to produce a downloadable file with a lineage image.
  - Click Edit Filters and specify lineage filter options.
  - Click Display Options and specify lineage display options.
Example
Navigate to the object page for the file PAYTRANS.csv. When searching for it, enclose the search string in quotation marks, e.g. "PAYTRANS.csv", because the period (.) has special meaning in the search syntax, and disable the semantic search.
Then click the Data Flow tab and the Diagram tab on the left. Note that the Data Impact type is automatically selected: the PAYTRANS.csv file is an ultimate source in the configuration, so it does not have any source lineage.
Full Data Lineage
This option provides the combination of both:
- Data Lineage (trace from an object upstream to objects that provide data flow to that object)
- Data Impact (trace from an object downstream to objects that are impacted via data flow by that object)
The result is based upon all the lineage flows that trace through the selected object (feature or classifier).
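One way to read this definition is that the full trace combines an upstream walk and a downstream walk from the same object, so objects that share a source with the selected object but never exchange data with it are left out. The sketch below illustrates that reading only; the edge list and the functions `reachable` and `full_lineage` are hypothetical and do not describe the product's internals.

```python
# Hypothetical column-level data flow links, written as (source, target).
EDGES = [
    ("src.crm.customer_id", "dw.customer.id"),
    ("src.web.customer_id", "dw.customer.id"),
    ("dw.customer.id", "report.churn.customer"),
    ("src.crm.customer_id", "report.crm_audit.customer"),
]


def reachable(adjacency, start):
    """Iterative depth-first walk returning all nodes reachable from start."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def full_lineage(edges, selected):
    """Split the flows through `selected` into a lineage side and an impact side."""
    parents, children = {}, {}
    for source, target in edges:
        parents.setdefault(target, set()).add(source)
        children.setdefault(source, set()).add(target)
    return {
        "lineage": reachable(parents, selected),   # upstream of the selected object
        "impact": reachable(children, selected),   # downstream of the selected object
    }


if __name__ == "__main__":
    result = full_lineage(EDGES, "dw.customer.id")
    print(sorted(result["lineage"]), sorted(result["impact"]))
```

In this toy graph `report.crm_audit.customer` is fed by one of the selected object's sources but not by the selected object itself, so it appears on neither side of the trace.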
Steps
- Trace data flow lineage.
- Click Full Data Lineage in the Type pull-down in the upper right.
- From here you may:
  - Pick the Type in the pull-down in the upper right:
    - Data Impact type
    - Data Lineage type
    - Full Data Lineage type for both data impact and lineage.
  - Click the More Options icon and:
    - Select Show/Hide Columns to show columns in the selected object, or all objects if none is selected.
    - Select Expand/Collapse All to expand down to the current display level (columns or tables) or collapse to the highest level.
  - Click Save an image to produce a downloadable file with a lineage image.
  - Click Edit Filters and specify lineage filter options.
  - Click Display Options and specify lineage display options.
Example
The Full Data Lineage option is the default. However, as it may take more time to render, you may disable it in the Group Preferences.
If it is disabled, you may re-enable it. Sign in as Administrator. Go to MANAGE > Groups. Select the group named Everyone. Go to the Preferences tab, click Add, and specify the Enable Full Data Lineage preference.
Click OK. Set the Value to true and click SAVE.
Search for “Customer” and pick the table Dimensional DW > dbo > Customer.
Go to the Data Flow tab.
The Data Flow tab has double arrows next to it, indicating that there are both impact and lineage traces for this object.
Select Full Data Lineage.
You have all the lineage traces going through that object. The object from which the lineage is determined is marked with a red pin.
Highlight Path
Click Highlight Path to highlight the lineage path of the selected object. Double-click or long-click to enable automatic highlighting on any selected object.
Classic Diagram Display Options
The following display options are available:
- MAXIMUM NODE WIDTH - Sets the size of the object boxes.
- Highlight Control Links - When this option is checked, control links are included whenever you highlight a trace.