Analyzing lineage for apps, scripts, and datasets
Lineage tracks data and data transformations backwards to the original source. Qlik Cloud provides a detailed visual representation of the history of this flow, where you can interactively examine the upstream lineage of a given field or dataset.
A consumer summary view of measures and dimensions used in a particular chart object identifies the source, giving you confidence that you can understand and trust what you are seeing and working with.
Business users examining a given field have a view of lineage for the field that summarizes its most important dependencies:
- Fields that are used to derive it
- Direct associations and dependencies, including owner and space
- Original source (the first known source)
To view downstream or forward-looking dependencies, you can investigate what elements would be affected by a change to the object by viewing Impact analysis. See Analyzing impact analysis for apps, scripts, and datasets.
The lineage graph
The lineage graph shows the flow of data through apps in an interactive, graphical chart. An app, a table, or a field is called a node in a lineage graph. When a node is the base node being investigated, it is said to be in focus and displays as the last element in the graph. At the most granular level, field-level lineage graphs show the data sources and transformations that a node is sourced from or dependent on.
Lineage graphs are useful to:
- Data experts working with the data
- Business specialists building apps
- Advanced business users consuming apps
Each node represents a step in the lineage of the selected dataset or app. This lineage information is compiled whenever an app loads or reloads data. If your app has not been reloaded recently, the lineage may be incomplete or inaccurate.
Lineage is supported for datasets and apps directly from the tile or row as they appear in your Qlik Sense catalog:
- Dataset: Datasets are data sources, such as data loaded from connectors or data files. Datasets can be tables in a database, data that is uploaded to data storage, or data that is generated from an app, such as a QVD file. Datasets usually have a single table each but some, such as Excel files, can have multiple tables.
- App: App nodes represent Qlik Sense analytic apps that use the data sources in the lineage. App nodes display the app name and location of the app as Sense.
Typical input nodes include data sources that are used by the base node, or apps that produce datasets. Field-level lineage allows for detailed investigation into how fields have been calculated and their specific origin across transforms and applications.
The nodes available in a lineage graph are the inputs to your selected base node, in other words the node in focus. The base node is the single node for which you want to retrieve lineage; it can be an app, dataset, file, table, or field. Select a dataset or app to designate it as the base node. Input nodes are nodes that are upstream from the base node.
The base node is the right-most node on your screen and is outlined in blue. It is the focus of your investigation, and only inputs to that base node are presented.
While you explore the lineage, you can interactively change the base node to another table, application, or field on the screen to focus your investigation.
The lines connecting the nodes are edges. An edge represents the relationship of one node to another, such as a dataset that is used by an app. Edges can also represent data that is produced as a by-product of an app. Together, the nodes and edges make up the lineage graph.
Nodes collapse or expand to reveal hierarchy levels from coarse to fine granularity, beginning with the higher-level dataset group or app and going down to the most granular level, which is the field level.
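The relationship between nodes, edges, and the base node can be pictured as a directed graph that is walked backwards from the node in focus. The following is a minimal conceptual sketch in Python, not Qlik's API or implementation: the LineageGraph class, its methods, and the node names are all hypothetical and exist only to illustrate what "upstream" means.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical in-memory model of a lineage graph (illustration only)."""

    def __init__(self):
        # Maps each node to the set of nodes that feed directly into it.
        self._inputs = defaultdict(set)

    def add_edge(self, source, target):
        """Record that `source` is a direct input to `target`."""
        self._inputs[target].add(source)

    def upstream(self, base_node):
        """Return every node reachable by following edges backwards."""
        seen = set()
        stack = [base_node]
        while stack:
            node = stack.pop()
            for parent in self._inputs.get(node, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Made-up node names, used only for this sketch.
graph = LineageGraph()
graph.add_edge("sales.csv", "Prep app")
graph.add_edge("Prep app", "sales.qvd")
graph.add_edge("sales.qvd", "Dashboard app")
graph.add_edge("Dashboard app", "export.qvd")  # downstream of the dashboard

upstream_of_dashboard = graph.upstream("Dashboard app")
print(sorted(upstream_of_dashboard))
```

With "Dashboard app" as the base node, only its inputs are returned. The downstream node "export.qvd" is left out, which mirrors the split between lineage (upstream) and impact analysis (downstream).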
Node details
Details are limited by your access to that object. Details can provide the following information:
- Name
- Description
- Tags
- Location
- Space
- Owner
- Creator
- Last modified
Navigating the lineage graph
Click and drag the graph to navigate and center the lineage graph. You can also use the navigation buttons. Select Home to center the lineage graph on the base node. Click back and forward to move around in your selections.
Select Lineage in the context menu on an app or dataset tile or row, depending on your view, to open lineage for it. You can also access a lineage graph from the overview of a dataset by opening the menu and selecting Lineage. You can access lineage (upstream) or impact analysis (downstream) for other nodes that appear in graphs by opening the menu on the node and selecting Lineage (Use as new base node) or Impact analysis. Select a node to designate it as the base node.
Expand or collapse the nodes to expand or collapse groups of objects at the same level.
The lineage summary view
Analytics consumers and casual business users can access a consumer summary view of field-level dependencies directly from an analytics chart in a tabular view that lists dependencies for the field used in the chart.
This view is supported in all Qlik non-bundled analytic charts except Button and Map charts. While the more detailed lineage graph is available, casual analytics users can access summary information for associated objects that include: apps, links, datasets, tables, fields, dimensions and measures. You can easily get an explanation of where the data came from within a chart with the lineage summary view.
This view lists details and dependencies for measures and dimensions in the chart. These include the expression that calculates the measure or dimension, the fields that are used in the expression, and the data sources where the fields came from. Select the menu on the fields and sources box to Go to lineage (for that field or source).
Do the following:
- Change chart settings for the lineage consumer summary view to be visible. These settings are not on by default.
- Under Appearance > General, turn on Show details.
- Under Appearance > General, turn on Show expressions.
- Select Done editing and select information to access the lineage consumer summary view.
Limitations
The lineage chart has the following limitations:
- Apps that have not been reloaded after the release of lineage in Qlik Cloud may not have full lineage information available for them until after they reload. Details for some nodes may be limited if they have not been loaded after lineage was turned on for your tenant.
- Node details for datasets outside of your tenant, such as SQL Server or Google Drive connections, are limited to the type of dataset and name. REST connections only display that it is REST data.
- The lineage summary view is not supported in Button and Map charts.
- Currently, the chart details summary view is not available in mobile mode.
Permissions
You must be able to view an app or dataset to view the lineage for the item from your activity centers. If you can see the lineage graph for a base node, you are able to see basic details and metadata for the upstream lineage objects.
Security
Field-level lineage
- A user can only change to a base node that they have access to; otherwise the context menu is not available.
- If a user has access to the base node, they will have access to see all upstream lineage.
Consumer summary view (chart details)
- Access is always provided to dimensions and expressions. Links to lineage and dataset names display only when the user has access.
Example use cases for analyzing lineage
For a walk-through of the following examples, see Field-level lineage use cases.
Example: Exploring where information comes from with the lineage summary view
As an analytics consumer looking at a bar chart in an app cars-data4-app, you would like to know where the information comes from. You make sure that Show details and Show expressions are turned on for the chart under the General section of properties, then select Done editing. When you click back into the chart, select Show details to display the lineage consumer view.
The lineage consumer view lists Details for the dimensions and measures in the left column and Dependencies for Fields and Sources in the middle and right columns. You see that the dimension Car_Id-ID is dependent on the field Car_Id-ID, which is found in three listed QVD sources. Select the menu on the field entry and select Lineage-Car_id-ID / cars-data to open a lineage graph for the field Car_id-ID in the cars-data QVD.
The lineage graph is viewed right-to-left and shows that the field Car_id_ID is in the table cars-data, stored as a Qlik Data File. Expand the nodes as you trace the field history back to the original file that was uploaded to Qlik Cloud. The first relay back shows that a QVD, cars-data.qvd, containing the field Car_id_ID was loaded to the app cars-data4-app. The next node back is the app cars-data3-app, from which cars-data.qvd was generated. Going back one more relay and expanding the node, you see that the original source file was a CSV file, cars-data3.csv, and that it contained the field ID.
By expanding the tables and viewing fields, you are able to identify the original source file, table, and field of the bar chart dimension Car_id-ID.
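The trace in this example can be written out as a chain of field-level hops that is followed until no earlier source is known. The sketch below is a simplified illustration in Python and does not reflect how Qlik Cloud stores or exposes lineage. The FIELD_SOURCES mapping and the trace_to_origin function are hypothetical, each field is assumed to have a single direct source, and the field name inside cars-data3-app is an assumption because the example does not state it.

```python
# Hypothetical field-level lineage records. Each key is (container, field)
# and each value is the (container, field) it was loaded or derived from.
FIELD_SOURCES = {
    ("cars-data4-app", "Car_id_ID"): ("cars-data.qvd", "Car_id_ID"),
    # Assumption: the field name inside cars-data3-app is not given in the
    # example; Car_id_ID is used here only to complete the chain.
    ("cars-data.qvd", "Car_id_ID"): ("cars-data3-app", "Car_id_ID"),
    ("cars-data3-app", "Car_id_ID"): ("cars-data3.csv", "ID"),
}


def trace_to_origin(start, sources):
    """Follow single-source hops backwards and return the full path."""
    path = [start]
    visited = {start}
    current = start
    while current in sources:
        current = sources[current]
        if current in visited:  # guard against accidental cycles
            break
        visited.add(current)
        path.append(current)
    return path


path = trace_to_origin(("cars-data4-app", "Car_id_ID"), FIELD_SOURCES)
original_source = path[-1]
for container, field in path:
    print(f"{container}: {field}")
```

The last entry in the path corresponds to the original source (the first known source) shown in the lineage graph, here the field ID in cars-data3.csv.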
Example: Investigating the origins of a dataset and how it was created
As an app developer, you are considering using an existing dataset rgb_customers.qvd for your application. You investigate the origins of this dataset so that you can understand where the data comes from. From the dataset tile or the row, select Lineage from the menu to open the lineage graph. From the lineage graph, you view metadata for the dataset by selecting the menu on the QVD rgb_customers.qvd and Open the overview.
Click the browser back arrow to return to the lineage graph and continue exploring the lineage of the dataset. Expand the rgb_customers_qvd node to view available fields. Note that each field provides the option to make it the base node by selecting Lineage (Use as new base node), or to select Impact analysis to view forward lineage and the dependent objects that will be impacted by changes to the dataset.
Following the lineage backwards and expanding the nodes, you can see that this QVD dataset is the output of the Prep Data RGB Sales Analysis app. Going back another relay and expanding the File storage node, you see that the sales analysis app had a CSV file loaded to it: rgb_customers.csv. Field-level analysis reveals that the Tags field in the original source file was renamed to rgb_customers.Tags in the sales analysis app. The overview of the original CSV file can be opened to reveal valuable metadata such as the owner, creator, usage metrics, tags, classifications, field profile, and impact analysis.
Example: Viewing field dependencies and expressions in charts with lineage summary
As a casual business user, you are analyzing a line chart COGS per Month in an app and want to review the expression used to calculate the measure (Cost of Goods Sold) COGS per Month and which Fields and Sources are used in the chart. You make sure that Show details and Show expressions are turned on for the chart under the General section of properties, then select Done editing. When you click back into the chart, select Show details to display the lineage consumer view.
The lineage summary view lists the measures and the dimensions that are represented in the chart. The expressions that create the measures and dimensions can also be viewed in a dropdown when you expand these fields. You see that the measure is created by the expression: Sum (UnitCost*Quantity). The fields in the expression (Quantity, UnitCost) are listed as field dependencies. You can select to view lineage for the two fields, but in this case there is only one relay back to their ultimate source, Dataset.xlsx, which is also listed as a source dependency in the lineage summary.
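To make the link between an expression and its field dependencies concrete, the following Python sketch pulls field names out of the expression text and evaluates the same calculation on a few rows. It is an illustration only: the name extraction is a naive regular expression and not Qlik's expression parser, the KNOWN_FUNCTIONS set and both helper functions are hypothetical, and the sample rows are made-up values rather than data from Dataset.xlsx.

```python
import re

# Assumption: Sum is the only function name that appears in this example.
KNOWN_FUNCTIONS = {"Sum"}


def field_dependencies(expression):
    """Naively list identifiers in the expression that are not functions."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", expression)
    fields = []
    for token in tokens:
        if token not in KNOWN_FUNCTIONS and token not in fields:
            fields.append(token)
    return fields


def cogs(rows):
    """Python equivalent of Sum(UnitCost*Quantity) over a list of rows."""
    return sum(row["UnitCost"] * row["Quantity"] for row in rows)


# Made-up sample rows, for illustration only.
rows = [
    {"UnitCost": 2.5, "Quantity": 4},
    {"UnitCost": 10.0, "Quantity": 3},
]

deps = field_dependencies("Sum (UnitCost*Quantity)")
total = cogs(rows)
print(deps, total)
```

The extracted names match the two field dependencies listed in the lineage summary view, and the total is the per-row product of the two fields added across rows.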