Analyzing lineage in Analytics

Lineage tracks data and data transformations backwards to the original source. Qlik Cloud provides a detailed visual representation of the history of this flow, where you can interactively examine the upstream lineage of analytics content. Lineage is available for content such as apps, scripts, data flows, ML experiments, ML deployments, and datasets.

When consuming an analytics app, access the lineage summary view, a representation of measures and dimensions used in a particular chart object. With this view, you can identify the source, giving you confidence that you can understand and trust what you are seeing and working with. For more information, see The in-app lineage summary view.

To view lineage for analytics content, the underlying data must be stored in Qlik Cloud as a cataloged source.

Business users examining a given field have a view of lineage for the field that summarizes its most important dependencies:

Fields that are used to derive it
Direct associations and dependencies, including owner and space
Original source (the first known source)

To view downstream or forward-looking dependencies, you can investigate what elements would be affected by a change to the object by viewing Impact analysis. See Analyzing impact analysis in Analytics.

For a visual demo of how to use lineage, see:

Using lineage

The lineage graph

The Lineage graph shows the flow of data through analytics content in an interactive, graphical chart. A resource, table, or field is called a node in a lineage graph. When a node is the base node being investigated, it is said to be in focus and displays as the last element in the graph. At the most granular level, field-level lineage graphs show the data sources and transformations that a node is sourced from or dependent on.

Lineage graphs are useful to:

Data experts working with the data
Business specialists building apps
Advanced business users consuming apps
Users who work with machine learning models

Each node represents a step in the lineage of the selected content. This lineage information is compiled whenever an analytics asset refreshes its data. If your app, script, or data flow has not been refreshed recently, the lineage may be incomplete or inaccurate.

Lineage is available for supported content types from the tile or row as they appear in your catalog. You can analyze lineage for the following analytics content:

Dataset: Datasets are data sources, such as data loaded from connectors or data files. Datasets can be tables in a database, data that is uploaded to data storage, or data that is generated from an app, such as a QVD file. Datasets usually have a single table each, but some, such as Excel files, can have multiple tables.

Datasets that were published from Talend Studio are marked with a dedicated icon. You can go to the Talend Management Console Job that was used to generate the dataset by clicking Open TMC Job in the dataset ... menu. For more information, see Publishing datasets and lineage to Qlik Cloud.
App: App nodes represent Qlik Sense analytic apps that use the data sources in the lineage. App nodes display the app name and location of the app as Qlik Sense.
Script: Script nodes represent scripts created in the Script interface.
Data flow: Data flows can be inspected for better understanding of the data sources that they use and transform.
ML experiment: You can understand the lineage of a machine learning experiment, which consists of the data sources that have come together to produce the training data for machine learning models.
ML deployment: You can understand the lineage of an ML deployment and how it is being used in predictions. Lineage for ML deployments typically consists of ML experiments, experiment versions, models, and datasets.

Typical input nodes include data sources that are used by the base node, or apps that produce datasets. Field-level lineage allows for detailed investigation into how fields have been calculated and their specific origin across transforms and applications.

The nodes available in a lineage graph are the inputs to your selected base node, in other words the node in focus. Input nodes are nodes that are upstream from the base node. Select an item to designate it as the base node. The base node is the single node for which you want to retrieve lineage; for example, it could be an application, data flow, ML experiment, dataset, file, table, or field.

The base node is the right-most node on your screen and is outlined in blue. It is the focus of your investigation, and only inputs to that base node are presented.

While you explore the lineage, you can interactively change the base node to another table, application, field, or other item on the screen to focus your investigation.

Figure: The node in focus, also called the base node, appears on the right of your screen.

The lines connecting the nodes are edges. Edges represent the relationship of a node to another node. They represent relationships indicating associations such as a dataset that is used by an application. They can also represent data that is produced as a by-product of an application. The collection of nodes and edges together make the lineage graph.

Edges represent relationships between objects — Lineage edges represent relationships
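
As a rough mental model only, the nodes and edges described above can be treated as a directed graph. Lineage is then the set of nodes reached by following edges backwards from the base node, and impact analysis is the set reached by following them forwards. The sketch below is an illustrative Python model, not Qlik Cloud's implementation or API. The `LineageGraph` class, its methods, and the node names are all hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy directed graph: an edge (source, target) means data flows source -> target."""

    def __init__(self):
        self._inputs = defaultdict(set)   # node -> nodes directly upstream
        self._outputs = defaultdict(set)  # node -> nodes directly downstream

    def add_edge(self, source, target):
        self._inputs[target].add(source)
        self._outputs[source].add(target)

    @staticmethod
    def _reach(start, neighbors):
        # Breadth-first walk that collects every node reachable from start.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, base_node):
        """Inputs to the base node (what a lineage view would show)."""
        return self._reach(base_node, self._inputs)

    def downstream(self, base_node):
        """Nodes that depend on the base node (what impact analysis would show)."""
        return self._reach(base_node, self._outputs)


# Hypothetical example data, not taken from a real tenant.
graph = LineageGraph()
graph.add_edge("sales.csv", "Sales prep app")       # a dataset used by an app
graph.add_edge("Sales prep app", "sales.qvd")       # a dataset produced by an app
graph.add_edge("sales.qvd", "Sales dashboard app")

print(sorted(graph.upstream("sales.qvd")))    # ['Sales prep app', 'sales.csv']
print(sorted(graph.downstream("sales.qvd")))  # ['Sales dashboard app']
```

Changing the base node in this model amounts to calling `upstream` with a different starting node. The graph itself does not change.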

Nodes collapse or expand to reveal hierarchy levels from coarse to fine granularity, beginning with the higher-level dataset group or app and ending at the most granular level, which is the field level.

A node with asset, resource, table, and field levels — In this image of a node, the following hierarchy levels are shown, from highest (coarsest) to lowest (most granular): Data asset (app), resource (dataset), table, and fields.
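
The expand and collapse behavior can be pictured as a small tree in which a node's children are listed only while that node is expanded. The following Python sketch is an assumption-laden illustration, not how the product stores or renders nodes. `HierarchyNode`, `visible_rows`, and the field name `Model` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HierarchyNode:
    name: str
    level: str  # "asset", "resource", "table", or "field"
    children: List["HierarchyNode"] = field(default_factory=list)
    expanded: bool = False

    def visible_rows(self, depth=0):
        """Return (depth, level, name) rows, descending only into expanded nodes."""
        rows = [(depth, self.level, self.name)]
        if self.expanded:
            for child in self.children:
                rows.extend(child.visible_rows(depth + 1))
        return rows


# Hypothetical hierarchy: data asset (app) > resource (dataset) > table > fields.
table = HierarchyNode("Cars", "table", [
    HierarchyNode("Car_ID", "field"),
    HierarchyNode("Model", "field"),
])
resource = HierarchyNode("cars-data.csv", "resource", [table])
asset = HierarchyNode("cars-data4-app", "asset", [resource])

print(len(asset.visible_rows()))  # 1: everything is collapsed

asset.expanded = resource.expanded = table.expanded = True
for depth, level, name in asset.visible_rows():
    print("  " * depth + f"{level}: {name}")
```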

Opening the lineage graph

Do the following:

Open the Insights or Analytics activity center.
Select Lineage in the context menu on an item that supports lineage.

You can also access the lineage graph of some content when you have an item opened. Click More and Lineage.

Node details

Details are limited by your access to that object. Details can provide the following information:

Name
Description
Tags
Location
Space
Owner
Creator
Last modified

Navigating the lineage graph

Click and drag the graph to navigate and center the lineage graph. You can also use the navigation buttons. Select Home to center the lineage graph on the base node. Click back and forward to move around in your selections.

Navigation buttons for the lineage graph. — Lineage graph navigation

The Lineage graph shows the upstream dependencies for your analytics content, which is presented as the default node when you open the graph for it. You can access lineage (upstream) or impact analysis (downstream) for other nodes that appear in the graph by selecting More and Lineage (new base node) or Impact analysis. Select a node to designate it as the base node.

Expand (arrow down) or collapse (arrow up) the nodes to show or hide groups of objects at the same level.

Lineage summary view in an app

The lineage summary view in an app can give business users a high-level overview of the upstream dependencies in the app. For more information, see The in-app lineage summary view.

Analyzing lineage for machine learning content

You can use the Lineage graph to analyze the origins of machine learning content, including ML experiments, ML deployments, and datasets. Use the graph for a holistic view of how machine learning models were created, the data they were trained on, and what they are used for in production scenarios.

Experiments, deployments, and datasets also appear as nodes when analyzing other content in the Lineage graph, such as downstream apps.

Machine learning assets are also shown in Impact analysis for comprehensive analysis of downstream content. For more information, see Analyzing impact analysis in Analytics.

Opening Lineage for machine learning content

Do one of the following:

In your activity center, click the More menu next to an ML experiment, ML deployment, or dataset, and select Lineage.
In an ML experiment or ML deployment, click the menu icon in the navigation bar and select Lineage.

Navigating Lineage for machine learning content

You explore machine learning nodes in the same ways as for other content. For interface overviews, see:

The lineage graph
Navigating the lineage graph

Recognizing machine learning items in the Lineage graph

The following table outlines common items related to machine learning that appear in the Lineage graph.

Common lineage items for machine learning
Item	Icon(s)	Explanation
File storage		Not unique to machine learning content. Shows the location where a dataset is stored (in most cases, in a space). Relevant for training dataset, exports from embedded analytics in an experiment, apply datasets used for predictions, and prediction output datasets.
Dataset	Many (for example, for QVD)	Not unique to machine learning content. Used to represent training datasets, exports from embedded analytics in an experiment, apply datasets, and prediction output datasets.
ML experiment		An ML experiment in which models are trained.
ML experiment version		The version within the ML experiment, in which one or more models have been trained.
ML model		An ML model trained within an experiment version. Used to represent trained models in an ML experiment, and deployed models in an ML deployment.
ML deployment		An ML deployment that contains one or more deployed models.
No icon	-	Prediction output nodes within an ML deployment do not have icons. Fields included in a prediction output dataset also do not have icons.

Lineage and ML experiments

ML experiments can appear in the following ways:

As the base node of a lineage graph.
As upstream nodes of other processes and outputs, such as predictions or predictive apps.

ML experiments are presented in grouped arrangements. They expand as follows:

An ML experiment expands into one or more experiment versions.
An experiment version expands into one or more ML models.

When a model trained in an experiment is deployed into an ML deployment, it appears in the lineage graph when downstream content (for example, predictions or ML deployments) is selected as the base node.

Lineage and ML deployments

ML deployments can appear in the following ways:

As the base node of a lineage graph.
As upstream nodes of other processes, such as predictive apps, scripts, or data flows.

ML deployments are presented in grouped arrangements. They expand as follows:

An ML deployment expands into one or more deployed models.
If a model in the deployment has been used in batch predictions, the model expands to show each batch prediction output.

Field-level lineage is available for apply datasets and prediction output datasets that relate to an ML deployment.

Deployed models used for predictions are connected back to the experiment in which they were trained.

Lineage and ML datasets

ML datasets are datasets that are used in or created by ML experiments and ML deployments. They include:

Training datasets
Datasets exported from embedded analytics in an ML experiment (Compare and Analyze tabs)
Apply datasets
Prediction output datasets, including prediction, SHAP, Coordinate SHAP, errors, and apply datasets

Deleted content

If an ML experiment, ML deployment, or dataset used in machine learning processes is deleted, it is still shown in the Lineage graph when analyzing other nodes.

Permissions

For information about permissions, see Permissions.

Example scenario

For an example scenario, see Example: Investigating lineage of machine learning content.

Limitations

The lineage graph has the following limitations:

Apps that have not been reloaded after the release of lineage in Qlik Cloud may not have full lineage information available for them until after they reload. Details for some nodes may be limited if they have not been loaded after lineage was turned on for your tenant.
Node details for datasets outside of your tenant, such as SQL Server or Google Drive connections, are limited to the type of dataset and name. REST connections only display that it is REST data.

Permissions

Permissions for apps, scripts, data flows, and datasets

You must be able to view an app, script, data flow, or dataset to view the lineage for the item from your activity centers. If you can see the lineage graph for a base node, you are able to see basic details and metadata for the upstream lineage objects.

Permissions for ML experiments and ML deployments

Permissions for full access

If you have the following, you can directly open Lineage from the ML experiment or ML deployment, or from your activity center:

Professional or Full User entitlement
Automl Experiment Contributor or Automl Deployment Contributor security role
For ML experiments or ML deployments in shared spaces, one of the following space roles in the shared space:
- Owner (of the space)
- Can manage
- Can edit
- Can view
For ML experiments or ML deployments in managed spaces, one of the following space roles in the managed space:
- Owner (of the space)
- Can manage
- Can contribute
- Can view
- Can operate

With this access level, you also have permissions to view details from the ML experiment or ML deployment.

Permissions for analyzing lineage

If you have the following, you can see the ML experiment or ML deployment in the Lineage graph when other content is set as the base node. You can also set the experiment or deployment as the base node for analysis.

Professional or Full User entitlement
For ML experiments or ML deployments in shared spaces, one of the following space roles in the shared space:
- Owner (of the space)
- Can manage
- Can edit
- Can view
For ML experiments or ML deployments in managed spaces, one of the following space roles in the managed space:
- Owner (of the space)
- Can manage
- Can contribute
- Can view
- Can operate

This access level is more limited than the full access level. If you also have the Automl Experiment Contributor or Automl Deployment Contributor security role, you will have full access and can perform other actions, such as opening them in the Lineage graph directly and viewing details.
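
The two access levels for ML experiments and ML deployments can be summarized as a small decision function. This is a minimal sketch of the rules as listed on this page, not a real Qlik Cloud API. The function name, parameters, and return values are hypothetical. It also assumes that either Automl security role grants full access and that space types other than shared and managed grant none, which the page does not state.

```python
ENTITLEMENTS = {"Professional", "Full User"}
AUTOML_ROLES = {"Automl Experiment Contributor", "Automl Deployment Contributor"}
SPACE_ROLES = {
    "shared": {"Owner", "Can manage", "Can edit", "Can view"},
    "managed": {"Owner", "Can manage", "Can contribute", "Can view", "Can operate"},
}


def ml_lineage_access(entitlement, security_roles, space_type, space_roles):
    """Return "full", "analyze", or "none" for an ML experiment or ML deployment."""
    # Assumption: space types not listed on this page give no access.
    allowed = SPACE_ROLES.get(space_type, set())
    if entitlement not in ENTITLEMENTS or not allowed & set(space_roles):
        return "none"
    # Assumption: either Automl role is enough, regardless of the item type.
    if AUTOML_ROLES & set(security_roles):
        return "full"
    return "analyze"


print(ml_lineage_access("Professional", ["Automl Experiment Contributor"],
                        "shared", ["Can view"]))                           # full
print(ml_lineage_access("Full User", [], "managed", ["Can operate"]))      # analyze
print(ml_lineage_access("Full User", [], "shared", ["Can operate"]))       # none
```

In this model, "analyze" corresponds to seeing the item in the graph and setting it as the base node. "full" additionally covers opening Lineage directly from the item and viewing its details.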

Security

A user can only change to a base node that they have access to; otherwise the context menu is not available.
If a user has access to the base node, they will have access to see all upstream lineage.

Example use cases for analyzing lineage

For a walk-through of lineage analysis, see Field-level lineage use cases.

Example: Exploring where information comes from with the lineage summary view

As an analytics consumer looking at a bar chart in the app cars-data4-app, you would like to know where the information comes from. You make sure that Show details and Show expressions are turned on for the chart under the Appearance > General section of properties, then switch to sheet analysis mode. Right-click the chart, or use the More actions menu, and select Show details to show the lineage consumer view. Click Show dependencies.

You see that the dimension Car_ID is dependent on the field Car_ID, which is found in three listed CSV sources. Select the More actions menu on the field entry and select Lineage - Car_ID / Cars to open a lineage graph for the field Car_ID in the app.

Select the menu and then the option to view lineage for a source in the consumer lineage summary view — Select a source or field to view lineage for that object

The lineage graph is read right to left and shows that the field Car_ID is in the table Cars, which was loaded into the app cars-data4-app. Expand the nodes as you trace the field history back to the original file that was uploaded to Qlik Cloud. The first step back shows that a CSV file, cars-data.csv, containing the field Car_ID was loaded into the app cars-data4-app. The next node back is the app cars-data3-app, from which cars-data.csv was generated. Going back one more step and expanding the node, you see that the original source file was the CSV file cars-data3.csv and that it contained the field ID.

By expanding the tables and viewing fields, you are able to identify the original source file, table, and field of the bar chart dimension Car_ID: the field ID in cars-data3.csv.

Start with the node in focus and trace lineage history back to the original source — Expand the nodes to trace history of a field back to the source file
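
The trace in this example can be sketched as repeatedly following a "derived from" link until no further upstream entry is known. The Python below is a simplified illustration, not Qlik Cloud's lineage engine. The `DERIVED_FROM` mapping and `trace_to_source` are hypothetical, each field is assumed to have a single input, and the field name inside cars-data3-app is an assumption, since the example does not state where the rename from ID to Car_ID happens.

```python
# (resource, field) -> (upstream resource, upstream field)
DERIVED_FROM = {
    ("cars-data4-app", "Car_ID"): ("cars-data.csv", "Car_ID"),
    # Assumption: the field is already called Car_ID inside cars-data3-app.
    ("cars-data.csv", "Car_ID"): ("cars-data3-app", "Car_ID"),
    ("cars-data3-app", "Car_ID"): ("cars-data3.csv", "ID"),
}


def trace_to_source(start, derived_from):
    """Follow single-parent links back until a node has no known upstream."""
    path, seen = [start], {start}
    while path[-1] in derived_from:
        parent = derived_from[path[-1]]
        if parent in seen:  # guard against cycles in malformed input
            break
        path.append(parent)
        seen.add(parent)
    return path


path = trace_to_source(("cars-data4-app", "Car_ID"), DERIVED_FROM)
for resource, field_name in path:
    print(f"{resource}: {field_name}")

print("Original source:", path[-1])  # ('cars-data3.csv', 'ID')
```

Real fields can be derived from several inputs, in which case the trace becomes a graph walk rather than a single chain.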

Example: Investigating the origins of a dataset and how it was created

As an app developer, you are considering using an existing dataset current_customers_analytics.xlsx for your application. You investigate the origins of this dataset so that you can understand where the data comes from. From the dataset tile or the row, select Lineage from the More menu to open the lineage graph. From the lineage graph, you view metadata for the dataset by selecting the More actions menu on the XLSX node current_customers_analytics.xlsx and then selecting Open the overview.

Dataset overview can be opened from the lineage graph — Open dataset overview from the lineage graph

Dataset overviews can be accessed from the lineage graph — View tags, classifications, and other technical metadata from the dataset overview tab

Data profile is available from the Profile tab

Click the browser back arrow to return to the lineage graph and continue exploring the lineage of the dataset. Expand the current_customers_analytics.xlsx node and click Select all to view the available fields. Do the same for all nodes. Note that each field provides the option to make it the base node by selecting Lineage (new base node), or to select Impact analysis to view forward lineage and the dependent objects that would be impacted by changes to the dataset.

Viewing lineage for a dataset — Expanded Lineage graph for the dataset. Each field within each node will have options to open the app or data, view impact analysis, or change the node in focus

Following the lineage backwards and expanding the nodes, you can see that this XLSX dataset is the output of the Prep Current Customers Sales - Analytics app. Going back another step and expanding the File storage node, you see that the sales analysis app had a CSV file loaded into it: rgb_customers.csv. Field-level analysis reveals that the Tags field in the original source file was renamed to rgb_customers.Tags in the sales analysis app. You can open the overview of the original CSV file to reveal valuable metadata such as the owner, creator, usage metrics, tags, classifications, field profile, and impact analysis.

Example: Investigating lineage of machine learning content

A casual business user or machine learning expert could use the Lineage graph to inspect the origins of certain predicted values. With the base node set to the prediction dataset, this user can see:

The training data, including its sources and transformations
The experiment, experiment version, and model
Where the model was deployed, and how it has been used

Lineage graph showing an end-to-end flow from training data preparation to a prediction dataset — Lineage graph with all nodes expanded. The graph shows an end-to-end flow from training data preparation to a prediction dataset.

The image above shows the following process:

A data flow loads and transforms data from a CSV dataset stored in a personal space. The output is stored into a Parquet dataset in the same space.
The Parquet dataset is used in version 1 of an ML experiment. This experiment version trains an ML model.
The ML model is deployed into an ML deployment.
Using a CSV dataset in a personal space as the apply dataset, the ML deployment generates a prediction dataset in Parquet format.

Lineage in Data Integration

The Lineage graph is also available in Data Integration. For more information, see Analyzing lineage in Data Integration.
