Data Lineage

The Data Lineage sheet enables you to analyze QVWs in terms of their data sources, how they read data from those sources, and which output files (QVDs and QVXs) they write to. Before any selections are made, the Lineage Detail section on the sheet displays complete lists of sources, QVW processes, generated QVDs and QVXs, and load statements. It also provides a high-level data source summary at the top of the page, above the Sources table: the number of databases, spreadsheets, and QVDs, and the total number of data sources.

Lineage detail

The Sources table lists individual data sources accessed by Processes. The Governance Dashboard determines a data source by querying the metadata information embedded in the QVWs, QVDs, and QVXs in the scan folders. Some of the typical source types are:

  • Inline - data typed directly into a load script (no external table or data source).

  • Resident - data loaded from a table that has already been loaded into QlikView memory.

  • AutoGenerate - data generated by the load script (no external table or data source). Different from Inline in that QlikView generates a specific number of rows of data based on user input. For example:

Calendar: LOAD Date(Today() + RecNo()) AS running_date AUTOGENERATE 365;

  • Binary - data model loaded directly from a separate QlikView application (QVW) using the "binary" command.

  • Database - data from a standard database, identified by the database table, with a "Where" clause where applicable.

The Source to Target button in the Sources table is enabled when a single source QVD or QVX is selected. The button is used to search for and select the Process with the identical string as the Source. This can be used to identify the upstream source for the source QVD or QVX selected. For example, if the Source in step 2 in a data lineage flow is known, selecting the Source to Target button reveals the Source for step 1.

The Processes table lists the processes (for example, QVWs) that extract, transform, and/or load data from the Source(s). A process may or may not generate target QVDs or QVXs.

QVWs that have mixed-case names appear twice, once in all lower case and once with the mixed case.

The Target in the Generated Files table is the path of the QVD or QVX output file generated by the associated QVW in the Processes list.

Note: The Governance Dashboard determines this path as the actual path of the file. This differs from the logic used to determine the "Source," which is actually embedded in the file's XML header.
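
As an illustration of what "embedded in the file's XML header" can look like, the following is a minimal Python sketch that pulls lineage entries out of a QVD-style XML header. It is not how the Governance Dashboard itself works. The element names (QvdTableHeader, Lineage, LineageInfo, Discriminator, Statement) reflect the commonly observed QVD header layout and should be treated as assumptions; the sample header text and the function name read_lineage are made up for this example.

```python
import xml.etree.ElementTree as ET

# Made-up sample header; a real QVD stores its XML header ahead of the binary data.
# Element names are assumptions based on the commonly observed QVD layout.
SAMPLE_HEADER = """<QvdTableHeader>
  <TableName>Orders</TableName>
  <NoOfRecords>1200</NoOfRecords>
  <Lineage>
    <LineageInfo>
      <Discriminator>orders.csv</Discriminator>
      <Statement>LOAD * FROM orders.csv</Statement>
    </LineageInfo>
  </Lineage>
</QvdTableHeader>"""


def read_lineage(header_xml):
    """Return (source, statement) pairs found in the header's lineage entries."""
    root = ET.fromstring(header_xml)
    return [
        (info.findtext("Discriminator", default=""),
         info.findtext("Statement", default=""))
        for info in root.iter("LineageInfo")
    ]


for source, statement in read_lineage(SAMPLE_HEADER):
    print(source, "|", statement)
```

The point of the sketch is the contrast drawn in the note: the Source string comes from metadata written into the file, whereas the Target is the path where the file was actually found.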

The Target to Source button in the Generated Files table is enabled when a single QVX or QVD is selected as the Target in this section. The button is used to search for and select the Source with the identical string as the selected Target. This can be used to discover the downstream process and target(s), or the generated QVDs/QVXs, from a target of interest. The Target to Source button performs the opposite action of the Source to Target button.
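
The two buttons can be thought of as exact string lookups between the Source and Target columns. The Python sketch below is one interpretation of that behavior under stated assumptions, not the dashboard's implementation: the LineageRow structure, the file paths, and the function names are all hypothetical.

```python
from typing import List, NamedTuple


class LineageRow(NamedTuple):
    source: str   # what the process reads
    process: str  # the QVW doing the work
    target: str   # generated QVD/QVX path, or "" if none is written


# Hypothetical three-step lineage flow.
ROWS = [
    LineageRow("Database: SALES.ORDERS", r"c:\qv\extract.qvw", r"c:\qv\orders.qvd"),
    LineageRow(r"c:\qv\orders.qvd", r"c:\qv\transform.qvw", r"c:\qv\orders_clean.qvd"),
    LineageRow(r"c:\qv\orders_clean.qvd", r"c:\qv\dashboard.qvw", ""),
]


def source_to_target(rows: List[LineageRow], source: str) -> List[LineageRow]:
    """One step upstream: rows whose Target string is identical to the given Source."""
    return [r for r in rows if r.target and r.target == source]


def target_to_source(rows: List[LineageRow], target: str) -> List[LineageRow]:
    """One step downstream: rows whose Source string is identical to the given Target."""
    return [r for r in rows if r.source == target]


upstream = source_to_target(ROWS, r"c:\qv\orders.qvd")
downstream = target_to_source(ROWS, r"c:\qv\orders.qvd")
print([r.process for r in upstream])    # process that wrote orders.qvd
print([r.process for r in downstream])  # process that reads orders.qvd
```

Because the match is on identical strings, a path that is written one way as a Target and another way as a Source (for example, with different casing or a mapped drive) would not match in this sketch.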

The Load/Select Connections, Sources, and Statements table shows the connection source, how it is connected and what fields are being loaded.

  • Connection String is the database connection string.
  • Source is the database from which data is being loaded.
  • Select/Load are the fields being pulled from the Source.

The Subprocess to Process button in the Load/Select Connections, Sources, and Statements table is enabled when a single subprocess is selected. The button is used to search for and select the process with the identical string as the subprocess. This can be used to discover the upstream source for a particular QVW that uses the connection string.

Note: The QVW that uses the connection string must be running on QlikView Desktop or QlikView Server 11.2 or later, or on QlikView 12.

QVD files

The QVD Files section of the Data Lineage sheet presents change data for tables in QVDs.

In the QVD Change Details table, data is organized by Table Name Objects in the scanned QVDs.

Multiple instances of a table name indicate the table is used in multiple QVDs.

FileName lists the source of the QVD containing the table listed.

In the QVD Reload History table, data is organized by table name objects in the scanned QVDs.

The load dates are indicated by green dots in the date columns. A load date is shown for each file containing the named table.

Absence of a green dot on a specific date indicates the file did not load on that date. Absence of a green dot can indicate failure to load or that the file was not scheduled for loading on that date.
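
To make the "absence of a green dot" idea concrete, here is a small Python sketch that lists the dates in a range on which a file has no recorded load. The load dates and the function name dates_without_load are hypothetical. As in the table, the result cannot distinguish a failed load from a date on which no load was scheduled.

```python
from datetime import date, timedelta


def dates_without_load(load_dates, start, end):
    """Return every date from start to end (inclusive) that has no recorded load."""
    loaded = set(load_dates)
    missing = []
    day = start
    while day <= end:
        if day not in loaded:
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Hypothetical reload history for one QVD file.
loads = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]
for d in dates_without_load(loads, date(2024, 1, 1), date(2024, 1, 5)):
    print(d.isoformat())
```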

The QVD Size Change Over Time chart illustrates the number of rows in a QVD over time.

A QVD file must be selected to see size change over time.

QVD fields

The QVD Fields section of the Data Lineage sheet presents records and fields data for tables contained in the scanned QVDs.

The QVD Table Record Counts bar chart shows the number of records in each table.

To view an individual table, select it in the QVD Table List box, select its bar in the chart, or select it in the QVD Table Header table. The QVD Table Header table displays the number of records and the number of columns/fields in the selected table.

The Field Frequency in QVDs table lists the field names in the selected table. Numbers in the Tables and Files columns indicate how many times the field name is used.

The QVD Fields table lists field names in tables.

  • Max # Records and Max # Symbols are from the last time the QVD file was scanned.
  • Max # Symbols is the number of distinct or unique representations of field data. For example, a field with 2 symbols probably takes a Boolean value, such as Yes and No. When the Max # Symbols is equal to the Max # Records, each record has a unique value.
  • Selectivity is the Max # Symbols divided by Max # Records. It provides the percentage of records that have a uniquely selectable value.
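
The Selectivity calculation can be sketched in a few lines of Python. The function name and the sample numbers are illustrative only; the result is returned as a fraction, so multiply by 100 for a percentage.

```python
def selectivity(max_symbols, max_records):
    """Max # Symbols divided by Max # Records, as a fraction between 0 and 1."""
    if max_records <= 0:
        return 0.0  # assumption: treat an empty table as having zero selectivity
    return max_symbols / max_records


# A Yes/No flag across 1000 records: very low selectivity.
print(selectivity(2, 1000))
# A key field with one distinct value per record: selectivity of 1 (100%).
print(selectivity(1000, 1000))
```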
