A simple configuration

This topic introduces the core concepts of NodeGraph and walks through a basic setup. The configuration is based on a QlikView environment, because QlikView applications are file-based and therefore easy to distribute. No QlikView software or license is required to follow the guide, only NodeGraph software and a license. No knowledge of how Qlik works, or of its script language, is needed either.

The QlikView structure can be found by following this link: AppendixA_DEMO_Structure.zip.

File structure

The file structure contains a few folders and files. Without going into depth, the structure represents a basic ETL flow. It contains the following file types:

File structure of an ETL flow
FILE TYPE  PURPOSE
QVW        The application / reports. Has data models and presentation objects.
QVS        Script files.
LOG        Log files from the last application execution. This is what NodeGraph parses for the QVW connector.
QVD        Binary storage files of data blobs.
XLSX       Support data for the purpose of showing additional external assets in a configuration.
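
To get a feel for the demo structure before configuring anything, it can help to count how many files of each type it contains. The following is a minimal sketch and not part of NodeGraph; the function name `count_file_types` and the example paths are assumptions made for illustration.

```python
from collections import Counter
from pathlib import PurePath

# File types listed in the table above, as lowercase extensions.
KNOWN_TYPES = {".qvw", ".qvs", ".log", ".qvd", ".xlsx"}


def count_file_types(paths):
    """Count the given paths by extension, ignoring unknown file types."""
    counts = Counter()
    for path in paths:
        suffix = PurePath(path).suffix.lower()
        if suffix in KNOWN_TYPES:
            counts[suffix] += 1
    return counts
```

To run it against the unpacked archive, pass it the file paths of that folder, for example `count_file_types(Path("DEMO").rglob("*"))` after importing `Path` from `pathlib`, where `DEMO` is a placeholder for wherever you extracted the files.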

The general idea of the information within is as follows:

File structure
The structure of a NodeGraph container. It extracts information, transforms data, then loads the application.

Understanding the general data flow is not necessary when setting up a NodeGraph container, but it does help.

Configuring NodeGraph

With software and license in place, start with an empty configuration. When you open NodeGraph and have entered the license, you will see an empty work area. To begin adding content, go to the settings page.

This is also where you can enter the license if the license page does not load automatically.

Configure NodeGraph

The NodeGraph settings page.

Container

Add a new container by pressing the plus button in the header. Call it, for example, TestContainer, and skip the other settings for now.

Add a container
The NodeGraph containers tab.

This tutorial adds the categories as it progresses through the rules. Note, however, that they are also accessible from the menu adjacent to the container here, so this is where you can edit them whenever needed.

Adding your first information

Once a container has been added, connectors can be added to it, and via them, information can be added into NodeGraph.

Add a connector
The NodeGraph connectors page.

As this demo structure is built with QlikView, that will be the first demo connector.

Each connector has its own configuration. This usually covers how to connect to the tool and some general settings for how the connector should behave. The first four entries are shared between all connectors; the rest are connector-specific.

Adding a QlikView connector
The Add QlikView connector popup window.

The System ID and System Name are how the system is identified throughout NodeGraph. The ID is unique, and while there are no hard rules about how to name it, it is advisable to consider what it will identify. The name of the machine, the tool, or even the purpose makes a good identifier. If you have more than one environment to capture in each container, you might want to add the environment to the ID as well, e.g., powerbi-prod and powerbi-test. For an SQL server, the machine name will probably suffice, as the distinction between databases is handled on the nodes later.

For now, call the System ID qlikview, and the System Name QlikView.
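
If you manage many connectors, a small helper can keep IDs consistent. The sketch below shows one possible convention (lowercase, hyphen-separated, optional environment suffix). NodeGraph does not prescribe this; the function `make_system_id` is hypothetical and only illustrates the naming advice above.

```python
import re


def make_system_id(tool, environment=None):
    """Build an ID such as 'powerbi-prod' from a tool name and environment."""
    parts = [tool] if environment is None else [tool, environment]
    slug = "-".join(parts).lower()
    # Collapse every run of characters other than a-z and 0-9 into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", slug).strip("-")
```

With this convention, `make_system_id("QlikView")` gives the ID used in this tutorial, and passing an environment as the second argument appends it to the ID.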

All connectors have a Test button to verify access to the toolset. For file-based toolsets, this is not applicable, since access is determined by the resources being targeted (through the rules). For others, like Qlik Sense or Tableau, it is very helpful to verify access before going further.

Save the connector
The Test, Save, and Close buttons at the bottom of the Add Connector popup window.

Once you save the connector, you will be directed to the connector list, this time populated with the newly created connector.

Edit the connector
The Edit button on the right of the Connectors page.

The connector right now does not add anything to NodeGraph. It is simply a tool reference. To add information, click the Edit button to add rules.

Connector and rules

This is the initial page in a QlikView connector. Each connector can have its own setup here as well, but the Delete and Change Settings buttons in the top corner are shared by all connectors. They let you delete the tool reference or change the connection settings for the toolset.

Add a new rule
The QlikView connector settings. "New Rule" is selected.

Start adding information to NodeGraph by adding a rule.

This tutorial will make three different rules, and will start with the end application.

The new rule
Adding a new rule.

The name of the first rule is Load App. The Path points to the Sales.qvw file inside the 30.LOAD directory.

As no categories were added beforehand, add them in the rule. As mentioned before, their purpose is to keep the information in the overview organized. A QlikView rule presents three categories, plus an additional one if you also choose to include content. What each connector detects, and how it divides the information, is connector-specific.

The Application category here is the underlying app data model. Click “+ Create new category” and look at the new form in detail below.

Category interface

The category interface is the same regardless of what information it will hold. The Name is simply descriptive. Layer and Sort order refer to the category's placement in the Dependency Explorer overview later. They are not absolute positions but sort weights: categories are ordered by Layer, then by Sort order, then alphabetically by Name. If you start with the end-user / consumer application, you will most likely have more information to add before it in the overview, so leave yourself some space and set the Layer to 30. It is good practice to avoid single digits initially, so that you can later place other elements in between that you may not know of beforehand.

Create a new category
The Application Category section, with fields for Name, Layer, and Sort Order. At the bottom, the Create and Select button is selected.

The Unmapped categories are explained in Understanding Unmapped Categories, but here is a brief explanation.

When any connector's information is parsed, you may also see references to assets that are external to the tool itself but relevant to the data lineage. Examples include OLEDB connections, web assets, and plain files.

These files and data sources will later be added with connectors that fit their purpose, but you may still want to present what you find until you have had the chance to do so. So, until they are added with a connector of their own, their references from the parsed tool (in this case QlikView) are stored under the categories called Unmapped Data Source and Unmapped File Source.
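
The idea can be modeled in a few lines. This is a conceptual sketch only, not NodeGraph's implementation: the function `categorize_references`, the `(kind, target)` tuples, and the rule that everything which is not a file counts as a data source are all simplifying assumptions made here.

```python
def categorize_references(references, mapped_assets):
    """Split (kind, target) references into mapped and unmapped buckets.

    `mapped_assets` stands for targets already covered by a dedicated
    connector. Anything else stays in an Unmapped bucket for now.
    """
    result = {
        "Mapped": [],
        "Unmapped Data Source": [],
        "Unmapped File Source": [],
    }
    for kind, target in references:
        if target in mapped_assets:
            result["Mapped"].append(target)
        elif kind == "file":
            result["Unmapped File Source"].append(target)
        else:
            # Assumption: non-file references are treated as data sources.
            result["Unmapped Data Source"].append(target)
    return result
```

In this model, adding a connector that covers a file moves its reference out of the Unmapped File Source bucket the next time the references are categorized.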

The screenshot shows the added categories and their respective Layer and Sort order in the preview table of the Content.

Setting layer and sort order
The preview layout. The categories Unmapped DataSource, Unmapped Files, and Load application are loaded above the Application Content.

To summarize,

Layers and Sort Orders
Name Layer Sort Order
Unmapped DataSources 0 0
Unmapped Files 0 10
Load application 30 0
Application Content 40 0
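
The ordering described for Layer and Sort order can be reproduced with a plain tuple sort. The sketch below uses the four categories from the table. It assumes that the name comparison is case-insensitive, which this guide does not state, and the function name `overview_order` is made up for illustration.

```python
# (Name, Layer, Sort order) for the categories created so far.
CATEGORIES = [
    ("Application Content", 40, 0),
    ("Load application", 30, 0),
    ("Unmapped Files", 0, 10),
    ("Unmapped DataSources", 0, 0),
]


def overview_order(categories):
    """Order categories by Layer, then Sort order, then Name."""
    return sorted(categories, key=lambda c: (c[1], c[2], c[0].casefold()))
```

Because the values are weights rather than positions, a category added later on Layer 20 slots in between the Unmapped categories and Load application without any renumbering.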

You can already see the categories presented in the overview if you head over to the Dependency Explorer.

Dependency Explorer

The Dependency Explorer can be found on the NodeGraph main menu.

The categories are visible here:

Visible categories in the Dependency Explorer

The Dependency Explorer. The categories Unmapped DataSource, Load application, Application Content, and Unmapped Files are visible.

You can now parse the application rule you added, and see information being fed into the graph as well. To do so, go back to the Settings page, and then to Scheduler.

There are now two jobs on the list, but you only created one, the QlikView connector. The other job you see is to sync the graph information into Data Catalog, the landing page of NodeGraph, so that information becomes searchable there as well. But before that is possible, the QlikView connector must be reloaded.

Scheduler with list of jobs

The NodeGraph Scheduler. Two jobs, Reload of Connector Qlikview and Sync Container in Data Catalog, are visible.

After the job has succeeded, its status changes to Done. This information can now be added to the Data Catalog by running the next job. Once that job has completed, StatusAction will indicate Done status and a message will indicate that the "Reload job completed successfully".

Completed jobs

Both of the above jobs are marked as Done.

Go to the Dependency Explorer first to see the information parsed.

Updated lineage on Dependency Explorer

The Dependency Explorer now shows lineage information connecting the categories.

You can now see the application you parsed, the content within that application, and some external files, in this case QVDs (Qlik’s binary file format) in the overview.

The next step is to extend the lineage of this demo container and see what has created the binary files under the Unmapped File Sources.

To do this, you will need to create extra rules for QlikView and a separate connector for the unmapped files.

Additional QlikView rules

Go back into the connector details of the QlikView connector and add two additional rules.

Adding additional rules

Adding a new rule to the QlikView connector by clicking on the New Rule link on the QlikView connector page.

One rule will target the Qlik applications under 10.Extract, and the other will target the applications under 20.Transform.

The Extract and Transform applications

If you check in the 10.Extract folder, there is more than one application.

Creating a new category

Creation of the Extract applications category.

Add a Name to the rule, so you know what it is targeting.

Set the Path to the folder containing these files and append *.qvw to match all three applications.
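
To preview which files a wildcard such as *.qvw would select, Python's `fnmatch` module offers comparable shell-style matching. This is only an approximation for illustration: NodeGraph's exact matching rules are not described here, the case-insensitive comparison is an assumption, and the file names in the usage note are invented.

```python
from fnmatch import fnmatch


def match_rule(filenames, pattern="*.qvw"):
    """Return the file names selected by a shell-style wildcard pattern."""
    # Lowercasing both sides makes the comparison case-insensitive.
    return [name for name in filenames if fnmatch(name.lower(), pattern.lower())]
```

For example, given the hypothetical names `Orders.qvw`, `Orders.qvw.log`, and `Orders.qvd`, only `Orders.qvw` is selected, because the pattern must match the whole file name.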

Create a new category for the extract applications and set its properties as described earlier in the category setup.

You can reuse the Unmapped categories, as there is no need to separate them here. Separating them could be relevant if you have both prod and dev assets in the same container, where you might want to differentiate between the unmapped resources in a Prod context and a Dev context.

Do not include content, as applications made with an Extract or Transform purpose rarely have any remaining data and therefore nothing to analyze.

Create a similar rule for Transform with its own Application Category of Transform applications on Layer 20 and Sort order 0.

The Transform applications can be found under 20.Transform instead of 10.Extract.

You should now have three directory rules targeting this structure:

  1. Load App C:\Users\maja\Desktop\DEMO\30.LOAD

  2. Extract App C:\Users\maja\Desktop\DEMO\10.EXTRACT

  3. Transform Apps C:\Users\maja\Desktop\DEMO\20.Transform
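
Before running the Scheduler, you may want to confirm that each rule's directory actually exists. The following sketch assumes it is run on a machine that sees the same file system as NodeGraph. The helper `missing_rule_paths` is hypothetical, and the paths are the ones listed above, so adjust them to your own location.

```python
import os

# Rule name -> directory, as listed above. Adjust to your own location.
DEMO_RULES = {
    "Load App": r"C:\Users\maja\Desktop\DEMO\30.LOAD",
    "Extract App": r"C:\Users\maja\Desktop\DEMO\10.EXTRACT",
    "Transform Apps": r"C:\Users\maja\Desktop\DEMO\20.Transform",
}


def missing_rule_paths(rules):
    """Return the names of rules whose directory does not exist."""
    return [name for name, path in rules.items() if not os.path.isdir(path)]
```

Calling `missing_rule_paths(DEMO_RULES)` returns an empty list when all three folders are reachable, and otherwise names the rules to double-check.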

Go to Scheduler and run the QlikView Connector again.

Updated lineage in Dependency Explorer

The updated Dependency Explorer, showing the lineage data of the new categories.

You can now see a Database connection called development, a web resource reference to XE.com, various applications, and a couple of files structured under Unmapped Files.

In the next section, you will add a File Connector to sort out the files referenced under Unmapped Files.

File Connector

Go back to the Connector page and add a File Connector.

Add a new connector

Adding a new connector on the Connectors page.

Enter the required fields to add a File Connector and Save. Then open the File rules by pressing Edit on the new connector.

Add a file system connector

Adding a file system connector. The fields System, System Name, Description, Connector Tags, and Max Parallel Workers are available.

After the File Connector is created, select Edit.

Add a new rule

Editing the File Connector. The New Rule link at the bottom is selected.

Note that the screen that appears differs from the QlikView Connector screen, specifically the Groupings tab.

Start by adding the first rule, which will capture all binary data files (QVDs) that are part of the 10.Extract structure. The name, as before, is descriptive of its purpose. Check the “Use rule name as top level” option. This groups all information found by this rule under the rule name.

Adding QVD rule

Adding the Extract QVD rule. Layer and Sort are both set to 10.

Create the target Category on the same layer as the Extract Apps (10), but set the Sort to 10. This puts it below the applications category box and helps build an easy-to-understand overview.

Lastly, point the rule at the top directory, scan the subfolders, and choose to remember the folder structure.

Add a similar rule for the Transform QVDs and create a category for that as well.

Creating a new category

Creating the Transform QVD category. Layer is set to 20, and Sort is set to 10.

There are now two rules for the binary data files. However, there were also some Excel files in this structure.

Create a Rule for them as well. In this case, you know what information is of interest, but in a real scenario, you would find these files referenced in the Unmapped Files category inside the Dependency Explorer and you would know what to add from there.

Adding a rule for Excel files

Adding an Excel Files category. Layer and Sort are both set to 20.

Set Excel Files as the rule name and use the rule name as the top level in this case as well.

Here, you may be a bit unsure where to add it to the overview, so place it below the Transform QVDs on Layer 20, Sort 20.

Scan the subfolders for both the .XLS and .XLSX file patterns, separated by a semicolon (;).
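
A semicolon-separated pattern field can be thought of as a list of wildcards, where a file is selected if it matches any of them. The sketch below illustrates that reading with Python's `fnmatch`. The exact pattern syntax NodeGraph expects (written here as `*.xls;*.xlsx`) and the helper names are assumptions.

```python
from fnmatch import fnmatch


def split_patterns(pattern_field):
    """Split a semicolon-separated pattern field into individual patterns."""
    return [p.strip() for p in pattern_field.split(";") if p.strip()]


def matches_any(filename, pattern_field):
    """Return True if the file name matches at least one of the patterns."""
    name = filename.lower()
    return any(fnmatch(name, p.lower()) for p in split_patterns(pattern_field))
```

Under this kind of matching, `*.xls` alone does not select `.xlsx` files, because the pattern has to match the whole file name, which is one reason to list both patterns.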

The File Connector is now listed on the Scheduler page. Execute it to parse this information as well.

Updated Scheduler

Executing the reload of connector 'File' in the Scheduler.

After it has been executed, go back to Dependency Explorer to see exactly what has been added.

Updated Dependency Explorer

New and old files show legacy data in the Dependency Explorer.

The newly made structures are visible, but the old files are as well. This is because, although you have added the new structures from the File Connector, this information has not yet been picked up by the Qlik Connector.

It is generally a good idea to run the File Connector first and then the Qlik Connector, because the Qlik Connector builds its structures based on what else is already in the graph. Here, you can simply run the Qlik Connector once more so that it uses the information from the File Connector to build a clean graph.

Go to Scheduler once more to run the QlikView Connector, and then return to the Dependency Explorer.

Complete end-to-end lineage

The updated Dependency Explorer now shows only up-to-date information.

This lineage overview demonstrates a simple BI tool ETL flow from the query to the database, passing through two data extract and transformation steps before landing in a user-consumable report. The demonstration shows the basic components of NodeGraph using two connectors and file-based data to create a lineage overview.