A simple configuration
This topic explains the core concepts of NodeGraph and how to perform a basic setup. The configuration is based on a QlikView environment, because QlikView applications are file-based and therefore easy to distribute. No QlikView software or license is required to follow the guide, only NodeGraph software and a license. Nor do you need any knowledge of how Qlik works or of its script language.
The QlikView structure can be found by following this link: AppendixA_DEMO_Structure.zip.
The structure contains a few folders and files. Without going into depth, it represents a basic ETL flow. The files you will find inside are:
| File type | Description |
|---|---|
| QVW | The applications / reports. They contain data models and presentation objects. |
| LOG | Log files from the last application execution. This is what NodeGraph parses for the QVW connector. |
| QVD | Binary storage files of data blobs. |
| XLSX | Support data, included to show additional external assets in a configuration. |
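In general, the data flows as follows: the applications in 10.Extract read from the source systems and store the data as QVD files, the applications in 20.Transform process those QVDs further, and the Sales.qvw application in 30.LOAD presents the result. Understanding the general data flow is not necessary when setting up a NodeGraph container, but it does help.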
With the software and license in place, start with an empty configuration. When you open NodeGraph and enter the license, you will see an empty work area without any content. To begin adding content, go to the Settings page.
This is also where you can enter the license if the license page does not load automatically.
Add a new container by pressing the plus button in the header. Call it, for example, TestContainer, and skip the other settings for now.
This tutorial adds categories as it progresses through the rules. Note, however, that categories are also accessible from the menu next to the container, so this is where you edit them later.
Adding your first information
Once a container has been added, connectors can be added to it, and via them, information can be added into NodeGraph.
Because the demo structure is built with QlikView, the first connector is a QlikView connector.
Each connector has its own configuration, which usually covers how to connect to the tool and some general settings for how the connector should behave. The first four entries are shared by all connectors; the rest are connector-specific.
The System ID and System Name define how the connector is identified throughout NodeGraph. The ID is unique, and while there are no hard rules for naming it, consider what it will identify. The name of the machine, the tool, or even its purpose makes a good identifier. If you have more than one environment to capture in each container, you might also want to add the environment to the ID, e.g., powerbi-prod and powerbi-test. For an SQL server, the machine name will probably suffice, as the distinction between databases is handled on the nodes later.
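As an illustration of the naming advice above, here is a minimal Python sketch that composes System IDs from a tool or machine name plus an optional environment and checks that no ID is used twice. The function names and the lowercase, hyphen-separated format are hypothetical conventions, not something NodeGraph enforces; the only requirement stated above is that the ID is unique.

```python
from __future__ import annotations

from typing import Optional


def make_system_id(name: str, environment: Optional[str] = None) -> str:
    """Build a lowercase, hyphen-separated System ID (hypothetical convention)."""
    parts = [name] if environment is None else [name, environment]
    return "-".join(part.strip().lower().replace(" ", "-") for part in parts)


def check_unique(system_ids: list[str]) -> None:
    """Raise ValueError if the same System ID is used more than once."""
    seen: set[str] = set()
    for system_id in system_ids:
        if system_id in seen:
            raise ValueError(f"duplicate System ID: {system_id}")
        seen.add(system_id)


ids = [
    make_system_id("PowerBI", "prod"),
    make_system_id("PowerBI", "test"),
    make_system_id("QlikView"),
]
check_unique(ids)
print(ids)  # ['powerbi-prod', 'powerbi-test', 'qlikview']
```

Adding the environment only when a tool has more than one keeps single-environment IDs short, such as qlikview in this tutorial.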
For now, set the System ID to qlikview and the System Name to QlikView.
All connectors have a test button to verify access to the toolset. For file-based toolsets, this is not applicable, since access is defined by the resources being targeted (through the rules). For others, like QlikSense or Tableau, it is very helpful to verify access before going further.
Once you save the connector, you will be directed to the connector list, this time populated with the newly created connector.
At this point, the connector does not add anything to NodeGraph; it is simply a reference to the tool. To add information, click the Edit button and add rules.
Connector and rules
This is the initial page of a QlikView connector. Each connector type can have its own setup here, but the Delete and Change Settings buttons in the top corner are shared by all connectors. They let you delete the tool reference or change the connection settings for the toolset.
Start adding information to NodeGraph by adding a rule.
This tutorial creates three rules, starting with the end application.
The name of the first rule is Load App. The Path points to the Sales.qvw file in the 30.LOAD directory.
Because no categories were added beforehand, add them in the rule. As mentioned before, categories keep information organized in the overview. A QlikView rule presents three categories, plus an additional one if you also choose to add content. What each connector detects, and how it divides that information between categories, is connector-specific.
The Application category holds the underlying app data model. Click “+ Create new category” and look at the new form, described in detail below.
The category interface is the same regardless of what information it holds. The Name is simply descriptive. Layer and Sort order refer to the category's placement in the Dependency Explorer overview. They are not absolute positions but sort weights, applied in Layer, Sort order, and then alphabetical Name order. If you start with the end-user / consumer application, you will most likely have more information to add to the beginning of the overview, so leave yourself some room and set its Layer to 30. It is good practice to avoid single-digit values initially, so that you can later place other elements in between, which you may not know of beforehand.
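The ordering rule described above (Layer first, then Sort order, then alphabetical Name) can be sketched in Python as a tuple sort key. This is an illustration of the described behavior, not NodeGraph's code; the Category class is hypothetical, and the names and numbers are example values, some taken from this tutorial and some (such as the Layer 25 entry) invented to show the effect of leaving gaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    name: str
    layer: int
    sort_order: int


def overview_order(categories):
    """Order categories by Layer, then Sort order, then alphabetical Name."""
    return sorted(categories, key=lambda c: (c.layer, c.sort_order, c.name))


# Example values only.
categories = [
    Category("Load applications", 30, 0),
    Category("Excel files", 20, 20),
    Category("Transform applications", 20, 0),
    Category("Transform QVDs", 20, 10),
]
# Gaps in the weights let a category found later slot in between
# (hypothetical Layer 25) without renumbering the others.
categories.append(Category("Staging files", 25, 0))

for c in overview_order(categories):
    print(c.layer, c.sort_order, c.name)
```

Because the values are relative weights, using 10, 20, 30 instead of 1, 2, 3 is what leaves room for the Layer 25 category.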
Unmapped categories are explained in Understanding Unmapped Categories, but here is a brief explanation.
When a connector's information is parsed, you may also see references to assets that are external to the tool itself but relevant to the data lineage. Examples include OLEDB connections, web assets, and plain files.
These files and data sources will eventually be added with connectors that fit their purpose, but you may still want to present what you find until you have had the chance to do so. Until an asset is added with a connector, its reference from the parsed tool (in this case QlikView) is stored under one of two categories: Unmapped Data Source and Unmapped File Source.
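Conceptually, the split between mapped and unmapped references can be sketched as a set lookup: a reference found while parsing is linked to an existing node if another connector has already added that asset, and is otherwise filed under an Unmapped category. This Python sketch is an assumption about the behavior described above, not NodeGraph's implementation; the function name, reference kinds, and locations are hypothetical.

```python
def classify_references(references, known_assets):
    """Split parsed (kind, location) references into mapped and unmapped groups.

    kind is "file" or "datasource"; known_assets holds the locations that some
    connector has already added to the graph.
    """
    mapped = []
    unmapped = {"Unmapped File Source": [], "Unmapped Data Source": []}
    for kind, location in references:
        if location in known_assets:
            mapped.append(location)  # linked to an existing node
        elif kind == "file":
            unmapped["Unmapped File Source"].append(location)
        else:
            unmapped["Unmapped Data Source"].append(location)
    return mapped, unmapped


# Hypothetical references found while parsing a QlikView application.
refs = [
    ("file", "DEMO/10.Extract/QVD/Customers.qvd"),
    ("datasource", "OLEDB:development"),
]

mapped, unmapped = classify_references(refs, known_assets=set())
print(mapped)    # []
print(unmapped)  # both references land in the Unmapped categories
```

In this sketch, once a file connector has added the QVD, passing its location in known_assets moves that reference from Unmapped File Source to the mapped list.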
The screenshot shows the added categories and their respective Layer / Sort order in the Content preview table.
If you head over to the Dependency Explorer, you can already see the categories presented in the overview.
You can now parse the application rule you added and see information being fed into the graph. To do so, go back to the Settings page and open Scheduler.
There are now two jobs on the list, although you created only one, the QlikView connector. The other job syncs the graph information into the Data Catalog, the landing page of NodeGraph, so that the information becomes searchable there as well. Before that is possible, however, the QlikView connector must be reloaded.
After the reload succeeds, its status changes to Done. You can now add this information to the Data Catalog by reloading the sync job. Once it completes, its status also shows Done, along with the message "Reload job completed successfully".
Go to the Dependency Explorer first to see the information parsed.
You can now see the application you parsed, the content within that application, and some external files, in this case QVDs (Qlik’s binary file format) in the overview.
The next step is to extend the lineage of this demo container and see what has created the binary files under the Unmapped File Sources.
To do this, you will need to create extra rules for QlikView and a separate connector for the unmapped files.
Additional QlikView rules
Go back into the connector details of the QlikView connector and add two additional rules.
One rule targets the Qlik applications under 10.Extract, and the other targets the applications under 20.Transform.
The Extract and Transform applications
The 10.Extract folder contains more than one application.
Add a Name to the rule, so you know what it is targeting.
Set the Path to the folder containing these files and append *.qvw to include all three applications.
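To see what a wildcard such as *.qvw selects, here is a small Python sketch using the standard fnmatch module. It assumes the rule's pattern behaves like a shell-style wildcard and is compared case-insensitively, as Windows file names are; NodeGraph's exact matching rules may differ, and the file names are hypothetical.

```python
import fnmatch


def select_files(names, pattern):
    """Return the names matching a shell-style wildcard, ignoring case."""
    # Case-insensitive comparison is an assumption, mirroring Windows file names.
    return [n for n in names if fnmatch.fnmatchcase(n.lower(), pattern.lower())]


# Hypothetical contents of the 10.Extract folder.
folder = ["Customers.qvw", "Orders.qvw", "Rates.qvw", "Customers.qvw.log", "Notes.txt"]
print(select_files(folder, "*.qvw"))  # ['Customers.qvw', 'Orders.qvw', 'Rates.qvw']
```

Note that a log file such as Customers.qvw.log is not matched, because the pattern must match the end of the name.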
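Create a new category for the Extract applications and set its properties as described in the category setup earlier. Because these applications come first in the data flow, give the category a lower Layer value than the Transform applications so that it appears earlier in the overview.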
You can reuse the Unmapped categories, as there is no need to separate them here. Separating them would be relevant if, for example, you have both prod and dev assets in the same container and want to distinguish unmapped resources in a Prod context from those in a Dev context.
Do not include content, as applications made with an Extract or Transform purpose rarely have any remaining data and therefore nothing to analyze.
Create a similar rule for Transform with its own Application Category of Transform applications on Layer 20 and Sort order 0.
The Transform applications can be found under 20.Transform instead of 10.Extract.
You should now have three directory rules targeting this structure:
| Rule name | Path |
|---|---|
| Load App | C:\Users\maja\Desktop\DEMO\30.LOAD |
| Extract App | C:\Users\maja\Desktop\DEMO\10.EXTRACT |
| Transform Apps | C:\Users\maja\Desktop\DEMO\20.Transform |
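The three rules can be pictured as plain data: a name, a directory, and a file pattern. The Python sketch below is a hypothetical model (the DirectoryRule class and rule_for function are not part of NodeGraph) that reports which rule would pick up a given file. It assumes the Load App rule targets Sales.qvw directly, that the other two use *.qvw, and that subfolders are not scanned; it uses pathlib.PureWindowsPath, whose comparisons ignore case, so 10.EXTRACT and 10.Extract refer to the same folder.

```python
import fnmatch
from dataclasses import dataclass
from pathlib import PureWindowsPath
from typing import Optional


@dataclass(frozen=True)
class DirectoryRule:
    name: str
    directory: str
    pattern: str

    def covers(self, file_path: str) -> bool:
        """True if file_path sits directly in the rule's directory and matches its pattern."""
        path = PureWindowsPath(file_path)
        # PureWindowsPath comparisons ignore case, as Windows does.
        in_directory = path.parent == PureWindowsPath(self.directory)
        return in_directory and fnmatch.fnmatchcase(path.name.lower(), self.pattern.lower())


RULES = [
    DirectoryRule("Load App", r"C:\Users\maja\Desktop\DEMO\30.LOAD", "Sales.qvw"),
    DirectoryRule("Extract App", r"C:\Users\maja\Desktop\DEMO\10.EXTRACT", "*.qvw"),
    DirectoryRule("Transform Apps", r"C:\Users\maja\Desktop\DEMO\20.Transform", "*.qvw"),
]


def rule_for(file_path: str) -> Optional[str]:
    """Return the name of the first rule covering file_path, or None."""
    for rule in RULES:
        if rule.covers(file_path):
            return rule.name
    return None


print(rule_for(r"C:\Users\maja\Desktop\DEMO\10.Extract\Customers.qvw"))  # Extract App
```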
Go to Scheduler and run the QlikView Connector again.
You can now see a Database connection called development, a web resource reference to XE.com, various applications, and a couple of files structured under Unmapped Files.
In the next section, you will add a File Connector to sort out the files referenced under Unmapped Files.
Go back to the Connector page and add a File Connector.
Fill in the required fields to add a File Connector and save it. Then press Edit on the new connector to open its rules.
Note that this screen differs from the QlikView connector screen, most notably in the Groupings tab.
Start by adding the first rule, which captures all binary data files (QVDs) in the 10.Extract structure. As before, the name describes the rule's purpose. Check the “Use rule name as top level” option, which groups everything this rule captures under a top-level node named after the rule.
Create the target category on the same layer as the Extract Apps category, but set the Sort order to 10. This places it below the applications category box and helps build an easy-to-understand overview.
Finally, target the top directory in the rule, scan its subfolders, and choose to remember the folder structure.
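One way to picture what “Use rule name as top level” and remembering the folder structure do together is to compute the chain of nodes each scanned file would be grouped under: the rule name first, then the file's subfolders relative to the targeted directory. This Python sketch is an assumption about the resulting hierarchy, not NodeGraph's code; the node_path function, rule name, and file paths are hypothetical.

```python
from pathlib import PurePosixPath


def node_path(rule_name, top_dir, file_path, rule_as_top_level=True, keep_folders=True):
    """Return the chain of group nodes (ending with the file name) for a scanned file."""
    relative = PurePosixPath(file_path).relative_to(top_dir)
    parts = list(relative.parts) if keep_folders else [relative.name]
    return ([rule_name] if rule_as_top_level else []) + parts


# Hypothetical rule name and file location.
print(node_path("Extract QVDs", "DEMO/10.Extract", "DEMO/10.Extract/QVD/Customers.qvd"))
# ['Extract QVDs', 'QVD', 'Customers.qvd']
```

Turning either option off in the sketch drops the corresponding level, which is why both are enabled here to keep the QVDs grouped and traceable to their folders.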
Add a similar rule for the Transform QVDs and create a category for that as well.
There are now two rules for the binary data files. However, there were also some Excel files in this structure.
Create a rule for them as well. Here you already know which files are of interest, but in a real scenario, you would find these files referenced in the Unmapped Files category in the Dependency Explorer and know what to add from there.
Set the rule name to Excel Files and, as before, use the rule name as the top-level node.
Here you may be a bit unsure where to place it in the overview, so put its category below the Transform QVDs, at Layer 20 and Sort order 20.
Scan the subfolders using both file patterns, .XLS and .XLSX, separated by a semicolon (;).
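The semicolon-separated pattern field can be sketched in Python as a split followed by a case-insensitive wildcard match. This assumes each entry is matched against the end of the file name, so ".XLS" behaves like "*.xls"; NodeGraph's actual parsing may differ, and the function names are hypothetical.

```python
from __future__ import annotations

import fnmatch


def parse_patterns(spec: str) -> list[str]:
    """Split a semicolon-separated pattern list, dropping blanks and whitespace."""
    return [p.strip() for p in spec.split(";") if p.strip()]


def matches_any(file_name: str, spec: str) -> bool:
    """True if file_name matches any listed pattern (treated as a suffix), ignoring case."""
    name = file_name.lower()
    # Prefixing "*" treats ".XLS" as "*.xls"; an entry that already starts
    # with "*" still works, because "**" matches the same names as "*".
    return any(fnmatch.fnmatchcase(name, "*" + p.lower()) for p in parse_patterns(spec))


print(parse_patterns(".XLS;.XLSX"))               # ['.XLS', '.XLSX']
print(matches_any("Budget.xlsx", ".XLS;.XLSX"))  # True
print(matches_any("Budget.xlsx", ".XLS"))        # False
```

Under this suffix-matching assumption, .XLS alone would not pick up .xlsx workbooks, which is why both patterns are listed.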
The File Connector now appears on the Scheduler page. Run it to parse this information as well.
After it has been executed, go back to Dependency Explorer to see exactly what has been added.
The new structures are visible, but so are the old file references under Unmapped Files. This is because the QlikView connector has not run since the File Connector added the new structures, so its references have not yet been linked to them.
It is generally a good idea to run the File Connector first and the QlikView connector afterwards, because the QlikView connector builds its structures based on what else is already in the graph. Here, you can simply run the QlikView connector once more so that it uses the information from the File Connector to build a clean graph.
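The recommended run order can be expressed as a dependency graph and ordered with Python's standard graphlib module (Python 3.9+). The job names follow the Scheduler entries in this tutorial, but the dependency mapping is an assumption drawn from the text: the QlikView connector resolves its references against what the File Connector has already added, and the Data Catalog sync is assumed to run after both connectors.

```python
from graphlib import TopologicalSorter

# Each job maps to the jobs that should finish before it (assumed dependencies).
dependencies = {
    "QlikView connector": {"File Connector"},
    "Data Catalog sync": {"QlikView connector", "File Connector"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['File Connector', 'QlikView connector', 'Data Catalog sync']
```

If more connectors are added later, extending the mapping is enough to recompute a valid order, and a circular dependency would raise graphlib.CycleError.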
Go to Scheduler once more to run the QlikView Connector, and then return to the Dependency Explorer.
This lineage overview shows a simple BI tool ETL flow, from the database query, through two data extract and transformation steps, to a user-consumable report. The demonstration illustrates the basic components of NodeGraph, using two connectors and file-based data to build a lineage overview.
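The lineage overview is essentially a directed graph. As a hedged illustration, the Python sketch below models a simplified, hypothetical version of the demo flow (the asset names are placeholders, not the exact nodes NodeGraph creates) and walks upstream from the report with a breadth-first search to list every asset it depends on.

```python
from collections import deque

# Simplified, hypothetical wiring of the demo: each asset maps to what it is derived from.
reads_from = {
    "Sales.qvw": ["Transform QVDs"],
    "Transform QVDs": ["Transform applications"],
    "Transform applications": ["Extract QVDs"],
    "Extract QVDs": ["Extract applications"],
    "Extract applications": ["development (database)"],
}


def upstream(asset):
    """Breadth-first walk returning every asset that feeds into `asset`."""
    seen, queue, result = {asset}, deque([asset]), []
    while queue:
        for source in reads_from.get(queue.popleft(), []):
            if source not in seen:
                seen.add(source)
                result.append(source)
                queue.append(source)
    return result


print(upstream("Sales.qvw"))
```

The seen set keeps the walk from revisiting an asset that is reachable through more than one path.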