Configuring and troubleshooting the NodeGraph service
Below are configurations that may be of interest regarding the NodeGraph service.
NodeGraph Service Account
NodeGraph installs with Local System as its default service account. In a real-world deployment this default rarely suffices, because the service account determines what NodeGraph can read and which toolsets it can contact.
On the NodeGraph machine, the service is named NodeGraph 4 Service (in version 3 and earlier, it was named NodeGraph Service). Depending on the connector, this account is used to varying extents when gathering information, and it is the account used in any file-based operation.
To use another account, simply stop the service, change to an account with the proper permissions, and then restart the service.
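As a rough illustration of the stop, change, restart sequence above, the sketch below builds the corresponding Windows sc.exe command lines without running them. The function name and the service name passed to it are hypothetical: sc.exe expects the internal service name, which may differ from the display name "NodeGraph 4 Service", so check the service properties before using anything like this.

```python
from typing import List, Optional


def service_account_commands(
    service_name: str, account: str, password: Optional[str] = None
) -> List[List[str]]:
    """Build sc.exe argument lists for: stop, change logon account, start.

    Nothing is executed here. service_name must be the internal Windows
    service name (an assumption you need to verify), not the display name.
    """
    # sc.exe takes "obj=" and "password=" as separate tokens from their values.
    config = ["sc.exe", "config", service_name, "obj=", account]
    if password is not None:
        config += ["password=", password]
    return [
        ["sc.exe", "stop", service_name],
        config,
        ["sc.exe", "start", service_name],
    ]
```

Each returned list could be passed to subprocess.run from an elevated prompt. Note that a stop request may return before the service has fully stopped, so a real script would wait for the stopped state before starting the service again.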
NodeGraph Data Storage
There are two folders relevant for NodeGraph’s operation. NodeGraph separates where it stores the program files (the installation folder) and where it stores the configuration and data (the ProgramData folder).
While the installation folder is defined during the setup phase, the ProgramData folder can be changed in configuration files.
In the installation folder of NodeGraph, you will find two important files:
This file has parameters to control certain behaviors of NodeGraph.
This file controls the logging behavior of NodeGraph including verbosity and placement.
Changing the appSettings > DataDirectory parameter changes where NodeGraph stores its information. The parameter is commented out by default.
For any changes to take effect, a restart of the NodeGraph 4 Service is required.
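To check which data directory is currently in effect, a small script can read the parameter from the configuration file. The sketch below assumes the file uses the common .NET layout, where settings are add elements with key and value attributes inside appSettings. That layout, the sample path, and the function name are assumptions, not details confirmed by this article.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_data_directory(config_xml: str) -> Optional[str]:
    """Return the active DataDirectory value, or None if it is not set.

    Assumes a .NET-style layout: <appSettings><add key="..." value="..."/>.
    A commented-out entry is skipped by the XML parser, so it yields None,
    which corresponds to NodeGraph using its default location.
    """
    root = ET.fromstring(config_xml)
    for entry in root.findall("./appSettings/add"):
        if entry.get("key") == "DataDirectory":
            return entry.get("value")
    return None
```

Passing the text of the configuration file to read_data_directory returns the configured path, or None while the entry is still commented out.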
There are two different types of logs within NodeGraph. In the Data directory, there is a folder named "logs", with a subfolder named "jobs".
Within the logs folder, you will find the service logs. These logs provide information about the general service, requests, usage, and potential issues with the service itself.
Within the jobs subfolder, you will find a log for each initiated task. This includes connector behaviors, data catalog syncing, and other types of jobs that can be triggered, such as DQM tests.
If you experience any unwanted behavior, the connector logs are a good place to start looking for further insight, for example into why a certain connector does not match an asset as expected.
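When there are many job logs, it can help to list the most recent ones and search them for a keyword. The following is a minimal sketch that relies only on the logs/jobs folder layout described above. The helper names are hypothetical, and the log format is not documented here, so the search term (for example "error") is a guess you would adjust to what the logs actually contain.

```python
from pathlib import Path
from typing import List, Tuple


def newest_job_logs(data_dir: str, limit: int = 5) -> List[Path]:
    """Return up to `limit` files from <data_dir>/logs/jobs, newest first."""
    jobs = Path(data_dir) / "logs" / "jobs"
    if not jobs.is_dir():
        return []
    files = [p for p in jobs.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


def lines_containing(path: Path, needle: str) -> List[Tuple[int, str]]:
    """Return (line number, text) for lines containing needle, ignoring case."""
    hits = []
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, 1):
            if needle.lower() in line.lower():
                hits.append((number, line.rstrip("\n")))
    return hits
```

Calling newest_job_logs with the Data directory path and then lines_containing on each result gives a quick overview of recent jobs that mention the chosen keyword.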
Estimating NodeGraph’s impact on performance is tricky, and there is no perfect way to size the machine beforehand. CPU is relevant for the lineage parsing of the tools and logs, while memory is relevant for keeping the graph data loaded.
NodeGraph has all containers loaded into memory, so the more containers you have, the more memory it will consume. During a reload, NodeGraph keeps a mirror version of the graph, holding both its current state and its new connector state simultaneously, which temporarily increases memory use.
The graph data is stored in the ProgramData folder in graph files. The file extension of these files is either gdbl or gdbs, depending on the PersistenceEngine type set in the configuration file: LiteDB (gdbl) or SQLite (gdbs). Which type to use is case-specific. The default value is LiteDB, but some large-scale environments may see an increase in sync speeds when switching to SQLite.
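To see which persistence engine the existing graph files belong to and how large they are, the files can be grouped by extension. The sketch below assumes only the gdbl and gdbs extensions named above. It searches the folder recursively because the exact subfolder is not stated here, and the function name is hypothetical. On-disk size is at best a rough indicator and does not equal memory consumption.

```python
from pathlib import Path
from typing import Dict

# Extension-to-engine mapping as described in the text above.
ENGINE_BY_EXTENSION = {".gdbl": "LiteDB", ".gdbs": "SQLite"}


def graph_file_sizes(data_dir: str) -> Dict[str, int]:
    """Sum the size in bytes of graph files under data_dir, per engine."""
    totals = {engine: 0 for engine in ENGINE_BY_EXTENSION.values()}
    for path in Path(data_dir).rglob("*"):
        engine = ENGINE_BY_EXTENSION.get(path.suffix.lower())
        if engine is not None and path.is_file():
            totals[engine] += path.stat().st_size
    return totals
```

A result with bytes under both engines would suggest that graph files from a previous PersistenceEngine setting are still present.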
Most graph-traversing queries are not CPU heavy, and the design principle of NodeGraph is that each user connects to NodeGraph from a browser. This way, presenting the information is offloaded to the client machines. The data throughput on the network is generally rather low. As more and more information is added to a graph, it is recommended to use “Use rule name as top level” and file groupings to avoid having too many visible nodes at the same time.
The more nodes that need to be rendered, the more nodes need to have their relationships calculated. This in turn will lead to a longer calculation of the aggregated relationships and more lines to draw on each presentation, which will be perceived as a heavier and “sluggish” interface. The more the information can be grouped, the better it will be both performance-wise and visually.
Installing / Upgrading / Moving
During an install, it is crucial to remember to update the service account, as leaving it unchanged will inevitably create obstacles and confusion down the line.
When upgrading NodeGraph, first make sure to note down the service account. It is always a good idea to back up the configuration files from the installation folder before upgrading as well.
Always stop the service and uninstall NodeGraph prior to an upgrade; do not just run the MSI package. Failure to do this can lead to unexpected behavior in the software. The ProgramData folder is never touched during an uninstall, which means that the configuration should come right back after the upgrade installation.
If you need to move a configuration of NodeGraph, the two most important files in the ProgramData folder are the nodegraph.dat file, which contains the configuration and Data Catalog data, and the license file, named license.dat.
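Copying these two files can be scripted. The following is a minimal sketch that assumes both files sit directly in the ProgramData folder, as described above. The function name is hypothetical, and it is an assumption (though a sensible precaution) that the service should be stopped before the files are copied.

```python
import shutil
from pathlib import Path
from typing import List

# The two files named in the text above.
FILES_TO_MOVE = ("nodegraph.dat", "license.dat")


def copy_core_files(source_dir: str, target_dir: str) -> List[Path]:
    """Copy nodegraph.dat and license.dat from source_dir to target_dir.

    Raises FileNotFoundError before copying anything if either file is
    missing, so a partial move is not left behind.
    """
    source, target = Path(source_dir), Path(target_dir)
    missing = [name for name in FILES_TO_MOVE if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError("Missing in source folder: " + ", ".join(missing))
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in FILES_TO_MOVE:
        # copy2 also preserves file timestamps where the platform allows it.
        shutil.copy2(source / name, target / name)
        copied.append(target / name)
    return copied
```

The returned paths can be checked on the target machine before the service is started there.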