Clustering QlikView Publisher
This chapter provides an overview of QlikView Publisher and how to use it in a clustered deployment for scalability, resilience, or both. This chapter also addresses the architectural and installation requirements and the options for building a clustered and resilient QlikView Publisher deployment.
Introduction
QlikView Publisher is an optional module for QlikView Server that enables scheduling, administration, and management tools that provide a single point of control for QlikView analytics applications and reports. Administrators can schedule, distribute, and manage security and access for QlikView applications and reports across the enterprise.
QlikView Publisher performs the following main functions:
- It loads data directly from data sources defined in connection strings in the source QlikView document files.
- It is used as a distribution service to “reduce” data and applications from source QlikView document files based on various rules (for example, user authorization or data access) and distribute these newly-created documents to the appropriate QlikView Servers or as static reports via email.
- When using QlikView Publisher, only Publisher has access to the source documents folder and the data sources for data load and distribution. The source documents and data are not accessible by QlikView users.
By deploying a clustered architecture, QlikView Publisher achieves scalability and/or resilience using web services technology. Administrators can cluster services together to provide load balancing. Native support for SNMP enables integration with enterprise system monitoring tools. External enterprise scheduling tools can trigger Publisher tasks using web service calls. Tasks can also be scheduled and executed on demand by QlikView administrators.
The figure below shows a two-server, clustered QlikView Publisher where each server is configured for processing different tasks and load balancing. The figure also includes a three-server, clustered QlikView Server that uses QlikView AccessPoint for load balancing. Documents created by QlikView Developer are stored in the source documents folder. QlikView Publisher tasks are used to retrieve data and store the result in the user documents folder.
To see how to set up an unbalanced distribution service cluster, see Unbalanced QlikView Publisher Clustering.
Source Documents
The source documents contain a) scripts within QlikView document files to extract data from various data sources (for example, data warehouses, Microsoft Excel files, SAP, and Salesforce.com), b) the actual binary data extracts themselves within .qvd files, or c) a binary load from another QlikView document file, inheriting its data model in one line of code.
The QlikView source documents, created using QlikView Developer, reside in the following folder:
- Windows Server 2008 and later: \ProgramData\QlikTech\SourceDocuments. This is the default QlikView location for Windows Server 2008 and later.
User Documents
The user documents folder is the repository used by QlikView Server. The folder is located at:
- Windows Server 2008 and later: \ProgramData\QlikTech\Documents. This is the default QlikView location for Windows Server 2008 and later.
Tasks
Tasks are created by administrators for data distribution and data reloads. Tasks are stored in the QlikView Publisher repository as a collection of XML files or in an SQL Server database. When a task is executed, QlikView Publisher invokes QlikView Batch (QVB), which is comparable to QlikView Desktop without the user interface.
QVB reloads the documents, which are stored in the source documents folder(s), and creates an associative QlikView database, which is stored within each document. QVB performs the reload by retrieving the data described by the load script from the data sources. QlikView Publisher then distributes the documents to the user documents folder for QlikView Server using the encrypted QVP protocol, to a cloud environment, a mail server, and/or a file folder. QlikView Publisher can use the Directory Service Connector (DSC) to determine where and to whom the documents are to be distributed.
Why Cluster QlikView Publisher?
The role of Publisher in the QlikView solution is to distribute and refresh data by criteria set by the QlikView administrator. To accomplish this, Publisher executes many tasks, either scheduled or on demand. A Publisher task is the smallest entity that can be distributed in a cluster; a single task cannot be divided and executed in parallel on multiple cluster nodes. Clustering the Publisher service on more than one server enables the administrator to distribute multiple tasks to multiple servers operating in parallel using the Publisher load balancing algorithm. This means Publisher clusters can be used to increase the scalability, availability, and serviceability of data distribution and reloading.
In addition, a Publisher cluster license enables the configuration of Publisher services in clusters and standalone Publisher services. For example, a Publisher cluster can be used in a corporate office to handle large volumes of data and tasks, whereas a single Publisher service can be used in an associated manufacturing plant where the Publisher only needs to distribute documents using the manufacturing data source.
By clustering QlikView Publisher, the following objectives can be met:
- Horizontal scalability
- Resilience
Horizontal Scalability
Horizontal scaling of hardware provides the ability to increase the resources of the QlikView deployment. By adding additional hardware servers, the workload of QlikView Publisher can be increased. The clustered Publisher servers can then be configured to load balance the QlikView tasks.
For example, on a certain hardware server, QlikView Publisher can process eight concurrent tasks. When the resource needs increase, the QlikView Publisher service can grow as needed. By adding an additional QlikView Publisher service on a new hardware server, the deployment can handle up to sixteen concurrent tasks by configuring the additional server in a Publisher cluster deployment. In this scenario, the first eight tasks are allocated to Server A and the second eight tasks to Server B. Alternatively, if the servers are clustered, the tasks can be load balanced over the two servers.
Resilience
When the number of tasks in the deployment increases, the window for completing the tasks in time becomes increasingly important. Clustering the QlikView distribution services provides for resilience in the deployment. In the case above, where a single server can support eight concurrent tasks and two servers are needed to handle sixteen, an additional server can be deployed (for a total of three servers) in order to build resilience into the deployment. If a server is lost (for example, due to a hardware failure or network connection issues), the resilient cluster still supports up to sixteen tasks. Having all three servers as active nodes helps reduce response times by not running all servers at 100% of their capacity. It also limits the number of tasks and task chains affected if a node is lost.
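The capacity arithmetic behind the scalability and resilience examples can be written out as a short sketch. It uses the figure of eight concurrent tasks per server from the horizontal scalability example; the function name `cluster_capacity` is hypothetical and is not part of any QlikView tool.

```python
def cluster_capacity(nodes: int, tasks_per_node: int, failed_nodes: int = 0) -> int:
    """Return how many concurrent tasks the surviving nodes can run.

    Assumes every node has the same capacity, as in the examples in the text.
    """
    if failed_nodes < 0 or failed_nodes > nodes:
        raise ValueError("failed_nodes must be between 0 and nodes")
    return (nodes - failed_nodes) * tasks_per_node


# Two servers at eight tasks each (horizontal scalability example).
two_node_capacity = cluster_capacity(nodes=2, tasks_per_node=8)

# Three servers with one lost node (resilience example).
degraded_capacity = cluster_capacity(nodes=3, tasks_per_node=8, failed_nodes=1)
```

With these inputs, a three-node cluster that loses one node retains the same capacity as the healthy two-node cluster, which is the point of adding the third server.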
Requirements for a Clustered QlikView Publisher Deployment
The following high-level requirements must be fulfilled for a clustered QlikView Publisher deployment:
- Clustered QlikView Publisher license key
- Shared network storage
- Load balancing strategies
Clustered QlikView Publisher License Key
In a clustered environment, the QlikView Publisher servers are installed with the same license key. This can be verified by examining the following entries in the License Enabler File (LEF):
PRODUCTLEVEL;30;; (where 30 is the code for QlikView Publisher)
NUMBER_OF_XS;N;; (where N is the number of allowed QlikView Distribution Services)
The servers in a clustered QlikView Publisher deployment share configuration and license information among themselves via the shared storage, so configuration and license management only needs to be performed once in the QMC for all nodes.
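The LEF check described above can be sketched in a few lines. This is a minimal illustration that assumes each LEF entry is a semicolon-separated line with the key first and the value second, as in the two entries shown; the function names are hypothetical, and the threshold of two distribution services is the one stated in the requirements for configuring a QDS cluster.

```python
from typing import Dict


def parse_lef(text: str) -> Dict[str, str]:
    """Map each LEF key to its first value, e.g. 'NUMBER_OF_XS' -> '2'."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.strip().split(";")
        if len(parts) >= 2 and parts[0]:
            entries[parts[0]] = parts[1]
    return entries


def supports_publisher_cluster(lef_text: str) -> bool:
    """True when the LEF is a Publisher license allowing two or more QDS."""
    entries = parse_lef(lef_text)
    count = entries.get("NUMBER_OF_XS", "0")
    return (
        entries.get("PRODUCTLEVEL") == "30"
        and count.isdigit()
        and int(count) >= 2
    )
```

For example, a LEF containing `PRODUCTLEVEL;30;;` and `NUMBER_OF_XS;2;;` passes the check, while one with `NUMBER_OF_XS;1;;` does not.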
Shared Network Storage
In QlikView, shared network storage can be used to store source documents (.qvf or .qvw) and cluster files (notifications, tasks, triggers, logs, and so on) that must be accessible to the nodes in a QlikView Publisher cluster.
The requirements for a shared network storage in a QlikView Publisher cluster are the following:
- The network storage must be hosted on a Windows-based file share.
- QlikView Publisher supports the use of a SAN (NetApp, EMC, etc.) mounted to a Windows Server 2008 R2 (or later) and then shared from that server. Storage presented to a server via a SAN must appear as locally attached storage. If SAN storage is used for Publisher, any distributed data that is accessed by QlikView Server should not reside on the SAN storage.
- The QlikView Publisher nodes in the cluster must have network latency below 4 milliseconds to connect to the file share server. Performance can degrade if this is not the case.
- A maximum of two nodes in a QlikView Publisher cluster can share the same shared storage. If more than two QlikView Publisher nodes are required, it is recommended to deploy the additional publisher nodes in an additional cluster. The QlikView Management Console can manage multiple publisher clusters.
- The bandwidth to the file share must be appropriate for the amount of traffic on the site. The frequency and size of the documents being saved after reloading, and opened into memory, drives this requirement. 1 Gigabit networking is suggested.
- The following shared storage options are not supported:
- Shared storage systems based on Linux OS are not supported. This includes systems supporting the SMB file sharing protocol or the NTFS disk drive format.
- Windows-based shared storage systems that rely on CIFS file sharing protocol are not supported.
- QlikView does not support Windows Distributed File System (DFS).
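The 4 millisecond latency requirement in the list above can be spot-checked from a Publisher node. The sketch below is only a rough approximation: it times a TCP connection to the SMB port (445) on the file share server and compares the median of several samples with the limit. The function names and the choice of the median are assumptions made for illustration, not a Qlik-provided procedure.

```python
import socket
import statistics
import time

LATENCY_LIMIT_MS = 4.0  # limit quoted in the requirements list


def tcp_connect_ms(host: str, port: int = 445, timeout: float = 2.0) -> float:
    """Time one TCP connection to the file share server, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def within_latency_budget(samples_ms, limit_ms: float = LATENCY_LIMIT_MS) -> bool:
    """True when the median sample is below the limit."""
    return statistics.median(samples_ms) < limit_ms
```

A caller would collect several samples, for example `[tcp_connect_ms("fileserver") for _ in range(10)]` with a hypothetical host name, and pass them to `within_latency_budget`.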
Load Balancing Strategies
Load Balancing
The load balancing is determined by an internal ranking system based on the amount of memory available and the CPU use. Qlik recommends using the default settings, since they have been extensively tested.
To change the default settings, edit the configuration file, QlikViewDistributionService.exe.config. The value of the key is a JavaScript expression:
<add key="LoadBalancingFormula" value="(AverageCPULoad*400) + ((MemoryUsage / TotalMemory) * 300) + ((NumberOfQlikViewEngines / MaxQlikViewEngines)*200) + (NumberOfRunningTasks*100)"/>
where:
- AverageCPULoad: Average CPU load for all running QVBs.
- MemoryUsage: Total memory use for the entire application.
- TotalMemory: Total amount of memory on the server.
- NumberOfQlikViewEngines: Number of QlikView engines currently used.
- MaxQlikViewEngines: Configured value for the maximum number of QlikView engines.
- NumberOfRunningTasks: Number of tasks currently running.
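To make the weighting concrete, the default formula can be evaluated outside of Publisher. The sketch below reproduces the expression term by term in Python. The `NodeStats` class and the `least_loaded` helper are hypothetical. Two things are assumptions for illustration and are not stated in the text: that AverageCPULoad is expressed as a fraction between 0 and 1, and that the node with the lowest score is preferred.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NodeStats:
    average_cpu_load: float          # AverageCPULoad (assumed to be 0..1)
    memory_usage: float              # MemoryUsage
    total_memory: float              # TotalMemory (same unit as memory_usage)
    number_of_qlikview_engines: int  # NumberOfQlikViewEngines
    max_qlikview_engines: int        # MaxQlikViewEngines
    number_of_running_tasks: int     # NumberOfRunningTasks


def ranking_score(s: NodeStats) -> float:
    """Evaluate the default load balancing formula for one node."""
    return (
        (s.average_cpu_load * 400)
        + ((s.memory_usage / s.total_memory) * 300)
        + ((s.number_of_qlikview_engines / s.max_qlikview_engines) * 200)
        + (s.number_of_running_tasks * 100)
    )


def least_loaded(nodes: Dict[str, NodeStats]) -> str:
    """Return the node name with the lowest score (assumed to be preferred)."""
    return min(nodes, key=lambda name: ranking_score(nodes[name]))
```

Note that each running task adds 100 to the score on its own, so with the default weights the number of running tasks quickly dominates the CPU, memory, and engine terms.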
Simultaneous Tasks
By default, four QlikView tasks can execute simultaneously on a node. The recommended maximum is ten simultaneous tasks per node. If more than ten tasks have to be executed simultaneously on a node, the desktop heap size must be changed in the Windows registry to allow for more simultaneous tasks.
Proceed as follows to change the number of tasks allowed to execute simultaneously:
- Back up the Windows Server registry.
- Locate the following Windows Server registry setting:
  HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows
  %SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows
  SharedSection=1024,3072,512 Windows=On SubSystemType=Windows
  ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3
  ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off
  MaxRequestThreads=16
  The default value for SharedSection is 1024,20480,768 for 64-bit (x64).
- Change the desktop heap size by setting SharedSection to 1024,20480,2048:
  HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows
  %SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows
  SharedSection=1024,20480,2048 Windows=On SubSystemType=Windows
  ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3
  ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off
  MaxRequestThreads=16
- Save the registry changes and restart the machine.
- Change the Max number of simultaneous QlikView engines for distribution setting in QMC to the number of engines needed.
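The SharedSection edit in the procedure above can be sketched as a pure string transformation, which is useful for previewing the new value before touching the registry. The function name `set_shared_section` is hypothetical; the code only rewrites a copy of the value string and does not read from or write to the registry.

```python
import re

_SHARED_SECTION = re.compile(r"SharedSection=\d+,\d+,\d+")


def set_shared_section(value: str, sizes=(1024, 20480, 2048)) -> str:
    """Return the registry value with SharedSection replaced by the given sizes."""
    replacement = "SharedSection={},{},{}".format(*sizes)
    new_value, count = _SHARED_SECTION.subn(replacement, value, count=1)
    if count == 0:
        raise ValueError("SharedSection entry not found in value")
    return new_value


# Example value string; only the SharedSection part is changed.
current = (
    "%SystemRoot%\\system32\\csrss.exe ObjectDirectory=\\Windows "
    "SharedSection=1024,20480,768 Windows=On SubSystemType=Windows"
)
updated = set_shared_section(current)
```

All other parts of the value are left untouched, which matters because the same registry value carries the remaining csrss.exe startup parameters.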
Security
QlikView Publisher provides access to QlikView applications and data. It is therefore important to integrate QlikView Publisher with the enterprise security solutions in addition to the standard security features of QlikView Server.
QlikView Publisher is viewed as a backend process within the QlikView solution. From a security perspective, it is important to understand that the frontend does not have any open ports to the backend. The frontend does not send any queries to data sources on the backend, nor do any of the user documents (.qvf or .qvw files) contain any connection strings to data sources located on the backend. End users can only access QlikView documents that exist on the frontend. Within the backend, the Windows file system is always in charge of authorization; QlikView is not responsible for access privileges.
The figure below shows a simplified view of a standard QlikView deployment containing the location of the QlikView products and the data and applications.
Directory Services
To provide security for QlikView documents, QlikView Publisher can connect to an external directory service (for example, Active Directory, LDAP, a database, or other sign-on solutions). The external directory service is an authentication source with which QlikView has a trust relationship.
QlikView provides a built-in Directory Service Provider (DSP) for Active Directory that allows QlikView administrators to assign Active Directory user privileges to QlikView documents or portions thereof. QlikView Publisher leverages this built-in provider to provide direct integration with, and support for, Active Directory.
QlikView also provides a means of creating a Configurable LDAP for other directory services. A Configurable LDAP enables QlikView administrators to grant privileges to users authenticated by any authentication system other than Active Directory.
QlikView Server Authorization Modes
QlikView Server provides two mutually exclusive options for authorizing access to QlikView documents. Depending on the authorization mode of QlikView Server (NTFS or DMS), Publisher populates the appropriate Access Control List (ACL) when assigning rights to a document. In case of NTFS authorization, Publisher populates a standard NTFS ACL when sending documents to QlikView Server. In case of DMS authorization, Publisher populates an ACL contained within a .meta file associated with the application.
Static Data Reduction
Data reduction is a security mechanism that allows application data to be purged from a QlikView application in accordance with row-level security settings. QlikView Publisher can automate data reduction independently of the applicable security scenario. However, Publisher allows an administrator to configure data reduction based on users or groups defined within any external authentication source available through a custom or Active Directory DSP. Publisher performs the data reduction using the “loop and reduce” functionality in QlikView. The Publisher data reduction should not be confused with the dynamic data reduction associated with Section Access.
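The idea behind "loop and reduce" can be illustrated conceptually: loop over the distinct values of a field and produce one reduced data set per value. The sketch below does this on a plain list of records. It only illustrates the concept; the function name and the sample field are hypothetical, and Publisher itself operates on QlikView documents rather than Python dictionaries.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Mapping


def loop_and_reduce(rows: Iterable[Mapping], field: str) -> Dict[Hashable, List[Mapping]]:
    """Split rows into one reduced data set per distinct value of `field`."""
    reduced: Dict[Hashable, List[Mapping]] = defaultdict(list)
    for row in rows:
        reduced[row[field]].append(row)
    return dict(reduced)


sales = [
    {"Region": "North", "Amount": 10},
    {"Region": "South", "Amount": 20},
    {"Region": "North", "Amount": 30},
]
per_region = loop_and_reduce(sales, "Region")
```

Each resulting data set contains only the rows for one region, so a recipient of the "North" set never receives the "South" rows. This is the property that distinguishes static reduction from filtering at access time.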
Configuring QlikView Publisher Clustering
Requirements
The following requirements must be fulfilled before starting the QDS cluster configuration:
- A QlikView Publisher license that supports more than one QDS. The Publisher LEF must contain the entry NUMBER_OF_XS;N;;, where N is 2 or higher.
- QlikView AccessPoint (based on QlikView Web Server or Microsoft IIS), QlikView Management Service (QMS), QlikView Server (QVS), and DSC are already installed in the QlikView system in the network.
- A domain user to run the QlikView services on every machine is available.
- A shared storage device; Qlik recommends a shared device mounted as a Windows-based file share. All QDS cluster nodes need read and write access to the following, centrally stored data:
  - QlikView Publisher status, configuration, and log files
  - QlikView source documents
Step-by-step Instructions
Prepare the Shared Storage Device
Create folders for the files accessed by every Publisher cluster node:
- \\<server1>\ProgramData\QlikTech\DistributionService (application folder)
- \\<server1>\ProgramData\QlikTech\SourceDocuments (source documents folder)
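Creating the two folders can be scripted. The sketch below is a minimal example that assumes the share root (the \\<server1>\ProgramData part of the paths above) is passed in as a path and that the account running the script has write access to it. The function name `prepare_shared_folders` is hypothetical.

```python
from pathlib import Path
from typing import List

# Folder names taken from the two paths listed above.
SUBFOLDERS = ("DistributionService", "SourceDocuments")


def prepare_shared_folders(share_root: Path) -> List[Path]:
    """Create QlikTech\\<subfolder> under share_root and return the paths."""
    created = []
    for name in SUBFOLDERS:
        folder = share_root / "QlikTech" / name
        folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(folder)
    return created
```

On a cluster node, the function would be called with the UNC path of the share root. Because `exist_ok=True` is set, running it again on a second node does no harm.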
Prepare the Cluster Nodes
Proceed as follows on each planned QDS cluster node:
- Log in as administrator.
- Configure the firewall to secure the QlikView solution. The QlikView services require the ports listed in the table below to be “opened”.
- Deactivate the Internet Explorer Enhanced Security Configuration for administrators. By default, Windows Server 2008 and later ship with this configuration enabled; it locks down the browser to add security when browsing from the server. When the configuration is enabled, it may cause problems in viewing the QMC and service content. The Internet Explorer Enhanced Security Configuration can be left turned on, but if any issues arise, turn off the feature for the Administrators group.
- Add the domain user that is used to run the QlikView services to the Local Administrators Group.
- Start the QlikView 64-bit (x64) server setup and select Custom installation, select profiles. Then select the Reload/Distribution Engine feature and install it on each node where Publisher is to reside.
- Enter the QlikView service account credentials.
- Finish the setup and restart the system immediately.
Service | Port |
---|---|
QDS (Publisher) (required for Publisher) | 4720/TCP |
DSC (required for Publisher) | 4730/TCP |
QMS (required for Publisher) | 4780/TCP |
QlikView Web Server/Microsoft IIS configuration | 4750/TCP |
QVS configuration | 4749/TCP |
QVP communication | 4747/TCP |
QMS (EDX calls) (required for Publisher) | 4799/TCP |
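After the firewall is configured, the ports in the table can be probed from another machine. The sketch below attempts a TCP connection to each port. The dictionary keys are shortened labels for the table rows, and the function names are hypothetical. A successful connection only shows that something is listening and that the firewall allows it, not that the QlikView service behind the port is healthy.

```python
import socket
from typing import Dict

# Port numbers copied from the table above.
QLIKVIEW_PORTS: Dict[str, int] = {
    "QDS": 4720,
    "DSC": 4730,
    "QMS": 4780,
    "QVWS/IIS configuration": 4750,
    "QVS configuration": 4749,
    "QVP communication": 4747,
    "QMS (EDX calls)": 4799,
}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True when a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_node(host: str, ports: Dict[str, int] = QLIKVIEW_PORTS) -> Dict[str, bool]:
    """Probe every listed port on one node and report which ones answer."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Calling `check_node` with the host name of a cluster node returns a mapping from service label to reachability, which makes it easy to spot a port that the firewall still blocks.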
Configuring QDS Cluster in the QMC
Proceed as follows to configure a QDS cluster in the QMC:
- Open QMC and register the QlikView Publisher license with the activated cluster nodes.
- On the System>Setup tab, add the first QDS cluster node under Distribution Services.
- Switch the Application Data Folder and the Source Folders to the shared device folder paths using UNC syntax.
- Click Apply and restart the QDS manually.
- Add each additional QDS cluster node in URL format.
- Click Apply and restart the QDS on all nodes manually.