Configuring failover for central node resiliency
In a multi-node site, you can assign a node to be a failover candidate. A failover candidate can take over the role of the central node if the central node fails. A multi-node site with a designated failover candidate can help you achieve a more resilient and highly available deployment.
Failover considerations
Before you create a failover candidate node, it is important to consider your deployment architecture. A failover candidate node can help you maintain a resilient and highly available deployment by minimizing the downtime of your site if the central node fails. However, the failover candidate node provides failover capacity only for the Qlik Sense services running on the central node. If you want to create a highly available deployment, you must add resiliency to the storage layer as well.
Storage layer resiliency
If the storage components reside on the central node when it fails, they become unavailable because the failover candidate node does not provide failover for the storage components. You can add resiliency to your repository database and the file share by deploying them on a separate node from your central node. Other options to add resiliency are:
- Deploy a standalone database on a virtual machine and take advantage of the resiliency options provided by the virtualization platform.
- Host the file share in a network file location or a storage area network (SAN), or use resilient storage provided by a cloud platform.
For information about database replication and failover, see Database replication and failover.
Create a failover candidate node
When you create your multi-node site, you first create the central node and then join additional nodes to the cluster. From the QMC, you can set one of these non-central nodes to be the failover candidate. The failover candidate takes over the responsibilities of the central node if the central node fails. To set a failover candidate node, see Create a node.
Once you add more nodes to your site, you can assign one or more of them to be failover candidates. For a node to be a failover candidate, it must run the following services:
- Qlik Sense Repository Service
- Qlik Sense Engine Service
- Qlik Sense Proxy Service
- Qlik Sense Scheduler Service
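The service requirement above amounts to a simple set check. The following is a minimal sketch only: the function names and the idea of passing in a list of running service names are hypothetical and are not part of any Qlik API.

```python
# Services a failover candidate must run, as listed above.
REQUIRED_SERVICES = frozenset({
    "Qlik Sense Repository Service",
    "Qlik Sense Engine Service",
    "Qlik Sense Proxy Service",
    "Qlik Sense Scheduler Service",
})


def missing_services(running_services):
    """Return the required services that are absent, sorted by name.

    `running_services` is a hypothetical iterable of service names
    collected for one node by whatever inventory tool you use.
    """
    return sorted(REQUIRED_SERVICES - set(running_services))


def can_be_failover_candidate(running_services):
    """Return True when every required service is present on the node."""
    return not missing_services(running_services)
```

For example, a node that runs only the engine service would report the repository, proxy, and scheduler services as missing.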
Automatic failover
In a multi-node site, each node regularly checks the central node for a heartbeat. If there is no response from the central node within the timeout period (10 minutes by default), the site automatically fails over to the failover candidate node. If there is more than one failover candidate node, the first node to get a lock on the database field becomes the central node. If the node that was previously the central node comes back online, it becomes a failover candidate node.
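The failover rule described above can be illustrated with a small model. This is a sketch under stated assumptions, not the product's implementation: the `FailoverLock` class is a hypothetical in-memory stand-in for the database lock, and treating a gap of exactly 10 minutes as a timeout is an assumption.

```python
from datetime import datetime, timedelta

# Default timeout from the text; the real value is configurable in the QMC.
DEFAULT_TIMEOUT = timedelta(minutes=10)


def central_node_timed_out(last_heartbeat, now, timeout=DEFAULT_TIMEOUT):
    """Return True when no heartbeat has been seen for the whole timeout."""
    return now - last_heartbeat >= timeout


class FailoverLock:
    """Hypothetical stand-in for the database lock that candidates race for."""

    def __init__(self):
        self.owner = None

    def try_acquire(self, node_name):
        """The first caller becomes the owner; later callers are refused."""
        if self.owner is None:
            self.owner = node_name
        return self.owner == node_name
```

In this model, each failover candidate calls `try_acquire` once the timeout has elapsed, and only the first caller is promoted to central node.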
You can view the status of the nodes that make up your multi-node site in the QMC. The default view does not include the node type, but you can customize the node information that is displayed. To see the node information, and to configure the information displayed about each node, see Nodes.
The default timeout period for the central node is 10 minutes, but you can change it in the QMC. To change the default timeout, see Cluster settings.
Failover candidate node with inbound and outbound ports
A failover candidate can have different roles, depending on your organizational needs. The example below shows a multi-node site with a single failover candidate node that runs as the worker scheduler in the site. The failover candidate must have the same inbound and outbound ports open as the central node. Because the failover candidate acts as the worker scheduler, ports 5151 and 5050 must be open inbound and outbound on their respective nodes so that jobs can be scheduled to the failover candidate node.
For a complete list of inbound and outbound ports for all services, and to see more deployment examples, see Ports.
Manually migrating the central node
You cannot use the QMC to change which node in your site is the central node. You can, however, use the QRS REST API to do this. Before manually reassigning a failover candidate node to a central node role, you must ensure that it is running the necessary services for the central node.
Use the following REST API calls:
- Run a GET to /qrs/serverNodeConfiguration to return a list of server GUIDs.
- Run an empty POST to /qrs/failover/tonode/{serverNodeConfigurationID} where {serverNodeConfigurationID} is the ID of the node you want to become the central node.
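The two calls above could be prepared as follows. This is a minimal sketch that only builds the requests and does not send them. The helper names, the base URL, and the example node ID are hypothetical. The `xrfkey` query parameter and `X-Qlik-Xrfkey` header follow the usual QRS API convention of a matching 16-character key, which is an assumption here because this page does not describe it. Authentication is not shown.

```python
import secrets
import string
import urllib.request


def new_xrfkey():
    """Generate a random 16-character alphanumeric key (assumed QRS convention)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(16))


def build_list_nodes_request(base_url, xrfkey):
    """Build the GET that returns the server node configurations and their IDs."""
    url = f"{base_url}/qrs/serverNodeConfiguration?xrfkey={xrfkey}"
    return urllib.request.Request(
        url, method="GET", headers={"X-Qlik-Xrfkey": xrfkey}
    )


def build_failover_request(base_url, node_id, xrfkey):
    """Build the empty POST that makes the given node the central node."""
    url = f"{base_url}/qrs/failover/tonode/{node_id}?xrfkey={xrfkey}"
    return urllib.request.Request(
        url, data=b"", method="POST", headers={"X-Qlik-Xrfkey": xrfkey}
    )


# Hypothetical values for illustration only.
key = new_xrfkey()
list_req = build_list_nodes_request("https://central.example.com:4242", key)
move_req = build_failover_request(
    "https://central.example.com:4242",
    "00000000-0000-0000-0000-000000000000",
    key,
)
```

To send either request, pass it to `urllib.request.urlopen` with an SSL context and credentials suited to your site. Pick the node ID from the response to the first call before building the second.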