Configuring failover for central node resiliency

In a multi-node site, you can assign a node to be a failover candidate. A failover candidate can perform the same role as the central node if the central node fails. A multi-node site with a designated failover candidate helps you achieve a more resilient, highly available deployment.

Failover considerations

Before you create a failover candidate node, it is important to consider your deployment architecture. A failover candidate node can help you maintain a resilient and highly available deployment by minimizing the downtime of your site if the central node fails. However, the failover candidate node provides failover capacity only for the Qlik Sense services running on the central node. If you want to create a highly available deployment, you must add resiliency to the storage layer as well.

Note: Each node in your multi-node site must meet the minimum system requirements. For a complete list, see System requirements.

Storage layer resiliency

If the storage components reside on the central node when it fails, they become unavailable because the failover candidate node does not provide failover for the storage components. You can add resiliency to your repository database and the file share by deploying them on a separate node from your central node. Other options to add resiliency are:

  • Deploy a standalone database on a virtual machine and take advantage of the resiliency options provided by the virtualization platform.

  • Host the file share in a network file location or a storage area network (SAN), or use resilient storage provided by a cloud platform.

For information about database replication and failover, see Database replication and failover.

Create a failover candidate node

When you create your multi-node site, you first create the central node and then join additional nodes to the cluster. From the QMC, you can set one of these non-central nodes to be the failover candidate. The failover candidate takes over the responsibilities of the central node if the central node fails. To set a failover candidate node, see Create a node.

Note: The failover candidate node can have different functions depending on your deployment. For example, the node designated as the failover candidate can be the scheduler node in your multi-node site, as long as it also runs the Qlik Sense services required for a failover candidate node.

Once you add more nodes to your site, you can assign one or more of them to be a failover candidate. For a node to be a failover candidate, it must run the following services:

  • Qlik Sense Repository Service
  • Qlik Sense Engine Service
  • Qlik Sense Proxy Service
  • Qlik Sense Scheduler Service
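
The requirement above amounts to a simple set check. The following is an illustrative Python sketch, not part of any Qlik tooling; the service names are the display names listed in this document:

```python
# A node qualifies as a failover candidate only if it runs all four
# required Qlik Sense services.
REQUIRED_SERVICES = {
    "Qlik Sense Repository Service",
    "Qlik Sense Engine Service",
    "Qlik Sense Proxy Service",
    "Qlik Sense Scheduler Service",
}

def is_failover_candidate(running_services):
    """Return True if every required service is present on the node."""
    return REQUIRED_SERVICES.issubset(running_services)

# A scheduler node that also runs all four services qualifies;
# a node running only the proxy service does not.
node_services = REQUIRED_SERVICES | {"Qlik Sense Printing Service"}
print(is_failover_candidate(node_services))                  # True
print(is_failover_candidate({"Qlik Sense Proxy Service"}))   # False
```

A node can run additional services beyond these four; the check only requires that none of the four is missing.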

Automatic failover

In a multi-node site, each node regularly checks the central node for a heartbeat. If there is no response from the central node within the timeout period (10 minutes by default), the site automatically fails over to the failover candidate node. If there is more than one failover candidate node, the first node to acquire a lock on the repository database becomes the central node. If the node that was previously the central node comes back online, it becomes a failover candidate node.
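
The election step can be sketched as a race for a single lock. This is a minimal illustrative model, not Qlik's implementation: the real site uses a lock in the repository database, for which a `threading.Lock` stands in here:

```python
import threading

# Stand-in for the repository database lock that candidates race for.
db_lock = threading.Lock()
new_central = []

def try_become_central(node_name):
    # Non-blocking acquire: only the first candidate to get the lock
    # becomes the new central node; the others remain candidates.
    if db_lock.acquire(blocking=False):
        new_central.append(node_name)

# Three failover candidates detect the heartbeat timeout at once.
candidates = ["failover-1", "failover-2", "failover-3"]
threads = [threading.Thread(target=try_become_central, args=(n,))
           for n in candidates]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one candidate wins the election, whichever locked first.
print(new_central)
```

Because the lock is acquired non-blocking, losing candidates simply carry on as failover candidates, which mirrors the behavior described above when the old central node rejoins the site.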

You can view the status of the nodes that make up your multi-node site in the QMC. The default view does not include the node type, but you can customize the node information that is displayed. To see the node information, and to configure the information displayed about each node, see Nodes.

The default timeout period for the central node is 10 minutes, but you can change it in the QMC. To change the default timeout, see Cluster settings.

Failover candidate node with inbound and outbound ports

As mentioned above, the failover candidate can have different roles, depending on your organizational needs. The example below shows a multi-node site with a single failover candidate node that also runs as the secondary scheduler. The failover candidate must have the same inbound and outbound ports open as the central node. Because the failover candidate acts as the secondary scheduler, ports 5151 and 5050 must also be open inbound and outbound on the respective nodes so that jobs can be scheduled to the failover candidate node.

Tip: Inbound ports indicate the listening ports for the services running on each node. Firewall rules must allow inbound traffic to these ports. Outbound ports indicate the destination of the communication from one node to other nodes in the environment. Firewall rules must allow the node to send outbound traffic to these outbound ports.

A user's web browser connects to the Proxy node. The Proxy node connects to the Engine node, the Failover candidate node, the Central node, and the Storage layer. All of these connections are two-way, except the connections from the Proxy node to the Engine node and from the Proxy node to the Storage layer, which are one-way.

  • Proxy node: runs QPS and QRS.
      Inbound ports: 80 (http), 443 (https), 4242 (QRS), 4444 (QRS).
      Outbound ports: 4747 (Engine), 4239 (QRS websocket), 4242 (QRS), 4444 (QRS), 4899 (Printing), 4949 (Data profiling).
  • Engine node: runs QES and QRS.
      Inbound ports: 4747 (QES), 4239 (QRS), 4242 (QRS), 4444 (QRS), 4949 (Data profiling).
      Outbound ports: 4242 (QRS), 4444 (QRS).
  • Failover candidate node: runs QES, QSS, QRS, and QPS. Has a two-way connection to the Proxy node, an inbound connection from the Central node on port 5151 (QSS), and an outbound connection to the Central node on port 5050 (QSS).
      Inbound ports: 4747 (QES), 5151 (QSS), 4242 (QRS), 4444 (QRS).
      Outbound ports: 4242 (QRS), 4444 (QRS), 5050 (QSS).
  • Central node: runs QES, QSS, QRS, and QPS. Has a two-way connection to the Proxy node.
      Inbound ports: 4747 (QES), 5151 (QSS), 4242 (QRS), 4444 (QRS).
      Outbound ports: 4242 (QRS), 4444 (QRS), 5151 (QSS).
  • Storage layer: contains the file share and QRD. Has a one-way connection from the Proxy node.
      Inbound ports: 4432 (QRD).
      Outbound ports: 4242 (QRS), 4444 (QRS).
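
A quick TCP check can confirm that the required listening ports are reachable from another node. This is a generic connectivity sketch using only the Python standard library; the host name is a hypothetical placeholder for your own failover candidate node:

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Inbound ports the failover candidate must accept, per the example above.
FAILOVER_INBOUND = {4747: "QES", 5151: "QSS", 4242: "QRS", 4444: "QRS"}

host = "failover-node.example.com"  # hypothetical; use your node's address
for port, service in sorted(FAILOVER_INBOUND.items()):
    status = "open" if is_port_open(host, port) else "blocked/closed"
    print(f"{port} ({service}): {status}")
```

A "blocked/closed" result can mean either a firewall rule blocking the port or the service not listening, so check both when troubleshooting.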

For a complete list of inbound and outbound ports for all services, and to see more deployment examples, see Ports.

Manually migrating the central node

You cannot use the QMC to change which node in your site is the central node. You can, however, use the QRS REST API to do this. Before manually reassigning a failover candidate node to a central node role, you must ensure that it is running the necessary services for the central node.

Use the following REST API calls:

  • Run a GET to /qrs/serverNodeConfiguration to return a list of server GUIDs.
  • Run an empty POST to /qrs/failover/tonode/{serverNodeConfigurationID} where {serverNodeConfigurationID} is the ID of the node you want to become the central node.
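
The two calls above can be sketched as follows. This is an illustrative request builder only: it does not send anything, and it omits authentication (such as the Qlik client certificates or an X-Qlik-User header), which a real call requires. The host name and node GUID are hypothetical placeholders. QRS does require a 16-character xrfkey sent both as a query parameter and in the X-Qlik-Xrfkey header:

```python
import secrets
import string

QRS = "https://central-node.example.com:4242"  # hypothetical QRS address

def qrs_request(method, path, node_id=None):
    """Build (method, url, headers) for a QRS call with a matching xrfkey."""
    alphabet = string.ascii_letters + string.digits
    xrfkey = "".join(secrets.choice(alphabet) for _ in range(16))
    url = f"{QRS}{path.format(id=node_id)}?xrfkey={xrfkey}"
    headers = {"X-Qlik-Xrfkey": xrfkey}
    return method, url, headers

# Step 1: list server node configurations to find the node GUIDs.
print(qrs_request("GET", "/qrs/serverNodeConfiguration"))

# Step 2: promote a node with an empty POST body (GUID is a placeholder).
print(qrs_request("POST", "/qrs/failover/tonode/{id}",
                  node_id="00000000-0000-0000-0000-000000000000"))
```

After the POST succeeds, verify in the QMC that the chosen node now appears as the central node and that the old central node is listed as a failover candidate.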