Step 2: Create a lakehouse cluster

A lakehouse cluster defines the compute environment that runs Qlik Open Lakehouse storage tasks. Each cluster specifies settings such as the number of instances, machine type, and scaling strategy.

When you create a network integration for a Qlik Open Lakehouse pipeline project, a cluster with a single AWS Spot Instance is created automatically. However, you can create additional clusters in the Administration and Data Integration activity centers.

Lakehouse clusters link pipelines to a group of AWS instances, allowing you to optimize workloads by assigning critical jobs to high-performance clusters, and non-critical workloads to cost-effective machines.

While a cluster is associated with a single VPC, multiple clusters can run within the same VPC. Additionally, a single cluster can run multiple jobs. It is helpful to define the compute requirements of your workloads before creating a lakehouse cluster. Cluster settings, including the scaling strategy, can be modified as needed, although some changes may require the cluster to be rolled. For more information on editing cluster settings, see Managing lakehouse clusters.

When you create a lakehouse cluster, you specify the number of Spot and On-Demand instances that Qlik provisions. For more information on how Qlik utilizes Spot and On-Demand instances in your cluster, see Lakehouse cluster (EC2 Auto-Scaling Group).

Prerequisites

To create a lakehouse cluster, you need:

  • A network integration within the current tenant.

  • Permission to access the network integration.

Creating a lakehouse cluster

To add a cluster to the current tenant, do the following:

  1. In the Administration activity center, click Lakehouse clusters. On the Lakehouse clusters tab, click Create new, select Lakehouse cluster, and configure the following settings:

    • Name: Enter the name of the cluster.

    • Network integration: Select the network integration where the cluster will be deployed.

    • Integration space: Select the space that the cluster will belong to. The space is not inherited from the network integration.

    • Family type: Select the instance family type.

  2. Configure the instances:
    • AWS On-Demand Instances: Enter the number of AWS On-Demand Instances for this cluster.

    • AWS Spot Instances: Enter the Minimum and Maximum number of Spot Instances to use.

  3. Choose a scaling strategy that suits your workload from the following options:
    • Low cost - Optimizes for low cost, but may lead to occasional periods of high latency.

    • Low latency - Strives to maintain low latency, while allowing brief, necessary spikes.

    • Consistent low latency - Proactively scales up to ensure latency remains low.

    • Manual scaling - Retains a static number of instances with no automatic scaling.

  4. Select how your cluster receives software updates:

    • Early rollout: Ideal for development and staging clusters to validate new releases against custom set-ups and code, prior to production.

    • Later rollout: Updates are applied after a successful early rollout. Recommended for production environments.

  5. Add a Key and Value for each tag you want to include. Tags help you identify, organize, and manage resources.
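
Before filling in the form, it can help to write down the intended settings and sanity-check them. The sketch below is only a planning aid, not a Qlik API: the class name `LakehouseClusterPlan`, its field names, the example values, and the validation rules are all assumptions made for illustration. The option names for scaling strategy and software updates are taken from the steps above. The instance bounds assume that the On-Demand count is fixed and that only the Spot count varies between its minimum and maximum.

```python
from dataclasses import dataclass, field
from enum import Enum


class ScalingStrategy(Enum):
    # Option names as listed in step 3.
    LOW_COST = "Low cost"
    LOW_LATENCY = "Low latency"
    CONSISTENT_LOW_LATENCY = "Consistent low latency"
    MANUAL_SCALING = "Manual scaling"


class UpdateRollout(Enum):
    # Option names as listed in step 4.
    EARLY = "Early rollout"
    LATER = "Later rollout"


@dataclass
class LakehouseClusterPlan:
    """Hypothetical planning record that mirrors the fields of the form."""
    name: str
    network_integration: str
    integration_space: str
    family_type: str
    on_demand_instances: int
    spot_min: int
    spot_max: int
    strategy: ScalingStrategy
    rollout: UpdateRollout
    tags: dict[str, str] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the plan looks consistent."""
        problems = []
        if not self.name.strip():
            problems.append("Name is empty")
        if self.on_demand_instances < 0:
            problems.append("On-Demand instance count is negative")
        if self.spot_min < 0:
            problems.append("Spot minimum is negative")
        if self.spot_min > self.spot_max:
            problems.append("Spot minimum exceeds Spot maximum")
        return problems

    def instance_bounds(self) -> tuple[int, int]:
        """Smallest and largest total instance count (assumed: On-Demand + Spot)."""
        return (self.on_demand_instances + self.spot_min,
                self.on_demand_instances + self.spot_max)


# Example values are placeholders, not real Qlik or AWS identifiers.
plan = LakehouseClusterPlan(
    name="orders-prod",
    network_integration="my-network-integration",
    integration_space="my-space",
    family_type="example-family",
    on_demand_instances=1,
    spot_min=2,
    spot_max=6,
    strategy=ScalingStrategy.LOW_LATENCY,
    rollout=UpdateRollout.LATER,
    tags={"team": "data-platform"},
)
print(plan.validate())
print(plan.instance_bounds())
```

Keeping a record like this alongside each cluster makes it easier to compare the compute requirements of different workloads before you create or edit clusters in the Administration activity center.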
