
Configuring tolerations rules for Dynamic Engine

Use tolerations in Dynamic Engine Helm charts to allow pods to be scheduled on tainted Kubernetes nodes.

Starting with Dynamic Engine v1.4.0, you can set toleration rules in Helm values files to control which tainted nodes are eligible for pod scheduling. Apply a uniform rule to all pods, or configure separate rules for infrastructure service pods and customer workload pods such as Data Integration Jobs, Data Service Jobs, and Routes.

Note: You can apply this customization during initial chart installation or an upgrade.

Before you begin

  • The dynamic-engine-crd custom resource definitions must have been installed using the oci://ghcr.io/talend/helm/dynamic-engine-crd Helm chart. If they are not installed yet, run the following commands:
    1. Find the chart version to use, with one of the following methods:
      • Run the following Helm command:
        helm show chart oci://ghcr.io/talend/helm/dynamic-engine-crd --version <engine_version>
      • Check Talend Management Console or the Dynamic Engine changelog for the chart version included in your Dynamic Engine version.
      • Use an API call to the Dynamic Engine version endpoint.
    2. Run the following command to install the Helm chart of a given version:
      helm install dynamic-engine-crd oci://ghcr.io/talend/helm/dynamic-engine-crd --version <helm_chart_version>
      Replace <helm_chart_version> with the chart version supported by your Dynamic Engine version.

      If you do not specify a version, the latest available dynamic-engine-crd chart version is installed.

  • Your Kubernetes nodes must be tainted with the values you intend to tolerate, as shown in the example after this list.
  • You must have basic knowledge of Kubernetes taints and tolerations.
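
For example, you can taint a node with the value used throughout this topic (a standard kubectl command; replace <node-name> with one of your node names):

    # Taint a node so that only pods with a matching toleration can be scheduled on it.
    kubectl taint nodes <node-name> dedicated=dynamic-engine:NoSchedule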

About this task

By default, Dynamic Engine pods are not scheduled on tainted nodes unless you configure a matching toleration rule. A taint on a node prevents the scheduler from placing pods that lack a matching toleration. Use a uniform global rule when all pods can share the same tainted node pool. Configure separate workload-specific rules when infrastructure service pods and customer workload pods must be scheduled on different tainted nodes.
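
To see which taints are already set on your nodes before writing a rule, you can list them with standard kubectl commands:

    # List each node together with its taints.
    kubectl describe nodes | grep -E '^Name:|^Taints:'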

Procedure

  1. On your Kubernetes client machine, unzip the previously downloaded Helm deployment zip file.
  2. Create a custom values file with global.tolerations to allow all pods to be scheduled on nodes with the target taint.

    Example

    In this example, nodes are tainted with dedicated=dynamic-engine:NoSchedule. The following values file creates a toleration that matches this taint:
    cat <<EOF > custom-tolerations-values.yaml
    global:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "dynamic-engine"
          effect: "NoSchedule"
    EOF
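
    Optionally, render the chart locally to confirm that the toleration is injected into the pod specifications before deploying. This sketch assumes the release name and variables used in the next step:

    # Render the manifests without installing and show the tolerations sections.
    helm template dynamic-engine-$DYNAMIC_ENGINE_ID \
      -f $DYNAMIC_ENGINE_ID-values.yaml \
      -f custom-tolerations-values.yaml \
      oci://ghcr.io/talend/helm/dynamic-engine \
      --version $DYNAMIC_ENGINE_VERSION | grep -B 2 -A 4 'tolerations:'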
  3. Deploy Dynamic Engine and the Dynamic Engine environment with the global toleration rule.

    Example

    helm upgrade --install dynamic-engine-$DYNAMIC_ENGINE_ID \
      -f $DYNAMIC_ENGINE_ID-values.yaml \
      -f custom-tolerations-values.yaml \
      oci://ghcr.io/talend/helm/dynamic-engine \
      --version $DYNAMIC_ENGINE_VERSION
    
    helm upgrade --install dynamic-engine-environment-$DYNAMIC_ENGINE_ENVIRONMENT_ID \
      -f $DYNAMIC_ENGINE_ENVIRONMENT_ID-values.yaml \
      -f custom-tolerations-values.yaml \
      oci://ghcr.io/talend/helm/dynamic-engine-environment \
      --version $DYNAMIC_ENGINE_VERSION

    After deployment, pods in both the qlik-dynamic-engine and qlik-processing-env-$DYNAMIC_ENGINE_ENVIRONMENT_ID namespaces can be scheduled on nodes tainted with dedicated=dynamic-engine:NoSchedule.
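
    To confirm the result, check where the pods were scheduled and inspect the tolerations that were applied (standard kubectl commands; <pod-name> is a placeholder):

    # See which nodes the pods were scheduled on.
    kubectl get pods -n qlik-dynamic-engine -o wide

    # Inspect the tolerations applied to a given pod.
    kubectl get pod <pod-name> -n qlik-dynamic-engine -o jsonpath='{.spec.tolerations}'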

To configure separate toleration rules for infrastructure service pods and customer workload pods, follow steps 4 and 5 instead of steps 2 and 3.

  4. Create separate toleration rules for infrastructure service pods and customer workload pods.
    Toleration rules set at global.tolerations apply to all pods created by the chart. Rules set at configuration.jobDeployment.tolerations and configuration.dataServiceRouteDeployment.tolerations apply only to customer workload pods. These include Data Integration Jobs, Data Service Jobs, and Routes. If you define workload-specific rules, they override the global rules for those pod types only. If you do not define workload-specific rules, all pods use the global toleration rules.

    Example

    In this example, infrastructure service pods tolerate dedicated=dynamic-engine:NoSchedule, and customer workload pods tolerate dedicated=dynamic-engine-jobs-and-routes:NoSchedule. Create two custom values files:
    cat <<EOF > custom-tolerations-values.yaml
    global:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "dynamic-engine"
          effect: "NoSchedule"
    EOF
    
    cat <<EOF > custom-jobDeployment-dataServiceRouteDeployment-tolerations-values.yaml
    configuration:
      jobDeployment:
        tolerations:
          - key: "dedicated"
            operator: "Equal"
            value: "dynamic-engine-jobs-and-routes"
            effect: "NoSchedule"
      dataServiceRouteDeployment:
        tolerations:
          - key: "dedicated"
            operator: "Equal"
            value: "dynamic-engine-jobs-and-routes"
            effect: "NoSchedule"
    EOF
  5. Deploy the charts with the split toleration rules.

    Example

    Apply the global toleration file to both charts. Apply the workload-specific file only to the environment chart:
    helm upgrade --install dynamic-engine-$DYNAMIC_ENGINE_ID \
      -f $DYNAMIC_ENGINE_ID-values.yaml \
      -f custom-tolerations-values.yaml \
      oci://ghcr.io/talend/helm/dynamic-engine \
      --version $DYNAMIC_ENGINE_VERSION
    
    helm upgrade --install dynamic-engine-environment-$DYNAMIC_ENGINE_ENVIRONMENT_ID \
      -f $DYNAMIC_ENGINE_ENVIRONMENT_ID-values.yaml \
      -f custom-tolerations-values.yaml \
      -f custom-jobDeployment-dataServiceRouteDeployment-tolerations-values.yaml \
      oci://ghcr.io/talend/helm/dynamic-engine-environment \
      --version $DYNAMIC_ENGINE_VERSION

    After deployment, infrastructure service pods in both the qlik-dynamic-engine and qlik-processing-env-$DYNAMIC_ENGINE_ENVIRONMENT_ID namespaces can be scheduled on nodes tainted with dedicated=dynamic-engine:NoSchedule. Customer workload pods in the qlik-processing-env-$DYNAMIC_ENGINE_ENVIRONMENT_ID namespace can be scheduled on nodes tainted with dedicated=dynamic-engine-jobs-and-routes:NoSchedule.
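
    You can verify the split by comparing the tolerations of an infrastructure service pod with those of a customer workload pod (pod names are placeholders):

    # Infrastructure service pod: expect the value dynamic-engine.
    kubectl get pod <infrastructure-pod-name> -n qlik-processing-env-$DYNAMIC_ENGINE_ENVIRONMENT_ID \
      -o jsonpath='{.spec.tolerations}'

    # Job or Route pod: expect the value dynamic-engine-jobs-and-routes.
    kubectl get pod <workload-pod-name> -n qlik-processing-env-$DYNAMIC_ENGINE_ENVIRONMENT_ID \
      -o jsonpath='{.spec.tolerations}'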

Results

After you deploy the charts successfully:

  • Dynamic Engine services in the qlik-dynamic-engine namespace can be scheduled on nodes tainted with dedicated=dynamic-engine:NoSchedule.
  • Dynamic Engine environment services in the qlik-processing-env-$DYNAMIC_ENGINE_ENVIRONMENT_ID namespace can be scheduled on nodes tainted with dedicated=dynamic-engine:NoSchedule.
  • When using separate toleration rules, customer workload pods (Data Integration Jobs, Data Service Jobs, and Route pods) in the qlik-processing-env-$DYNAMIC_ENGINE_ENVIRONMENT_ID namespace can be scheduled on nodes tainted with dedicated=dynamic-engine-jobs-and-routes:NoSchedule.
Note: You can combine tolerations with nodeSelector and affinity rules to control pod placement by using both node labels and taints. See Configuring affinity and nodeSelector rules for Dynamic Engine.
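
For example, a values file along the following lines pairs the toleration with a node label. This is a minimal sketch: it assumes the chart exposes global.nodeSelector as described in the linked topic, and that your nodes carry a dedicated=dynamic-engine label in addition to the taint:

    global:
      nodeSelector:
        dedicated: dynamic-engine
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "dynamic-engine"
          effect: "NoSchedule"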
Troubleshooting

If pods stay in Pending status after you add tolerations, check for a mismatch between node taints and pod tolerations.

Common situations include:

  • The node is tainted with dedicated=dynamic-engine:NoSchedule, but the pod has no toleration for the dedicated key.
  • The pod toleration uses the wrong value, operator, or effect, so it does not match the taint exactly.
  • The node taint uses the NoExecute effect, but the pod only tolerates NoSchedule, so the pod is rejected at scheduling time and evicted if it is already running.
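
For the NoExecute case, you can omit the effect field so that the toleration matches all effects, which is standard Kubernetes toleration semantics. Apply the pattern to whichever tolerations key you are using, for example:

    global:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "dynamic-engine"
          # No effect specified: tolerates NoSchedule, PreferNoSchedule, and NoExecute.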

Use the following commands to inspect the problem:

  1. Inspect the pod for scheduling failure details:

    kubectl describe pod <pod-name> -n <namespace>

    Look for the Events and Tolerations sections. A failed scheduling event names the specific taint that was not tolerated.

  2. Find scheduling failure events sorted by time:

    kubectl get events -n <namespace> --sort-by=.lastTimestamp

    Look for FailedScheduling messages that reference taints with no matching toleration.
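
    To narrow the output to scheduling failures only, you can filter the events by reason:

    kubectl get events -n <namespace> --field-selector reason=FailedScheduling --sort-by=.lastTimestamp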

  3. Compare node taints against pod tolerations:

    kubectl describe node <node-name>

    Check the Taints section. The taint key, value, operator, and effect must all match the pod toleration exactly.
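
    Alternatively, print only the taints of a node in compact form:

    kubectl get node <node-name> -o jsonpath='{.spec.taints}'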

  4. Identify which node a pod landed on:

    kubectl get pods -n <namespace> -o wide

    Use this command to see the node assignment, then inspect that node with kubectl describe node.
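
    For example, to go straight from a pod to the taints of its node, you can combine the two commands:

    # Get the node that the pod is running on, then show that node's taints.
    NODE=$(kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.nodeName}')
    kubectl describe node "$NODE" | grep -A 2 '^Taints:'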
