Configuring toleration rules for Dynamic Engine
Use tolerations in Dynamic Engine Helm charts to allow pods to be scheduled on tainted Kubernetes nodes.
Starting with Dynamic Engine v1.4.0, you can set toleration rules in Helm values files to control which tainted nodes are eligible for pod scheduling. Apply a uniform rule to all pods, or configure separate rules for infrastructure service pods and customer workload pods such as Data Integration Jobs, Data Service Jobs, and Routes.
Before you begin
- The dynamic-engine-crd custom resource definitions must have been installed using the oci://ghcr.io/talend/helm/dynamic-engine-crd Helm chart. If not, install them as follows:
  - Find the chart version to be used, with one of these options:
    - Run the following Helm command:
      helm show chart oci://ghcr.io/talend/helm/dynamic-engine-crd --version <engine_version>
    - See the version directly from Talend Management Console, or check the Dynamic Engine changelog for the chart version included in your Dynamic Engine version.
    - Use an API call to the Dynamic Engine version endpoint.
  - Run the following command to install the Helm chart of a given version, replacing <helm_chart_version> with the chart version supported by your Dynamic Engine version:
      helm install dynamic-engine-crd oci://ghcr.io/talend/helm/dynamic-engine-crd --version <helm_chart_version>
    If you do not specify a version, the latest available dynamic-engine-crd chart version is installed.
- Your Kubernetes nodes must be tainted with the values you intend to tolerate.
- You must have basic knowledge of Kubernetes taints and tolerations.
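For reference, a taint such as the one used in this topic's examples can be applied with kubectl taint nodes <node-name> dedicated=dynamic-engine:NoSchedule, and appears in the node definition as the following excerpt shows (standard Kubernetes node spec syntax; the node name is a placeholder):

```yaml
# Node spec excerpt after tainting. Only pods that tolerate this
# taint can be scheduled on the node.
spec:
  taints:
    - key: dedicated
      value: dynamic-engine
      effect: NoSchedule
```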
About this task
By default, Dynamic Engine pods are not scheduled on tainted nodes unless you configure a matching toleration rule. A taint on a node prevents the scheduler from placing pods that lack a matching toleration. Use a uniform global rule when all pods can share the same tainted node pool. Configure separate workload-specific rules when infrastructure service pods and customer workload pods must be scheduled on different tainted nodes.
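As an illustration of what a toleration rule looks like in a values file, here is a hypothetical excerpt. The tolerations list itself uses standard Kubernetes syntax, but where it nests and the surrounding key names depend on the chart's values schema; check the chart's values.yaml for the actual structure.

```yaml
# Hypothetical Helm values excerpt -- the placement of this key is an
# assumption; consult the Dynamic Engine chart's values.yaml.
# A uniform rule applied to all pods:
tolerations:
  - key: dedicated          # must match the taint key on the node
    operator: Equal
    value: dynamic-engine   # must match the taint value
    effect: NoSchedule      # must match the taint effect
```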
Procedure
To configure separate toleration rules for infrastructure service pods and customer workload pods, follow steps 4 and 5 instead of steps 2 and 3.
Results
After you deploy the charts successfully:
- Dynamic Engine services in the qlik-dynamic-engine namespace can be scheduled on nodes tainted with dedicated=dynamic-engine:NoSchedule.
- Dynamic Engine environment services in the qlik-processing-env-$DYNAMIC_ENGINE_ENVIRONMENT_ID namespace can be scheduled on nodes tainted with dedicated=dynamic-engine:NoSchedule.
- When using separate toleration rules, customer workload pods (Data Integration Jobs, Data Service Jobs, and Route pods) in the qlik-processing-env-$DYNAMIC_ENGINE_ENVIRONMENT_ID namespace can be scheduled on nodes tainted with dedicated=dynamic-engine-jobs-and-routes:NoSchedule.
If pods stay in Pending status after you add tolerations, check for a mismatch between node taints and pod tolerations.
Common situations include:
- The node is tainted with dedicated=dynamic-engine:NoSchedule, but the pod has no toleration for the dedicated key.
- The pod toleration uses the wrong value, operator, or effect, so it does not match the taint exactly.
- The node uses the NoExecute effect, but the pod only tolerates NoSchedule, so the pod can be rejected at scheduling time or evicted after it starts running.
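For comparison, a pod toleration that exactly matches the dedicated=dynamic-engine:NoSchedule taint looks like this (standard Kubernetes pod spec syntax):

```yaml
# Pod spec excerpt: tolerates the dedicated=dynamic-engine:NoSchedule taint.
spec:
  tolerations:
    - key: dedicated
      operator: Equal       # Equal requires the value to match exactly
      value: dynamic-engine
      effect: NoSchedule    # must match the taint's effect
```

With operator: Exists and no value, the toleration instead matches any taint with the dedicated key, which can be simpler when the taint value varies across node pools.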
Use the following commands to inspect the problem:
- Inspect the pod for scheduling failure details:
    kubectl describe pod <pod-name> -n <namespace>
  Look for the Events and Tolerations sections. A failed scheduling event names the specific taint that was not tolerated.
- Find scheduling failure events sorted by time:
    kubectl get events -n <namespace> --sort-by=.lastTimestamp
  Look for FailedScheduling messages that reference taints with no matching toleration.
- Compare node taints against pod tolerations:
    kubectl describe node <node-name>
  Check the Taints section. The taint key, value, operator, and effect must all match the pod toleration exactly.
- Identify which node a pod landed on:
    kubectl get pods -n <namespace> -o wide
  Use this command to see the node assignment, then inspect that node with kubectl describe node.