Setting up Amazon EKS with S3

Before deploying Dynamic Engine and its environments, set up an Amazon EKS cluster with Amazon S3 as the shared storage backend.

Before you begin

About this task

This procedure creates a new EKS cluster in a dedicated virtual private cloud (Amazon VPC), installs the S3 CSI (Container Storage Interface) driver as an AWS-managed EKS add-on, creates an S3 bucket to back persistent volumes, and provisions static PersistentVolumes and PersistentVolumeClaims for Dynamic Engine and its environments.
  • Unlike the Amazon EFS setup, S3 uses static provisioning: there is no StorageClass resource. You must create PersistentVolumes manually and bind them to named PersistentVolumeClaims before deploying Dynamic Engine.

Procedure

  1. Set environment variables for your Amazon EKS and S3 deployment:
    export AWS_REGION=<your-aws-region>
    export EKS_CLUSTER_NAME=<your-eks-cluster-name>
    export S3_BUCKET_NAME=<your-s3-bucket-name>
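
Every later command expands these variables, so an unset value would silently produce a malformed bucket or cluster name. The following is a minimal sketch of a guard, assuming a POSIX-compatible shell such as bash. The require_vars helper is hypothetical and is not part of the AWS CLI, eksctl, or kubectl:

```shell
# require_vars is a hypothetical helper (an assumption for illustration).
# It returns non-zero if any of the named variables is unset or empty.
require_vars() {
  local missing=0 name value
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, running require_vars AWS_REGION EKS_CLUSTER_NAME S3_BUCKET_NAME || exit 1 at the top of a script stops it before any AWS resource is created.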
  2. Create the S3 bucket to back the persistent volumes. Select the bucket type that matches your requirements.
    The bucket to be created and the Amazon VPC must be in the same AWS region.
    • General-purpose bucket: Use a standard S3 bucket available in all AWS regions. One bucket serves all Dynamic Engine volumes using path prefixes.

      cat <<EOF > $S3_BUCKET_NAME-classic-input.yaml
      Bucket: $S3_BUCKET_NAME
      CreateBucketConfiguration:
        LocationConstraint: ${AWS_REGION}
        Tags:
          - Key: owner
            Value: dyn-engine
          - Key: creator
            Value: ${USER}
          - Key: eks/cluster
            Value: ${EKS_CLUSTER_NAME}
      EOF
      
      S3_BUCKET_ARN=$(aws s3api create-bucket \
        --cli-input-yaml file://$S3_BUCKET_NAME-classic-input.yaml \
        --region $AWS_REGION \
        --output text \
        --query BucketArn)
      
      echo "S3 bucket created: $S3_BUCKET_ARN"
      
      The three blocks generate a configuration file for a new S3 bucket with ownership tags, create the bucket using the AWS CLI, and print a confirmation message with the bucket's unique identifier.
    • Directory bucket: Use a directory bucket (also referred to as an S3 Express One Zone bucket) for reduced latency within a single Availability Zone. Directory buckets require a specific naming format: bucket-base-name--zone-id--x-s3, where zone-id is an Availability Zone or Local Zone ID. This bucket type is available only in specific AWS regions and Availability Zones.

      Note: Set AWS_AZ_ZONE_ID to the zone ID of your target Availability Zone before running this script. For example, euw1-az1 is a zone ID in the eu-west-1 region.
      AWS_AZ_ZONE_ID=<your-az-zone-id>
      S3_BUCKET_NAME="${EKS_CLUSTER_NAME}--$AWS_AZ_ZONE_ID--x-s3"
      
      cat <<EOF > $S3_BUCKET_NAME-directory-input.yaml
      Bucket: $S3_BUCKET_NAME
      CreateBucketConfiguration:
        Location:
          Type: AvailabilityZone
          Name: $AWS_AZ_ZONE_ID
        Bucket:
          DataRedundancy: SingleAvailabilityZone
          Type: Directory
        Tags:
          - Key: owner
            Value: dyn-engine
          - Key: creator
            Value: ${USER}
          - Key: eks/cluster
            Value: ${EKS_CLUSTER_NAME}
      EOF
      
      S3_BUCKET_ARN=$(aws s3api create-bucket \
        --cli-input-yaml file://$S3_BUCKET_NAME-directory-input.yaml \
        --region $AWS_REGION \
        --output text \
        --query BucketArn)
      
      echo "S3 bucket created: $S3_BUCKET_ARN"
      These four blocks set up and create an S3 Express One Zone (directory) bucket pinned to a specific Availability Zone, using AWS-required naming conventions and configuration, then print a confirmation message.
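
The directory bucket naming format described above can be checked before calling the AWS CLI. The following is a minimal sketch; both helper names are hypothetical, and the character and length checks (lowercase letters, digits, and hyphens, 3 to 63 characters) are assumptions based on the general S3 bucket naming rules rather than on this procedure:

```shell
# Hypothetical helper: build a directory bucket name from a base name ($1)
# and a zone ID ($2), following the bucket-base-name--zone-id--x-s3 format.
directory_bucket_name() {
  printf '%s--%s--x-s3\n' "$1" "$2"
}

# Hypothetical helper: succeed only if the name ends with --zone-id--x-s3,
# contains only lowercase letters, digits, and hyphens (assumed rule),
# and is 3 to 63 characters long (assumed rule).
is_directory_bucket_name() {
  case "$1" in
    *[!a-z0-9-]*) return 1 ;;   # reject any other character
    ?*--?*--x-s3) ;;            # required base--zone--x-s3 shape
    *) return 1 ;;
  esac
  [ "${#1}" -ge 3 ] && [ "${#1}" -le 63 ]
}
```

For example, is_directory_bucket_name "$S3_BUCKET_NAME" || echo "Unexpected directory bucket name" >&2 flags a long cluster name that pushes the generated bucket name past the length limit.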
  3. Create the EKS cluster configuration file.
    IAM permissions differ between general-purpose and directory buckets. Use the configuration matching the bucket type you created in the previous step:
    • General-purpose bucket: grant s3:ListBucket, s3:GetObject, s3:PutObject, and s3:DeleteObject on the bucket and its objects:

      cat <<EOF > eks-config.yaml
      apiVersion: eksctl.io/v1alpha5
      kind: ClusterConfig
      
      metadata:
        name: ${EKS_CLUSTER_NAME}
        region: ${AWS_REGION}
        version: "1.34"
      
      autoModeConfig:
        enabled: false
      
      addonsConfig:
        autoApplyPodIdentityAssociations: true
      
      addons:
        - name: eks-pod-identity-agent
        - name: aws-mountpoint-s3-csi-driver
          namespaceConfig:
            namespace: aws-mountpoint-s3
          podIdentityAssociations:
            - namespace: aws-mountpoint-s3
              serviceAccountName: s3-csi-driver-sa
              permissionPolicy:
                Statement:
                  - Effect: Allow
                    Action:
                      - s3:ListBucket
                    Resource: $S3_BUCKET_ARN
                  - Effect: Allow
                    Action:
                      - s3:GetObject
                      - s3:PutObject
                      - s3:DeleteObject
                    Resource: "$S3_BUCKET_ARN/*"
      
      vpc:
        cidr: 10.10.0.0/16
        autoAllocateIPv6: false
        hostnameType: resource-name
        clusterEndpoints:
          publicAccess: true
          privateAccess: true
        nat:
          gateway: HighlyAvailable
      
      managedNodeGroups:
        - name: ng-amd64
          amiFamily: AmazonLinux2023 # (default) or Bottlerocket
          minSize: 2
          maxSize: 4
          desiredCapacity: 2
          instanceSelector:
            vCPUs: 4
            memory: 16GiB
            cpuArchitecture: amd64
          privateNetworking: true
      
      EOF
    • Directory bucket: grant s3express:CreateSession on the bucket (S3 Express One Zone session-based authorization):

      cat <<EOF > eks-config.yaml
      apiVersion: eksctl.io/v1alpha5
      kind: ClusterConfig
      
      metadata:
        name: ${EKS_CLUSTER_NAME}
        region: ${AWS_REGION}
        version: "1.34"
      
      autoModeConfig:
        enabled: false
      
      addonsConfig:
        autoApplyPodIdentityAssociations: true
      
      addons:
        - name: eks-pod-identity-agent
        - name: aws-mountpoint-s3-csi-driver
          namespaceConfig:
            namespace: aws-mountpoint-s3
          podIdentityAssociations:
            - namespace: aws-mountpoint-s3
              serviceAccountName: s3-csi-driver-sa
              permissionPolicy:
                Statement:
                  - Effect: Allow
                    Action:
                      - s3express:CreateSession
                    Resource: $S3_BUCKET_ARN
      
      vpc:
        cidr: 10.10.0.0/16
        autoAllocateIPv6: false
        hostnameType: resource-name
        clusterEndpoints:
          publicAccess: true
          privateAccess: true
        nat:
          gateway: HighlyAvailable
      
      managedNodeGroups:
        - name: ng-amd64
          amiFamily: AmazonLinux2023 # (default) or Bottlerocket
          minSize: 2
          maxSize: 4
          desiredCapacity: 2
          instanceSelector:
            vCPUs: 4
            memory: 16GiB
            cpuArchitecture: amd64
          privateNetworking: true
      
      EOF
    Note: The following configuration notes apply to both the general-purpose and directory bucket variants. The only difference between them is the IAM permission policy granted to the S3 CSI driver.

    This configuration:

    • Sets the Kubernetes version to 1.34. This is the example version used in this procedure. Update it to match your target cluster version. For supported versions, see Prerequisites to use Dynamic Engine.
    • Disables EKS Auto Mode (autoModeConfig: enabled: false) to ensure standard node group management.
    • Installs the aws-mountpoint-s3-csi-driver add-on in the aws-mountpoint-s3 namespace, with IAM permissions configured automatically using Pod Identity Associations. This grants the S3 CSI driver the permissions it needs to mount S3 objects as volumes without hard-coded credentials. Unlike EFS, which uses an EKS-managed policy, the S3 CSI driver requires an explicit IAM policy scoped to the specific S3 bucket ARN ($S3_BUCKET_ARN). The policy is bound to the add-on automatically by eksctl; no additional AWS CLI steps are required.
    • Creates a VPC and places all nodes in private subnets (privateNetworking: true). Nodes reach the internet only through NAT gateways. The cluster endpoint remains accessible from both your local machine and within the VPC.
    • Provisions nodes sized for Dynamic Engine workloads. Each node has 4 vCPUs and 16 GiB of RAM. The node group scales between 2 and 4 nodes (minSize: 2, maxSize: 4). The AMD64 architecture (cpuArchitecture: amd64) is required because Dynamic Engine container images are built for AMD64 only.
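
Because the two configuration files differ only in the permissionPolicy statements, the difference can be isolated in one place. The following is a sketch under that assumption; s3_csi_policy_statements is a hypothetical helper, and its output would still need to be indented to fit under Statement: in eks-config.yaml:

```shell
# Hypothetical helper: print the IAM statements for one bucket type.
# $1 is "general-purpose" or "directory"; $2 is the bucket ARN.
s3_csi_policy_statements() {
  case "$1" in
    general-purpose)
      cat <<EOF
- Effect: Allow
  Action:
    - s3:ListBucket
  Resource: $2
- Effect: Allow
  Action:
    - s3:GetObject
    - s3:PutObject
    - s3:DeleteObject
  Resource: "$2/*"
EOF
      ;;
    directory)
      cat <<EOF
- Effect: Allow
  Action:
    - s3express:CreateSession
  Resource: $2
EOF
      ;;
    *)
      echo "Unknown bucket type: $1" >&2
      return 1
      ;;
  esac
}
```

For example, s3_csi_policy_statements directory "$S3_BUCKET_ARN" prints only the s3express:CreateSession statement, which makes it easy to compare against the policy in the file you generated.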
  4. Create the EKS cluster, link your local kubectl to the cluster, and verify access:
    eksctl create cluster -f eks-config.yaml
    aws eks update-kubeconfig --region "$AWS_REGION" --name "$EKS_CLUSTER_NAME"
    kubectl get nodes

    Wait for all nodes to reach Ready status before proceeding.
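
The readiness check can be scripted instead of read by eye. The following is a minimal sketch, assuming the default column layout of kubectl get nodes, where STATUS is the second column; count_not_ready_nodes is a hypothetical helper:

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print how many nodes have a STATUS other than exactly "Ready".
# A status such as "Ready,SchedulingDisabled" is counted as not ready.
count_not_ready_nodes() {
  awk '$2 != "Ready" { n++ } END { print n + 0 }'
}

# Usage (requires cluster access):
#   kubectl get nodes --no-headers | count_not_ready_nodes
```

Alternatively, kubectl wait --for=condition=Ready nodes --all --timeout=600s blocks until every node is ready or the timeout expires.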

  5. Set environment variables for the Dynamic Engine environment namespace, then create the three PersistentVolumes and PersistentVolumeClaims used by Dynamic Engine services and your Talend Management Console tasks. Replace the example ENV_ID value with the ID of your own Dynamic Engine environment:
    ENV_ID=67f7d562ffd7c3525a902542
    DYNAMIC_ENGINE_ENV_NAMESPACE=qlik-processing-env-$ENV_ID
    # Ensure the target namespace exists before creating PVs/PVCs
    kubectl get namespace "$DYNAMIC_ENGINE_ENV_NAMESPACE" >/dev/null 2>&1 || kubectl create namespace "$DYNAMIC_ENGINE_ENV_NAMESPACE"
    declare -a pvcs=("archive" "job-data" "custom-resources")
    
    for pvc in "${pvcs[@]}"
    do
      echo "Creating PV/PVC for $pvc"
    cat <<EOF | kubectl apply -f -
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: $pvc
    spec:
      capacity:
        storage: 1200Gi # Ignored for S3 volumes but required by Kubernetes API
      accessModes:
        - ReadWriteMany # Supported options: ReadWriteMany / ReadOnlyMany
      storageClassName: "" # Required for static provisioning
      claimRef: # To ensure no other PVCs can claim this PV
        namespace: $DYNAMIC_ENGINE_ENV_NAMESPACE # Namespace of the PVC; required because PVs are cluster-scoped
        name: $pvc # Name of your PVC
      mountOptions:
        - allow-delete
        - allow-overwrite
        - region $AWS_REGION
        - prefix $pvc/
      csi:
        driver: s3.csi.aws.com # Required
        volumeHandle: $pvc # Must be unique
        volumeAttributes:
          bucketName: $S3_BUCKET_NAME
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: $pvc
      namespace: $DYNAMIC_ENGINE_ENV_NAMESPACE
    spec:
      accessModes:
        - ReadWriteMany # Supported options: ReadWriteMany / ReadOnlyMany
      storageClassName: "" # Required for static provisioning
      resources:
        requests:
          storage: 1200Gi # Ignored, required
      volumeName: $pvc # Name of your PV
    EOF
    done
    • storageClassName: "": Must be empty for static provisioning.
    • claimRef: Binds this PersistentVolume exclusively to the named PersistentVolumeClaim in the specified namespace. This prevents any other PersistentVolumeClaim from claiming the volume.
    • mountOptions: allow-delete and allow-overwrite: Allow pods to delete and overwrite existing files on S3-backed volumes. Without these flags, pods can mount the volume and create new files, but deleting or overwriting existing files fails.
    • mountOptions: prefix $pvc/: Scopes each volume to a separate path prefix within the shared S3 bucket, isolating the data for each volume.
    • volumeHandle: Must be unique across all PersistentVolumes in the cluster.
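
Before creating the test pod, it can help to confirm that each claim is bound to its PersistentVolume. The following is a minimal sketch; unbound_pvcs is a hypothetical helper that expects one "NAME PHASE" pair per line:

```shell
# Hypothetical helper: read "NAME PHASE" lines on stdin and print the names
# of claims whose phase is not "Bound". Prints nothing when all are bound.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get pvc -n "$DYNAMIC_ENGINE_ENV_NAMESPACE" --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase | unbound_pvcs
```

An empty result means archive, job-data, and custom-resources are all bound. A claim that stays in the Pending phase usually points to a mismatch between claimRef, volumeName, or storageClassName.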
  6. Create a test pod that mounts all three PersistentVolumeClaims:
    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pod-pvc
      namespace: $DYNAMIC_ENGINE_ENV_NAMESPACE
    spec:
      securityContext:
        fsGroup: 61000
        runAsUser: 61000
        runAsGroup: 61000
        runAsNonRoot: true
      containers:
        - name: app
          image: ghcr.io/talend/kube-base:5.3.0
          command: ["cat"]
          tty: true
          volumeMounts:
            - name: archive
              mountPath: /opt/talend/archive
            - name: job-data
              mountPath: /opt/talend/data
            - name: custom-resources
              mountPath: /opt/talend/custom-resources
      volumes:
        - name: archive
          persistentVolumeClaim:
            claimName: archive
        - name: job-data
          persistentVolumeClaim:
            claimName: job-data
        - name: custom-resources
          persistentVolumeClaim:
            claimName: custom-resources
    EOF

    The test pod uses runAsUser: 61000, runAsGroup: 61000, and fsGroup: 61000. These values match the Dynamic Engine runtime UID/GID and are required for correct file ownership on the mounted volumes.

  7. Validate that the pod can read and write data to the mounted volumes:
    kubectl exec -n $DYNAMIC_ENGINE_ENV_NAMESPACE test-pod-pvc -- \
      sh -c "echo 'Hello world' > /opt/talend/archive/test.txt \
      && cat /opt/talend/archive/test.txt \
      && rm -f /opt/talend/archive/test.txt"

    If the command prints Hello world and exits without error, the S3-backed archive volume is accessible and writable.
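
The command above exercises only the archive volume. The following sketch repeats the same write, read, and delete check for all three mount paths from the test pod definition. Both helper names are hypothetical, and check_all_volumes assumes kubectl access and that DYNAMIC_ENGINE_ENV_NAMESPACE is still set:

```shell
# Mount paths taken from the test pod definition in this procedure.
MOUNT_PATHS="/opt/talend/archive /opt/talend/data /opt/talend/custom-resources"

# Hypothetical helper: print the write/read/delete check for one mount path.
volume_check_script() {
  printf "echo 'Hello world' > %s/test.txt && cat %s/test.txt && rm -f %s/test.txt\n" "$1" "$1" "$1"
}

# Hypothetical helper: run the check in the test pod for every mount path.
# Stops at the first path that fails and returns non-zero.
check_all_volumes() {
  local path
  for path in $MOUNT_PATHS; do
    echo "Checking $path"
    kubectl exec -n "$DYNAMIC_ENGINE_ENV_NAMESPACE" test-pod-pvc -- \
      sh -c "$(volume_check_script "$path")" || return 1
  done
}
```

Calling check_all_volumes after the pod is running gives one pass or fail result for the whole set of volumes.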

Results

Your EKS cluster is running with the S3 CSI driver installed, and the three PersistentVolumes (archive, job-data, custom-resources) are created and bound to their PersistentVolumeClaims in the qlik-processing-env-<env-id> namespace. The cluster is ready for Dynamic Engine deployment.
Tip: The embedded docker-registry service of Dynamic Engine performs random writes, which the S3 CSI driver (Mountpoint for Amazon S3) does not support. As a result, the docker-registry volume cannot use an S3-backed PersistentVolumeClaim. To provision persistent volumes for docker-registry, either dynamically or statically, use EFS or another external system that supports POSIX (Portable Operating System Interface).

What to do next

Configure and deploy the Dynamic Engine Helm charts to reference the PersistentVolumeClaims you created in this procedure. For detailed instructions on using existing PVCs, see Deploying Dynamic Engine with existing PersistentVolumeClaims.
