Setting up Amazon EKS with S3

Before deploying Dynamic Engine and its environments, set up an Amazon EKS cluster with Amazon S3 as the shared storage backend.

Before you begin

An AWS account with the required IAM permissions. For the full prerequisite list, see Configuring Amazon EKS for Dynamic Engine deployment.

About this task

This procedure creates a new EKS cluster in a dedicated virtual private cloud (Amazon VPC), installs the S3 CSI (Container Storage Interface) driver as an AWS-managed EKS add-on, creates an S3 bucket to back persistent volumes, and provisions static PersistentVolumes and PersistentVolumeClaims for Dynamic Engine and its environments.

Unlike the Amazon EFS setup, S3 uses static provisioning: there is no StorageClass resource. You must create PersistentVolumes manually and bind them to named PersistentVolumeClaims before deploying Dynamic Engine.

Procedure

Set environment variables for your Amazon EKS and S3 deployment:

export AWS_REGION=<your-aws-region>
export EKS_CLUSTER_NAME=<your-eks-cluster-name>
export S3_BUCKET_NAME=<your-s3-bucket-name>
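
Every later command in this procedure expands these variables, so an unset variable silently produces an empty bucket name or cluster name. The following is a minimal sketch of a guard you could run before continuing. The helper name require_vars is hypothetical and not part of any AWS or Dynamic Engine tooling.

```shell
# Hypothetical helper: fail early when a required variable is empty or unset.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "ERROR: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_vars AWS_REGION EKS_CLUSTER_NAME S3_BUCKET_NAME || echo "Fix the variables before continuing"` prints one error line per missing variable.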

Create the S3 bucket to back the persistent volumes. Select the bucket type that matches your requirements.
The bucket and the Amazon VPC must be in the same AWS region.
- General-purpose bucket: Use a standard S3 bucket available in all AWS regions. One bucket serves all Dynamic Engine volumes using path prefixes.
```
cat <<EOF > $S3_BUCKET_NAME-classic-input.yaml
Bucket: $S3_BUCKET_NAME
CreateBucketConfiguration:
  LocationConstraint: ${AWS_REGION}
  Tags:
    - Key: owner
      Value: dyn-engine
    - Key: creator
      Value: ${USER}
    - Key: eks/cluster
      Value: ${EKS_CLUSTER_NAME}
EOF

S3_BUCKET_ARN=$(aws s3api create-bucket \
  --cli-input-yaml file://$S3_BUCKET_NAME-classic-input.yaml \
  --region $AWS_REGION \
  --output text \
  --query BucketArn)

echo "S3 bucket created: $S3_BUCKET_ARN"
```
  The three commands generate a configuration file for a new S3 bucket with ownership tags, create the bucket using the AWS CLI, and print a confirmation message with the bucket's ARN, which is stored in S3_BUCKET_ARN for later steps.
- Directory bucket: Use a directory bucket (also referred to as an S3 Express One Zone bucket) for reduced latency within a single Availability Zone. Directory buckets require a specific naming format: bucket-base-name--zone-id--x-s3, where zone-id is an Availability Zone or Local Zone ID. This bucket type is available only in specific AWS regions and Availability Zones.
  
  Note: Set AWS_AZ_ZONE_ID to the zone ID for your target Availability Zone before running this script. For example, euw1-az1 is a zone ID in the eu-west-1 region.
```
AWS_AZ_ZONE_ID=<your-az-zone-id>
S3_BUCKET_NAME="${EKS_CLUSTER_NAME}--$AWS_AZ_ZONE_ID--x-s3"

cat <<EOF > $S3_BUCKET_NAME-directory-input.yaml
Bucket: $S3_BUCKET_NAME
CreateBucketConfiguration:
  Location:
    Type: AvailabilityZone
    Name: $AWS_AZ_ZONE_ID
  Bucket:
    DataRedundancy: SingleAvailabilityZone
    Type: Directory
  Tags:
    - Key: owner
      Value: dyn-engine
    - Key: creator
      Value: ${USER}
    - Key: eks/cluster
      Value: ${EKS_CLUSTER_NAME}
EOF

S3_BUCKET_ARN=$(aws s3api create-bucket \
  --cli-input-yaml file://$S3_BUCKET_NAME-directory-input.yaml \
  --region $AWS_REGION \
  --output text \
  --query BucketArn)

echo "S3 bucket created: $S3_BUCKET_ARN"
```
  These four blocks set up and create an S3 Express One Zone (directory) bucket pinned to a specific Availability Zone, using AWS-required naming conventions and configuration, then print a confirmation message.
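
Because directory bucket names must follow the bucket-base-name--zone-id--x-s3 format, it can help to check the generated name before calling the AWS CLI. The sketch below only checks the overall shape of the name and extracts the zone ID. It does not enforce the complete AWS naming rules, and the function name directory_bucket_zone_id is hypothetical.

```shell
# Hypothetical helper: print the zone ID embedded in a directory bucket name,
# or fail when the name does not have the base--zone-id--x-s3 shape.
directory_bucket_zone_id() {
  bucket_name=$1
  case "$bucket_name" in
    ?*--?*--x-s3) ;;  # non-empty base, non-empty zone ID, fixed suffix
    *)
      echo "ERROR: '$bucket_name' does not match bucket-base-name--zone-id--x-s3" >&2
      return 1
      ;;
  esac
  # Drop the fixed suffix, then keep what follows the last "--"
  without_suffix=${bucket_name%--x-s3}
  printf '%s\n' "${without_suffix##*--}"
}
```

Running `directory_bucket_zone_id "$S3_BUCKET_NAME"` after the name is assembled should print the same value as AWS_AZ_ZONE_ID.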

Create the EKS cluster configuration file.

IAM permissions differ between general-purpose and directory buckets. Use the configuration matching the bucket type you created in the previous step:

General-purpose bucket: grant s3:ListBucket, s3:GetObject, s3:PutObject, and s3:DeleteObject on the bucket and its objects:

cat <<EOF > eks-config.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: ${EKS_CLUSTER_NAME}
  region: ${AWS_REGION}
  version: "1.34"

autoModeConfig:
  enabled: false

addonsConfig:
  autoApplyPodIdentityAssociations: true

addons:
  - name: eks-pod-identity-agent
  - name: aws-mountpoint-s3-csi-driver
    namespaceConfig:
      namespace: aws-mountpoint-s3
    podIdentityAssociations:
      - namespace: aws-mountpoint-s3
        serviceAccountName: s3-csi-driver-sa
        permissionPolicy:
          Statement:
            - Effect: Allow
              Action:
                - s3:ListBucket
              Resource: $S3_BUCKET_ARN
            - Effect: Allow
              Action:
                - s3:GetObject
                - s3:PutObject
                - s3:DeleteObject
              Resource: "$S3_BUCKET_ARN/*"

vpc:
  cidr: 10.10.0.0/16
  autoAllocateIPv6: false
  hostnameType: resource-name
  clusterEndpoints:
    publicAccess: true
    privateAccess: true
  nat:
    gateway: HighlyAvailable

managedNodeGroups:
  - name: ng-amd64
    amiFamily: AmazonLinux2023 # (default) or Bottlerocket
    minSize: 2
    maxSize: 4
    desiredCapacity: 2
    instanceSelector:
      vCPUs: 4
      memory: 16GiB
      cpuArchitecture: amd64
    privateNetworking: true

EOF

Directory bucket: grant s3express:CreateSession on the bucket (S3 Express One Zone session-based authorization):

cat <<EOF > eks-config.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: ${EKS_CLUSTER_NAME}
  region: ${AWS_REGION}
  version: "1.34"

autoModeConfig:
  enabled: false

addonsConfig:
  autoApplyPodIdentityAssociations: true

addons:
  - name: eks-pod-identity-agent
  - name: aws-mountpoint-s3-csi-driver
    namespaceConfig:
      namespace: aws-mountpoint-s3
    podIdentityAssociations:
      - namespace: aws-mountpoint-s3
        serviceAccountName: s3-csi-driver-sa
        permissionPolicy:
          Statement:
            - Effect: Allow
              Action:
                - s3express:CreateSession
              Resource: $S3_BUCKET_ARN

vpc:
  cidr: 10.10.0.0/16
  autoAllocateIPv6: false
  hostnameType: resource-name
  clusterEndpoints:
    publicAccess: true
    privateAccess: true
  nat:
    gateway: HighlyAvailable

managedNodeGroups:
  - name: ng-amd64
    amiFamily: AmazonLinux2023 # (default) or Bottlerocket
    minSize: 2
    maxSize: 4
    desiredCapacity: 2
    instanceSelector:
      vCPUs: 4
      memory: 16GiB
      cpuArchitecture: amd64
    privateNetworking: true

EOF

Note: The following configuration notes apply to both general-purpose and directory bucket variants. The only difference between them is the IAM permission policy granted to the S3 CSI driver.

This configuration:

- Sets the Kubernetes version to 1.34. This is the example version used in this procedure. Update it to match your target cluster version. For supported versions, see Prerequisites to use Dynamic Engine.
- Disables EKS Auto Mode (autoModeConfig: enabled: false) to ensure standard node group management.
- Installs the aws-mountpoint-s3-csi-driver add-on in the aws-mountpoint-s3 namespace, with IAM permissions configured automatically using Pod Identity Associations. This grants the S3 CSI driver the permissions it needs to mount S3 objects as volumes without hard-coded credentials. Unlike EFS, which uses an EKS-managed policy, the S3 CSI driver requires an explicit IAM policy scoped to the specific S3 bucket ARN ($S3_BUCKET_ARN). The policy is bound to the add-on automatically by eksctl; no additional AWS CLI steps are required.
- Creates a VPC and places all nodes in private subnets (privateNetworking: true). Nodes reach the internet only through NAT gateways. The cluster endpoint remains accessible from both your local machine and within the VPC.
- Provisions nodes sized for Dynamic Engine workloads. Each node has 4 vCPUs and 16 GiB of RAM. The node group scales between 2 and 4 nodes (minSize: 2, maxSize: 4). The AMD64 architecture (cpuArchitecture: amd64) is required because Dynamic Engine container images are built for AMD64 only.
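
The heredoc that writes eks-config.yaml expands $S3_BUCKET_ARN at the moment the file is created. If the variable is empty, for example because the bucket step ran in another shell session, the policy ends up with an empty Resource. The following sketch checks the generated file before you create the cluster. The function name check_policy_resources is hypothetical, and the check only looks for the arn:aws:s3 prefix on each Resource line.

```shell
# Hypothetical helper: succeed only when the file has at least one Resource
# line and every Resource line references an S3 or S3 Express ARN.
check_policy_resources() {
  awk '
    /Resource:/ {
      found++
      if ($0 !~ /arn:aws:s3/) bad++
    }
    END {
      if (found == 0 || bad > 0) exit 1
    }
  ' "$1"
}
```

A possible use is `check_policy_resources eks-config.yaml || echo "S3_BUCKET_ARN was empty when the file was generated"`.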

Create the EKS cluster, link your local kubectl to the cluster, and verify access:
```
eksctl create cluster -f eks-config.yaml
aws eks update-kubeconfig --region "$AWS_REGION" --name "$EKS_CLUSTER_NAME"
kubectl get nodes
```
Wait for all nodes to reach Ready status before proceeding.
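
If you want to script the readiness check rather than read the output by eye, one option is a small filter over the `kubectl get nodes --no-headers` output, as sketched below. The function name all_nodes_ready is hypothetical. It treats any status other than exactly Ready (including Ready,SchedulingDisabled) as not ready, and it fails when no nodes are listed.

```shell
# Hypothetical helper: read "kubectl get nodes --no-headers" output on stdin
# and succeed only when at least one node exists and all report Ready.
all_nodes_ready() {
  awk '
    $2 != "Ready" { bad = 1 }   # the second column is STATUS
    END { if (NR == 0 || bad) exit 1 }
  '
}
```

For example, `kubectl get nodes --no-headers | all_nodes_ready && echo "All nodes are Ready"`. The built-in `kubectl wait --for=condition=Ready nodes --all --timeout=600s` is an alternative that blocks until the condition is met.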

Set environment variables for the Dynamic Engine environment namespace, then create the three PersistentVolumes and PersistentVolumeClaims used by Dynamic Engine services and by your Talend Management Console tasks. Replace the example ENV_ID value with your own environment ID:

ENV_ID=67f7d562ffd7c3525a902542
DYNAMIC_ENGINE_ENV_NAMESPACE=qlik-processing-env-$ENV_ID
# Ensure the target namespace exists before creating PVs/PVCs
kubectl get namespace "$DYNAMIC_ENGINE_ENV_NAMESPACE" >/dev/null 2>&1 || kubectl create namespace "$DYNAMIC_ENGINE_ENV_NAMESPACE"
declare -a pvcs=("archive" "job-data" "custom-resources")

for pvc in "${pvcs[@]}"
do
  echo "Creating PV/PVC for $pvc"
cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: $pvc
spec:
  capacity:
    storage: 1200Gi # Ignored for S3 volumes but required by Kubernetes API
  accessModes:
    - ReadWriteMany # Supported options: ReadWriteMany / ReadOnlyMany
  storageClassName: "" # Required for static provisioning
  claimRef: # To ensure no other PVCs can claim this PV
    namespace: $DYNAMIC_ENGINE_ENV_NAMESPACE # Namespace of the PVC; required because PVs are cluster-scoped
    name: $pvc # Name of your PVC
  mountOptions:
    - allow-delete
    - allow-overwrite
    - region $AWS_REGION
    - prefix $pvc/
  csi:
    driver: s3.csi.aws.com # Required
    volumeHandle: $pvc # Must be unique
    volumeAttributes:
      bucketName: $S3_BUCKET_NAME
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $pvc
  namespace: $DYNAMIC_ENGINE_ENV_NAMESPACE
spec:
  accessModes:
    - ReadWriteMany # Supported options: ReadWriteMany / ReadOnlyMany
  storageClassName: "" # Required for static provisioning
  resources:
    requests:
      storage: 1200Gi # Ignored, required
  volumeName: $pvc # Name of your PV
EOF
done

- storageClassName: "": Must be an empty string for static provisioning.
- claimRef: Binds this PersistentVolume exclusively to the named PersistentVolumeClaim in the specified namespace, which prevents other workloads from claiming it.
- mountOptions allow-delete and allow-overwrite: Allow pods to delete and overwrite existing objects on S3-backed volumes. Without these flags, pods can mount the volume, but deleting or overwriting existing files fails.
- mountOptions prefix $pvc/: Scopes each volume to a separate path prefix within the shared S3 bucket, isolating the data for each volume.
- volumeHandle: Must be unique across all PersistentVolumes in the cluster.
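
Before creating the test pod, you may want to confirm that each claim has actually bound to its volume. The sketch below wraps `kubectl wait` with a JSONPath condition, which assumes kubectl 1.23 or later. The function name wait_for_bound_pvcs and the 120-second timeout are illustrative choices, not values from this procedure.

```shell
# Hypothetical helper: wait until each named PVC in the namespace is Bound.
# Usage: wait_for_bound_pvcs <namespace> <pvc-name>...
wait_for_bound_pvcs() {
  namespace=$1
  shift
  for claim in "$@"; do
    kubectl wait --namespace "$namespace" \
      --for=jsonpath='{.status.phase}'=Bound \
      "pvc/$claim" --timeout=120s || return 1
  done
}
```

For example, `wait_for_bound_pvcs "$DYNAMIC_ENGINE_ENV_NAMESPACE" archive job-data custom-resources`. A claim that stays Pending usually points to a mismatch between the claimRef in the PersistentVolume and the name or namespace of the PersistentVolumeClaim.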

Create a test pod that mounts all three PersistentVolumeClaims:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-pvc
  namespace: $DYNAMIC_ENGINE_ENV_NAMESPACE
spec:
  securityContext:
    fsGroup: 61000
    runAsUser: 61000
    runAsGroup: 61000
    runAsNonRoot: true
  containers:
    - name: app
      image: ghcr.io/talend/kube-base:5.3.0
      command: ["cat"]
      tty: true
      volumeMounts:
        - name: archive
          mountPath: /opt/talend/archive
        - name: job-data
          mountPath: /opt/talend/data
        - name: custom-resources
          mountPath: /opt/talend/custom-resources
  volumes:
    - name: archive
      persistentVolumeClaim:
        claimName: archive
    - name: job-data
      persistentVolumeClaim:
        claimName: job-data
    - name: custom-resources
      persistentVolumeClaim:
        claimName: custom-resources
EOF

The test pod uses runAsUser: 61000, runAsGroup: 61000, and fsGroup: 61000. These values match the Dynamic Engine runtime UID/GID and are required for correct file ownership on the mounted volumes.

Validate that the pod can read and write data to the mounted volumes:

kubectl exec -n $DYNAMIC_ENGINE_ENV_NAMESPACE test-pod-pvc -- \
  sh -c "echo 'Hello world' > /opt/talend/archive/test.txt \
  && cat /opt/talend/archive/test.txt \
  && rm -f /opt/talend/archive/test.txt"

If the command completes without error, the S3-backed volume is accessible and writable.
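
The command above exercises only the archive volume. To cover all three mounts, the same write, read, and delete round trip can be repeated per mount path, as in the following sketch. The function name check_volume_roundtrip is hypothetical; the mount paths in the usage example are the ones defined in the test pod.

```shell
# Hypothetical helper: write, read back, and delete a test file in each
# mount path of the given pod. Stops at the first failing path.
# Usage: check_volume_roundtrip <namespace> <pod> <mount-path>...
check_volume_roundtrip() {
  namespace=$1
  pod=$2
  shift 2
  for mount_path in "$@"; do
    kubectl exec -n "$namespace" "$pod" -- \
      sh -c "echo 'Hello world' > '$mount_path/test.txt' \
        && cat '$mount_path/test.txt' \
        && rm -f '$mount_path/test.txt'" \
      || { echo "FAILED: $mount_path" >&2; return 1; }
  done
}
```

For example, `check_volume_roundtrip "$DYNAMIC_ENGINE_ENV_NAMESPACE" test-pod-pvc /opt/talend/archive /opt/talend/data /opt/talend/custom-resources`.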

Results

Your EKS cluster is running with the S3 CSI driver installed, and the three PersistentVolumes (archive, job-data, custom-resources) are created and bound to their PersistentVolumeClaims in the qlik-processing-env-<env-id> namespace. The cluster is ready for Dynamic Engine deployment.

Tip: The embedded docker-registry service of Dynamic Engine performs random writes, which the S3 CSI driver (Mountpoint for Amazon S3) does not support. As a result, the docker-registry volume cannot use an S3-backed PersistentVolumeClaim. Instead, provision the docker-registry persistent volumes, either dynamically or statically, from EFS or another external storage system that supports POSIX (Portable Operating System Interface).

For details on how to configure docker-registry, see Configuring a custom Docker registry for Data Services and Routes.
For instructions on how to use existing static persistent volumes, see Deploying Dynamic Engine with existing PersistentVolumeClaims.
For instructions on how to use dynamic persistent volumes (StorageClass), see Provisioning a storage class dedicated to Dynamic Engine environment services.

What to do next

Configure and deploy the Dynamic Engine Helm charts to reference the PersistentVolumeClaims you created in this procedure. For detailed instructions on using existing PVCs, see Deploying Dynamic Engine with existing PersistentVolumeClaims.
