
Amazon S3

Amazon S3 can be used as:

  • A cloud staging area when working with Databricks (optional) and Amazon Redshift (required) data pipeline platforms. The cloud staging area is where data and changes are staged before being applied and stored.
  • A target in a "Land data in data lake" replication task.

Permissions required for landing data

  • You must have an Amazon S3 bucket that is accessible from the Data Movement gateway machine.

    For information on signing up for Amazon S3, see http://aws.amazon.com/s3/.

  • Bucket access credentials: Make a note of the bucket name, access key, and secret access key; you will need to provide them in the Amazon S3 connector settings.
  • Bucket access permissions: The following bucket access permissions are required:

     
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Stmt1497347821000",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::YOUR_BUCKET_NAME"
                ]
            },
            {
                "Sid": "Stmt1497344984000",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject"
                ],
                "Resource": [
                    "arn:aws:s3:::YOUR_BUCKET_NAME/target_path",
                    "arn:aws:s3:::YOUR_BUCKET_NAME/target_path/*"
                ]
            }
        ]
    }
    

Where YOUR_BUCKET_NAME is the name of your bucket and target_path is the intended location of the target files in your bucket.

Information note

If the target path is the bucket root, leave target_path empty (an empty string), so that the object-level resources refer to the bucket root.
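The policy above can also be generated programmatically. Below is a minimal Python sketch (a convenience illustration, not part of the product) that substitutes the bucket name and target path into the policy; the Sid fields are optional and omitted here:

```python
import json

def s3_landing_policy(bucket: str, target_path: str) -> str:
    """Build the bucket policy shown above for a given bucket and target path.

    If target_path is empty, the object-level statement covers the bucket root.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_arn = f"{bucket_arn}/{target_path}" if target_path else bucket_arn
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permissions
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level permissions under the target path
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": [object_arn, f"{object_arn}/*"],
            },
        ],
    }, indent=4)
```

The resulting JSON can be attached to the IAM user or role whose access key you enter in the connector settings.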

Qlik Data Gateway - Data Movement prerequisites

Data Movement gateway connects to AWS using SSL. This requires an appropriate CA certificate to reside on the Data Movement gateway machine; otherwise, the connection will fail. The purpose of the CA certificate is to authenticate the ownership of the AWS server certificate.

Make sure that the required CA certificate exists in the following location on the Linux machine:

/etc/pki/tls/certs/ca-bundle.crt

If it does not exist, the simplest solution is to copy the certificates bundle from another Linux machine.
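A quick pre-flight check can save a failed-connection round trip. Below is a small Python sketch (an illustration only, assuming the bundle path named above) that verifies the CA bundle exists and is non-empty before you attempt a connection:

```python
import os

# Path required on the Data Movement gateway machine, per the documentation above
CA_BUNDLE = "/etc/pki/tls/certs/ca-bundle.crt"

def ca_bundle_present(path: str = CA_BUNDLE) -> bool:
    """Return True if the CA bundle exists and is non-empty."""
    return os.path.isfile(path) and os.path.getsize(path) > 0
```

Run it on the gateway machine; if it returns False, copy the certificates bundle from another Linux machine as described above.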

Setting Amazon S3 connection properties

Data target

Data gateway: Select the Data Movement gateway that you want to use to access the target database.

Depending on your use case, this will either be the same Data Movement gateway deployed to land data from the data source, or a different one. For information about the possible deployment scenarios, see Common use cases.

Information note

Requires Data Movement gateway 2023.5.10 or later.

Connection properties

  • Access key: The access key for your Amazon S3 bucket.
  • Secret key: The secret key for your Amazon S3 bucket.
  • Bucket name: The name of your Amazon S3 bucket.

    Information note

    The default bucket region setting is auto-detect, which eliminates the need to set a specific region. However, due to security considerations, for some regions (for example, AWS GovCloud), you might need to explicitly set the region. In such a case, you can set the region code using the regionCode internal property.

    For a list of region codes, see the Region availability section in: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html

    For instructions on setting internal properties, see below.

  • Use AWS PrivateLink: Select this to connect to an Amazon VPC and then specify the VPC Endpoint URL (for example, https://bucket.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com).

    Information note

    The Use AWS PrivateLink option is not supported when using an Amazon S3 bucket as the staging area for a Databricks target. For information on setting up connectivity to a Databricks target, see Databricks.
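The endpoint URL in the example above embeds both the VPC endpoint ID and the region. Below is a Python sketch (an illustration only, assuming the standard bucket.vpce-<id>.s3.<region>.vpce.amazonaws.com hostname layout shown in the example) that sanity-checks a URL and extracts the region:

```python
import re
from typing import Optional

def vpce_region(endpoint_url: str) -> Optional[str]:
    """Return the AWS region embedded in an S3 interface VPC endpoint URL,
    or None if the URL does not match the expected hostname layout."""
    m = re.match(
        r"https://bucket\.vpce-[\w-]+\.s3\.([\w-]+)\.vpce\.amazonaws\.com",
        endpoint_url,
    )
    return m.group(1) if m else None
```

A None result usually means the URL was copied incompletely or is not a bucket-style endpoint URL.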

Internal properties

Internal properties are for special use cases and are therefore not exposed in the dialog. You should only use them if instructed by Qlik Support.

Use the Create new and Cancel buttons to the right of the fields to add or remove properties as needed.

Name

The display name for the connection.

Data type mapping

The following table shows the default mapping from Qlik Cloud data types to Amazon S3 data types.

Information note

The data type mappings are only relevant if the Create metadata files in the target folder option in the "Land data in data lake" task settings is enabled.

Mapping from Qlik Cloud data types to Amazon S3

Qlik Cloud and Amazon S3 data types

Qlik Cloud data types    Amazon S3 target data types
DATE                     DATE
TIME                     TIME
DATETIME                 DATETIME
BYTES                    BYTES (length)
BLOB                     BLOB
REAL4                    REAL4 (7)
REAL8                    REAL8 (14)
INT1                     INT1 (3)
INT2                     INT2 (5)
INT4                     INT4 (10)
INT8                     INT8 (19)
UINT1                    UINT1 (3)
UINT2                    UINT2 (5)
UINT4                    UINT4 (10)
UINT8                    UINT8 (20)
NUMERIC                  NUMERIC (p,s)
STRING                   STRING (Length)
WSTRING                  STRING (Length)
CLOB                     CLOB
NCLOB                    NCLOB
BOOLEAN                  BOOLEAN (1)

Mapping from Qlik Cloud data types to Parquet

When Parquet is set as the file format, due to the limited number of data types supported by Parquet, the data type mappings will be as follows:

Parquet data type mappings
Qlik Cloud data type    Parquet primitive type       Logical type
BOOLEAN                 BOOLEAN                      -
INT1                    INT32                        INT(8, true)
INT2                    INT32                        INT(16, true)
INT4                    INT32                        -
INT8                    INT64                        -
UINT1                   INT32                        INT(8, false)
UINT2                   INT32                        INT(16, false)
UINT4                   INT64                        -
UINT8                   INT64                        INT(64, false)
REAL4                   FLOAT                        -
REAL8                   DOUBLE                       -
NUMERIC                 FIXED_LEN_BYTE_ARRAY (16)    DECIMAL (precision, scale)
STRING                  BYTE_ARRAY                   STRING
WSTRING                 BYTE_ARRAY                   STRING
BYTES                   BYTE_ARRAY                   -
BLOB                    BYTE_ARRAY                   -
CLOB                    BYTE_ARRAY                   STRING
NCLOB                   BYTE_ARRAY                   STRING
DATE                    INT32                        DATE
TIME                    INT32                        TIME (UTC=true, unit=MILLIS)
DATETIME                INT64                        TIMESTAMP (UTC=true, unit=MICROS)
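For scripting against landed files, the Parquet mapping table above can be expressed as a lookup dictionary. Below is a Python sketch (a convenience restatement of the table, not a product API); None marks entries with no logical type:

```python
# Qlik Cloud -> Parquet mappings, taken from the table above.
# Each value is (Parquet primitive type, logical type or None).
PARQUET_MAPPING = {
    "BOOLEAN":  ("BOOLEAN", None),
    "INT1":     ("INT32", "INT(8, true)"),
    "INT2":     ("INT32", "INT(16, true)"),
    "INT4":     ("INT32", None),
    "INT8":     ("INT64", None),
    "UINT1":    ("INT32", "INT(8, false)"),
    "UINT2":    ("INT32", "INT(16, false)"),
    "UINT4":    ("INT64", None),
    "UINT8":    ("INT64", "INT(64, false)"),
    "REAL4":    ("FLOAT", None),
    "REAL8":    ("DOUBLE", None),
    "NUMERIC":  ("FIXED_LEN_BYTE_ARRAY (16)", "DECIMAL (precision, scale)"),
    "STRING":   ("BYTE_ARRAY", "STRING"),
    "WSTRING":  ("BYTE_ARRAY", "STRING"),
    "BYTES":    ("BYTE_ARRAY", None),
    "BLOB":     ("BYTE_ARRAY", None),
    "CLOB":     ("BYTE_ARRAY", "STRING"),
    "NCLOB":    ("BYTE_ARRAY", "STRING"),
    "DATE":     ("INT32", "DATE"),
    "TIME":     ("INT32", "TIME (UTC=true, unit=MILLIS)"),
    "DATETIME": ("INT64", "TIMESTAMP (UTC=true, unit=MICROS)"),
}
```

Note that unsigned integers map onto signed Parquet primitives, with the logical type (where present) recording the original signedness and width.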
