Google Cloud Storage
Google Cloud Storage is Google's unified object storage service for storing and accessing data on Google Cloud infrastructure. It offers high availability and global redundancy, and it integrates with the broader Google Cloud ecosystem.
Qlik Talend Cloud uses a Google Cloud service account with read access to the target bucket to connect to Google Cloud Storage (GCS). The connector retrieves files from the specified bucket, automatically discovers schemas by sampling file contents, and performs incremental data replication based on file modification timestamps.
Preparing for authentication
To access your data, you need to authenticate the connection with your account credentials.
To set up your Google Cloud Storage account, you need:
- A Google Cloud Platform (GCP) project with the Cloud Storage API enabled.
- A Google Cloud Storage (GCS) bucket that contains the files to be replicated.
- A service account with read access to the bucket.
  The recommended role is Storage Object Viewer (roles/storage.objectViewer), which grants the required storage.objects.get and storage.objects.list permissions. For more information, see Google Cloud Storage IAM roles documentation.
- A service account JSON key file downloaded for the service account.
To create a service account and retrieve your credentials:
- Log in to your Google Cloud account.
- Navigate to IAM & Admin > Service Accounts.
- Click Create Service Account.
- Enter a name and description for the service account, then click Create and Continue.
- Grant the service account the Storage Object Viewer role or a custom role with storage.objects.get and storage.objects.list permissions.
- Click Continue and Done.
- In your newly created service account, click the Actions menu.
- Navigate to Manage keys > Add key > Create new key.
- Select JSON, and click Create.
The JSON key file is downloaded directly to your machine. This file includes the project_id, client_email, and private_key fields required to establish the connection.
You can download the key file only once. Be sure to store it securely and back it up, as it provides access to your Google Cloud resources.
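Because the connection form takes the three values individually, it can help to pull them out of the key file programmatically rather than copying them by hand. The following is a minimal sketch using only the Python standard library; the function name extract_connection_fields and the placeholder values are illustrative assumptions, not part of the connector.

```python
import json

# Field names as they appear in a service account JSON key file.
REQUIRED_FIELDS = ("project_id", "client_email", "private_key")


def extract_connection_fields(key_file_text: str) -> dict:
    """Return the three key-file values that the connection settings ask for.

    Hypothetical helper for illustration; it only parses JSON text and
    checks that each required field is present and non-empty.
    """
    key_data = json.loads(key_file_text)
    missing = [name for name in REQUIRED_FIELDS if not key_data.get(name)]
    if missing:
        raise ValueError("Key file is missing fields: " + ", ".join(missing))
    return {name: key_data[name] for name in REQUIRED_FIELDS}


# Placeholder content for illustration only; a real key file has more fields.
sample_key = json.dumps(
    {
        "type": "service_account",
        "project_id": "example-project",
        "client_email": "reader@example-project.iam.gserviceaccount.com",
        "private_key": "-----BEGIN PRIVATE KEY-----\n(placeholder)\n-----END PRIVATE KEY-----\n",
    }
)
fields = extract_connection_fields(sample_key)
```

To use it with a downloaded key, read the file contents into a string (for example with open(path).read()) and pass that string to the function. Avoid printing or logging the private_key value.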
Supported file formats
- Delimited text: CSV, TSV, PSV, TXT (with configurable delimiter)
- JSON Lines (.jsonl)
- Parquet (.parquet)
- Avro (.avro)
- Gzip-compressed files (.gz) containing any of the above formats
- ZIP archives containing CSV, JSON Lines, TXT, TSV, PSV, or Gzip files
Creating the connection
For more information, see Connecting to SaaS applications.
- Fill in the required connection properties.
- Provide a name for the connection in Connection name.
- Select Open connection metadata to define metadata for the connection when it has been created.
- Click Create.
| Setting | Description |
|---|---|
| Data gateway | Select a Data Movement gateway if required by your use case. Note: This field is not available with the Qlik Talend Cloud Starter subscription, as it does not support Data Movement gateway. If you have another subscription tier and do not want to use Data Movement gateway, select None. For information on the benefits of Data Movement gateway and use cases that require it, see Qlik Data Gateway - Data Movement. |
| Start Date | Enter the date, in the format |
| Client Email | Client email from the service account JSON key file. |
| Project ID | Project ID from the service account JSON key file. |
| Bucket | Name of the Google Cloud Storage (GCS) bucket where the files are stored, for example, my-gcs-bucket. Do not include the |
| Tables | Configure tables to control which files are read and how their contents are interpreted. Each table definition includes a file search pattern, a table name, and optional settings for advanced behavior. |
| Private Key | Private key from the service account JSON key file. |
Table configuration
Each entry in the table configuration specifies a logical table created from files in the target bucket. You can configure the following properties for each table:
| Property | Required or Optional | Description |
|---|---|---|
| Table Name | Required | Specify a name for the logical table, for example my_orders_csv. This name will appear as the stream name in Qlik Talend Cloud. |
| Search Pattern | Required | Enter a regular expression to match file names, for example \.csv$ to select all CSV files. |
| Search Prefix | Optional | Provide a path prefix within the bucket to narrow the file search, for example exports/orders/. Using a prefix improves performance by limiting the number of files scanned. |
| Key Properties | Optional | List one or more column names, separated by commas, to define the primary key. For example: id or id,date. |
| Date Overrides | Optional | List column names, separated by commas, to be treated as date-time fields. Use this option if these fields are not automatically detected during schema discovery. |
| Delimiter | Optional | Specify the character that separates values in your files. The default is , (comma). Use \t for tab-delimited (TSV) files or \| for pipe-separated (PSV) files. If left blank, the system automatically detects the delimiter based on the file extension. |
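To check a search pattern and prefix before saving a table definition, you can test them locally against a list of object names. The sketch below is illustrative only: select_files is a hypothetical helper, and it assumes the pattern is applied with regular expression search semantics to the full object name, which may differ from the connector's exact matching rules.

```python
import re


def select_files(object_names, search_pattern, search_prefix=""):
    """Keep names that start with the prefix and match the regular expression.

    Hypothetical helper; assumes re.search semantics on the full object name.
    """
    pattern = re.compile(search_pattern)
    return [
        name
        for name in object_names
        if name.startswith(search_prefix) and pattern.search(name)
    ]


names = [
    "exports/orders/2024-01.csv",
    "exports/orders/2024-01.csv.bak",
    "exports/orders/summary_csv",
    "exports/customers/2024-01.csv",
]

# Escaping the dot matches a literal ".csv" at the end of the name.
orders = select_files(names, r"\.csv$", "exports/orders/")
# orders == ["exports/orders/2024-01.csv"]

# An unescaped dot matches any character, so "summary_csv" is also selected.
loose = select_files(names, r".csv$", "exports/orders/")

# A glob such as "*.csv" is not a valid regular expression in Python.
try:
    re.compile("*.csv")
    glob_is_valid_regex = True
except re.error:
    glob_is_valid_regex = False
```

The comparison between the two patterns shows why escaping the dot matters when file names in the bucket can end in "csv" without a real extension.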
Tables replicated
Tables are created based on the table configuration (see above). Each table corresponds to a set of files in the Google Cloud Storage (GCS) bucket that match both the specified search pattern and any optional prefix. The connector automatically discovers schemas by sampling up to 5 files per table, reading every fifth row, with a maximum of 1,000 records per file.
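The sampling limits described above can be pictured with a short sketch. It is not the connector's implementation: the function names are hypothetical, rows are modeled as dictionaries, and it assumes that sampling starts at the first row and that the 1,000-record cap applies to the sampled rows.

```python
from itertools import islice

MAX_FILES = 5                 # files sampled per table
ROW_STEP = 5                  # read every fifth row
MAX_RECORDS_PER_FILE = 1000   # cap on sampled records per file


def sample_rows(rows, step=ROW_STEP, limit=MAX_RECORDS_PER_FILE):
    """Return every `step`-th row, starting with the first, up to `limit` rows."""
    return list(islice(islice(rows, 0, None, step), limit))


def discover_columns(files):
    """Collect column names and the Python type names seen in sampled rows.

    `files` is an iterable of row iterables; only the first MAX_FILES are used.
    """
    columns = {}
    for rows in islice(files, MAX_FILES):
        for row in sample_rows(rows):
            for name, value in row.items():
                columns.setdefault(name, set()).add(type(value).__name__)
    return columns


# Twelve rows sampled at every fifth row gives the rows at positions 0, 5, and 10.
small_sample = sample_rows([{"id": i} for i in range(12)])
```

One consequence of sampling is that a column appearing only in unsampled rows or files is not part of the discovered schema, which is where options such as Date Overrides can help.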
Replication is incremental and uses file modification timestamps to track changes. During each extraction, the connector processes only those files that have been modified since the last successful sync, as recorded by the sync bookmark.
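As a rough illustration of bookmark-based incremental replication, the sketch below selects files modified after a stored bookmark and then advances the bookmark. The names files_to_sync and next_bookmark are hypothetical, files are modeled as (name, modified time) pairs, and the strict "later than" comparison is an assumption rather than documented connector behavior.

```python
from datetime import datetime, timezone


def files_to_sync(files, bookmark):
    """Return (name, modified) pairs modified after the bookmark, oldest first."""
    changed = [item for item in files if item[1] > bookmark]
    return sorted(changed, key=lambda item: item[1])


def next_bookmark(synced, bookmark):
    """Advance the bookmark to the newest modification time that was processed."""
    return max((modified for _, modified in synced), default=bookmark)


bookmark = datetime(2024, 1, 10, tzinfo=timezone.utc)
files = [
    ("exports/orders/a.csv", datetime(2024, 1, 5, tzinfo=timezone.utc)),
    ("exports/orders/b.csv", datetime(2024, 1, 12, tzinfo=timezone.utc)),
    ("exports/orders/c.csv", datetime(2024, 1, 11, tzinfo=timezone.utc)),
]

pending = files_to_sync(files, bookmark)       # c.csv, then b.csv
bookmark = next_bookmark(pending, bookmark)    # 2024-01-12 UTC
```

Under this model, a file that is overwritten in place gets a newer modification time and is picked up again on the next run, while untouched files are not re-read.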
The following system columns are added to each table by default:
| Column | Description |
|---|---|
| _sdc_source_bucket | The name of the Google Cloud Storage (GCS) bucket where the record was read. |
| _sdc_source_file | The full path of the file containing the record. |
| _sdc_source_lineno | The line number of the record within the file. |
| _sdc_extra | Any extra columns found during parsing that do not match the discovered schema. Applies to JSONL files only. |
Limitations and considerations
- Service account credentials (project_id, client_email, private_key) must be provided as individual values extracted from the JSON key file; file upload is not supported.
- Gzip-compressed files (.gz) are supported. The connector reads the original filename from the gzip header to determine the inner file format. Gzip files created with --no-name (no filename stored in the header) are skipped.
- Nested compression (for example, a .gz inside another .gz or a .zip inside a .zip) is not supported. These files are skipped.
- Files with .csv, .txt, .tsv, .psv, or .jsonl extensions are checked for gzip magic bytes and decompressed if gzip-compressed, even when the file does not have a .gz extension.
- The search_pattern field uses regular expression syntax, not glob patterns. For example, use \.csv$ instead of *.csv.
- The connector has built-in retry logic with exponential backoff for Google Cloud Storage (GCS) API rate limits (429) and transient server errors (500, 502, 503, 504). Up to five attempts are made before failing.
- Files without a recognized extension are skipped and a warning is issued.
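To check in advance whether a gzip file carries the original filename that the connector relies on, you can inspect its header locally. The sketch below follows the gzip header layout (magic bytes 1f 8b, a flags byte, an optional extra field, then an optional zero-terminated filename); is_gzip and gzip_original_name are hypothetical helper names, and this is not the connector's own code.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"
FEXTRA = 0x04  # header flag: an extra field precedes the filename
FNAME = 0x08   # header flag: the original filename is stored


def is_gzip(data: bytes) -> bool:
    """Return True when the data starts with the gzip magic bytes."""
    return data[:2] == GZIP_MAGIC


def gzip_original_name(data: bytes):
    """Return the filename stored in the gzip header, or None if absent."""
    if not is_gzip(data) or len(data) < 10:
        return None
    flags = data[3]
    pos = 10  # end of the fixed-size part of the header
    if flags & FEXTRA:
        if len(data) < pos + 2:
            return None
        extra_len = int.from_bytes(data[pos:pos + 2], "little")
        pos += 2 + extra_len
    if not flags & FNAME:
        return None
    end = data.find(b"\x00", pos)
    if end == -1:
        return None
    return data[pos:end].decode("latin-1")


# Build a gzip stream in memory that stores "orders.csv" in its header.
buffer = io.BytesIO()
with gzip.GzipFile(filename="orders.csv", mode="wb", fileobj=buffer) as gz:
    gz.write(b"id,name\n1,example\n")
named = buffer.getvalue()

# gzip.compress() stores no filename, similar to files created with --no-name.
unnamed = gzip.compress(b"id,name\n1,example\n")
```

Here gzip_original_name(named) returns the stored name, while gzip_original_name(unnamed) returns None, which corresponds to the case the limitation above describes as skipped.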