
Using Amazon S3 with Talend Management Console

Talend Management Console is a secure cloud integration platform-as-a-service (iPaaS) that puts powerful graphical tools at your fingertips.

Amazon S3

Amazon Simple Storage Service (S3) is storage for the Internet. It is designed to make web-scale computing easier for developers. Amazon S3 has a simple web services interface that you can use to store and retrieve any amount of object data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize the benefits of scale and to pass those benefits on to developers.

Amazon S3 stores data as objects (files). It is not a database storage layer. Also note that Amazon Glacier leverages the Amazon S3 data storage infrastructure for archiving purposes.

Amazon S3 components in Talend Studio

Talend provides several components in the components palette that are built around the operations exposed by Amazon S3.

The Amazon S3 components are located in Cloud > Amazon > S3 in the components palette.

These operations, as can be seen in the screen capture, are:

  • Create an S3 Bucket
  • Delete an S3 Bucket
  • Check existence of an S3 Bucket
  • List all the S3 Buckets
  • Put files into S3 Bucket
  • Get files from S3 Bucket
  • List files in S3 Bucket
  • Delete files in S3 Bucket

These components are used within the Talend Management Console Tasks as described below.

Amazon S3 connection

Create connections in Talend Studio as follows. Right-click Context and click Create context.

Name the context group aws_context and create three variables of type String: aws_access_key, aws_secret_key, and aws_bucket.

The String type variable appears in the Create context menu.

Use this context group in Talend Studio Jobs and Talend Management Console.
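Outside Talend Studio, the same three values can be modeled as a small configuration object. The following is a minimal sketch, not how Talend stores contexts: it assumes the values are available in a mapping (by default the process environment) under the same names as the context variables, and the names AwsContext and load_aws_context are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AwsContext:
    """Mirrors the three String variables of the aws_context group."""
    aws_access_key: str
    aws_secret_key: str
    aws_bucket: str


def load_aws_context(env=None):
    """Build an AwsContext from a mapping (defaults to os.environ).

    Assumption: each context variable is stored under its own name.
    """
    env = os.environ if env is None else env
    names = ("aws_access_key", "aws_secret_key", "aws_bucket")
    # Fail early and name every variable that is absent or empty.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError("missing context variables: " + ", ".join(missing))
    return AwsContext(**{name: env[name] for name in names})
```

Keeping the keys in a context (or in the environment) rather than in the Job itself means the same Job can run with different credentials without being edited.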

Tasks that use this native S3 connection run on Talend Management Console engines. The best way to access S3 from the Talend AWS infrastructure is therefore with an access key and a secret key. For more information on access keys, see the article Managing Access Keys for your AWS Account.

Amazon S3 files list

This Job returns a list of files stored on Amazon S3. It creates a connection to Amazon S3, gets the list of files, filters the list by file type, and then sets the name of each file into the flow, as shown by the Job below:

Example on the design workspace of the Job created.

Context parameters

S3 Connection (as referred above):

  • aws_access_key: Access key ID of the Amazon S3 account to be used.
  • aws_secret_key: Secret Access key of the Amazon S3 account to be used.

General:

  • Bucket: Name of the source bucket where the file is stored.
  • Folder: Path to the source file to be listed.
  • File Type: Type of the files to be listed. To list all the files in the folder, use the symbol * as the file type.

Output schema:

  • Name of the bucket where the file is stored.
  • Path to the file to be downloaded.
  • Content of the file to be downloaded.
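The list-and-filter logic of this Job can be approximated in plain code. The sketch below is not the Talend component's implementation: it assumes an object that behaves like a boto3 S3 client (only list_objects_v2 is used), and the function name list_files and its parameters are hypothetical stand-ins for the Bucket, Folder, and File Type context parameters.

```python
from fnmatch import fnmatchcase


def list_files(s3_client, bucket, folder, file_type="*"):
    """Return the keys under `folder` in `bucket` that match `file_type`.

    `s3_client` is assumed to behave like a boto3 S3 client.
    `file_type` is an extension such as "csv", or "*" for every file.
    """
    folder = folder.strip("/")
    prefix = folder + "/" if folder else ""
    pattern = "*" if file_type == "*" else "*." + file_type.lstrip("*.")
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Skip "folder" placeholder keys, which end with a slash.
            if not key.endswith("/") and fnmatchcase(key, pattern):
                keys.append(key)
        # Results are paginated; follow the continuation token if present.
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

With boto3 installed, a real client could be created with boto3.client("s3", aws_access_key_id=..., aws_secret_access_key=...) and passed as s3_client.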

Amazon S3 files upload

This Job uploads files to Amazon S3. The screen capture below shows the design:

Example on the design workspace of the Job created.

Context parameters

Connection:

  • aws_access_key: Access key ID of the Amazon S3 account to be used.
  • aws_secret_key: Secret Access key of the Amazon S3 account to be used.

General:

  • Name of the target bucket where the file is to be stored.
  • Path to the target file to be uploaded.
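For comparison, an upload with the same two general parameters might look like the sketch below. It assumes an object that behaves like a boto3 S3 client (only upload_file is used); the function name and the local_path parameter are hypothetical and are not part of the Job described here.

```python
import os


def upload_file(s3_client, local_path, bucket, target_path):
    """Upload one local file to bucket/target_path and return the object key.

    `s3_client` is assumed to behave like a boto3 S3 client.
    """
    if not os.path.isfile(local_path):
        raise FileNotFoundError(local_path)
    # Object keys normally have no leading slash.
    key = target_path.lstrip("/")
    s3_client.upload_file(local_path, bucket, key)
    return key
```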

Amazon S3 file move

This Job moves files on Amazon S3. To use it, fill in the parameters listed below.

Example on the design workspace of the Job created.

Context parameters

Connection:

  • aws_access_key: Access key ID of the Amazon S3 account to be used.
  • aws_secret_key: Secret Access key of the Amazon S3 account to be used.

General:

  • Name of the source bucket where the file is stored.
  • Path to the source file to be moved.
  • Name of the target file.
  • Path to the target file.
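Amazon S3 has no native move operation, so a move is usually a copy followed by a delete of the source. The sketch below illustrates that pattern under stated assumptions: the client behaves like a boto3 S3 client (copy_object and delete_object), and the function name and its target_bucket parameter are hypothetical, since the Job above only lists a source bucket.

```python
def move_file(s3_client, source_bucket, source_path, target_bucket, target_path):
    """Move an object by copying it to the target key, then deleting the source."""
    source_key = source_path.lstrip("/")
    target_key = target_path.lstrip("/")
    if (source_bucket, source_key) == (target_bucket, target_key):
        # Source and target are identical; deleting would lose the object.
        return target_key
    s3_client.copy_object(
        Bucket=target_bucket,
        Key=target_key,
        CopySource={"Bucket": source_bucket, "Key": source_key},
    )
    # Reached only if the copy did not raise an exception.
    s3_client.delete_object(Bucket=source_bucket, Key=source_key)
    return target_key
```

Deleting only after a successful copy means a failure leaves the source object in place rather than losing it.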

Amazon S3 file download

This Job downloads files stored on Amazon S3 into the Cloud Engine temp directory. The file should later be processed by the Task and then removed from the temp directory.

Example on the design workspace of the Job created.

Context parameters

Connection:

  • aws_access_key: Access key ID of the Amazon S3 account to be used.
  • aws_secret_key: Secret Access key of the Amazon S3 account to be used.

General:

  • Name of the source bucket where the file is stored.
  • Path to the source file to be downloaded.

Output schema:

  • Name of the source bucket where the file is stored.
  • Path to the source file to be downloaded.
  • Content of the file to be downloaded.
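The download, process, and clean-up sequence described for this Job can be sketched as follows. This is an illustration rather than the Job's actual implementation: it assumes an object that behaves like a boto3 S3 client (only download_file is used), uses a local temporary directory in place of the Cloud Engine temp directory, and the name download_and_process and its process callback are hypothetical.

```python
import os
import tempfile


def download_and_process(s3_client, bucket, source_path, process):
    """Download bucket/source_path to a temp dir and return process(local_path).

    `s3_client` is assumed to behave like a boto3 S3 client.
    """
    key = source_path.lstrip("/")
    # The directory and the downloaded file are removed when the block exits,
    # even if `process` raises an exception.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, os.path.basename(key))
        s3_client.download_file(bucket, key, local_path)
        return process(local_path)
```

Because the file is deleted as soon as process returns, the callback should return the data it needs rather than the path.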

Amazon S3 file delete

This Job deletes files stored on Amazon S3.

Example on the design workspace of the Job created.

Context parameters

Connection:

  • aws_access_key: Access key ID of the Amazon S3 account to be used.
  • aws_secret_key: Secret Access key of the Amazon S3 account to be used.

General:

  • Name of the source bucket where the file is stored.
  • Path to the source file to be deleted.
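A deletion with the same parameters could be sketched as below. The code assumes an object that behaves like a boto3 S3 client (only delete_object is used); the function name delete_files and the choice to accept several paths at once are hypothetical.

```python
def delete_files(s3_client, bucket, source_paths):
    """Delete each given key from `bucket` and return the keys that were requested.

    `s3_client` is assumed to behave like a boto3 S3 client.
    """
    # Normalize paths and ignore empty entries so the bucket root is never targeted.
    keys = [path.lstrip("/") for path in source_paths if path.strip("/")]
    for key in keys:
        s3_client.delete_object(Bucket=bucket, Key=key)
    return keys
```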

Publish and run on Cloud

  1. To publish these Jobs to Cloud, right-click the Job in Talend Studio and select Publish to Cloud.
    Highlight of the Publish to Cloud shortcut that appears on the Job context menu.
  2. Select the workspace for the Job to be published and click Finish.
    In the Publish to Cloud pop-up window, select your personal workspace.
  3. Once the Job is published to Cloud, a status message is displayed.
    Pop-up window confirming the success of the publication.
  4. Log in to Talend Management Console and verify the Task.
    Page of the task created.
  5. Expand the Advanced Parameters and validate the context values.
    The values used for the access key, the bucket, and the secret key display in the Advanced Parameters list
  6. Click Run Now and test the Task.
  7. Validate the Task logs by clicking View Logs.
    On the task page, inside the Last 5 runs tab, the Task log appears with a View Logs icon.
