Using Amazon S3 with Talend Management Console
Amazon S3
Amazon Simple Storage Service (S3) is storage for the Internet, designed to make web-scale computing easier for developers. Amazon S3 provides a simple web services interface that you can use to store and retrieve any amount of object data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, fast, and inexpensive data storage infrastructure that Amazon uses to run its own global network of websites. The service aims to maximize the benefits of scale and to pass those benefits on to developers.
Amazon S3 stores data as objects (files). It is not a database storage layer. Also note that Amazon Glacier leverages the Amazon S3 data storage infrastructure for archiving purposes.
Amazon S3 components in Talend Studio
Talend provides several components in the Studio Palette, shown below, that are built around the operations exposed by Amazon S3.

These operations, as can be seen in the screen capture, are:
- Create an S3 bucket
- Delete an S3 bucket
- Check the existence of an S3 bucket
- List all S3 buckets
- Put files into an S3 bucket
- Get files from an S3 bucket
- List files in an S3 bucket
- Delete files in an S3 bucket
These components are used within the Talend Management Console Tasks as described below.
Amazon S3 connection
Create the connection context in Talend Studio as follows: right-click Context and click Create context.
Name the context group aws_context and create three variables of type String: aws_access_key, aws_secret_key, and aws_bucket.

Use this context group in Talend Studio Jobs and Talend Management Console.
Tasks that leverage this S3 connection are executed on the Talend Management Console engines. Thus, the best way to access S3 from the Talend AWS infrastructure is with an access key and secret key. Refer to the AWS article Managing Access Keys for your AWS Account for more information on access keys.
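As a minimal sketch of how the aws_context variables would feed an S3 client, the helper below (a hypothetical name, not a Talend API) maps the context group onto boto3-style credential keyword arguments; the commented-out boto3 call assumes boto3 is available on the engine.

```python
def s3_client_kwargs(context: dict) -> dict:
    """Translate the aws_context variables into boto3-style client kwargs."""
    required = ("aws_access_key", "aws_secret_key")
    missing = [name for name in required if not context.get(name)]
    if missing:
        raise ValueError("Missing context variables: " + ", ".join(missing))
    return {
        "aws_access_key_id": context["aws_access_key"],
        "aws_secret_access_key": context["aws_secret_key"],
    }

# With boto3 installed, a client would then be created as:
#   import boto3
#   s3 = boto3.client("s3", **s3_client_kwargs(context))
```

Validating the context up front mirrors how the Jobs below fail fast when the connection parameters are not set in Talend Management Console.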
Amazon S3 files list
This Job returns a list of files stored on Amazon S3. It creates a connection to Amazon S3, gets a list of files, filters that list by file type, and then passes each file name into the flow, as shown in the Job design below:

Context parameters
S3 Connection (as described above):
- aws_access_key: Access key ID of the Amazon S3 account to be used.
- aws_secret_key: Secret Access key of the Amazon S3 account to be used.
General:
- Bucket: Name of the source bucket where the file is stored.
- Folder: Path to the source folder whose files are to be listed.
- File Type: Type of the files to be listed. To list all files in the folder, use * as the file type.
Output schema:
- Name of the bucket where the file is stored.
- Path to the file to be downloaded.
- Content of the file to be downloaded.
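The list-and-filter step above can be sketched as follows, assuming a boto3-style client (`list_objects_v2` returning a `Contents` list of keys); `list_s3_files` is a hypothetical helper, not a Talend API, and the File Type wildcard (e.g. `*.csv`, or `*` for all files) is matched with `fnmatch`.

```python
from fnmatch import fnmatch


def list_s3_files(s3_client, bucket: str, folder: str, file_type: str = "*"):
    """Return the keys under `folder` whose file name matches `file_type`."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=folder)
    keys = [obj["Key"] for obj in response.get("Contents", [])]
    # Filter on the file name portion only, mirroring the File Type parameter.
    return [k for k in keys if fnmatch(k.rsplit("/", 1)[-1], file_type)]
```

Each returned key would then be passed into the flow, one row per file, as in the Job design.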
Amazon S3 files upload
This Job uploads files to Amazon S3. The screen capture below shows the design:

Context parameters
Connection:
- aws_access_key: Access key ID of the Amazon S3 account to be used.
- aws_secret_key: Secret Access key of the Amazon S3 account to be used.
General:
- Name of the target bucket where the file is to be stored.
- Path to the target file to be uploaded.
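A minimal sketch of the upload step, assuming a boto3-style client whose `upload_file(Filename, Bucket, Key)` call is available; `upload_to_s3` is a hypothetical helper that derives the target key from the local file name and the target folder.

```python
import posixpath


def upload_to_s3(s3_client, local_path: str, bucket: str, target_folder: str) -> str:
    """Upload a local file into `target_folder` of `bucket`; return the key."""
    # Derive the file name portably, whether the local path uses / or \.
    file_name = local_path.replace("\\", "/").rsplit("/", 1)[-1]
    key = posixpath.join(target_folder, file_name)
    s3_client.upload_file(local_path, bucket, key)
    return key
```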
Amazon S3 file move
This Job moves files on Amazon S3. To use it, fill in the following parameters.

Context parameters
Connection:
- aws_access_key: Access key ID of the Amazon S3 account to be used.
- aws_secret_key: Secret Access key of the Amazon S3 account to be used.
General:
- Name of the source bucket where the file is stored.
- Path to the source file to be copied.
- Name of the target file.
- Path to the target file.
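Because S3 has no native rename or move operation, a move is a copy followed by a delete of the original object. The sketch below assumes a boto3-style client (`copy_object` and `delete_object`); `move_s3_file` is a hypothetical helper.

```python
def move_s3_file(s3_client, source_bucket: str, source_key: str,
                 target_bucket: str, target_key: str) -> None:
    """Move an object: copy it to the target location, then delete the source."""
    s3_client.copy_object(
        Bucket=target_bucket,
        Key=target_key,
        CopySource={"Bucket": source_bucket, "Key": source_key},
    )
    s3_client.delete_object(Bucket=source_bucket, Key=source_key)
```

Copying before deleting means a failure mid-move leaves the source intact rather than losing the object.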
Amazon S3 file download
This Job downloads files stored on Amazon S3 into the Cloud Engine temp directory. The file should later be processed by the Task and then removed from the temp directory.

Context parameters
Connection:
- aws_access_key: Access key ID of the Amazon S3 account to be used.
- aws_secret_key: Secret Access key of the Amazon S3 account to be used.
General:
- Name of the source bucket where the file is stored.
- Path to the source file to be downloaded.
Output schema:
- Name of the source bucket where the file is stored.
- Path to the source file to be downloaded.
- Content of the file to be downloaded.
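The download-to-temp step can be sketched as below, assuming a boto3-style client (`download_file(Bucket, Key, Filename)`); `download_to_temp` is a hypothetical helper that writes into the engine's temp directory, from which the Task is expected to remove the file after processing.

```python
import os
import tempfile


def download_to_temp(s3_client, bucket: str, key: str) -> str:
    """Download s3://bucket/key into the temp directory; return the local path."""
    local_path = os.path.join(tempfile.gettempdir(), key.rsplit("/", 1)[-1])
    s3_client.download_file(bucket, key, local_path)
    return local_path
```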
Amazon S3 file delete
This Job deletes files stored on Amazon S3.

Context parameters
Connection:
- aws_access_key: Access key ID of the Amazon S3 account to be used.
- aws_secret_key: Secret Access key of the Amazon S3 account to be used.
General:
- Name of the source bucket where the file is stored.
- Path to the source file to be deleted.
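A sketch of the delete step, assuming a boto3-style client; `delete_s3_files` is a hypothetical helper using the batch `delete_objects` call, which removes up to 1,000 keys per request.

```python
def delete_s3_files(s3_client, bucket: str, keys) -> None:
    """Delete one or more objects from `bucket` in a single batch call."""
    s3_client.delete_objects(
        Bucket=bucket,
        Delete={"Objects": [{"Key": k} for k in keys]},
    )
```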
Publish and run on Cloud
- To publish these Jobs to Cloud, right-click the Job in Talend Studio and select Publish to Cloud.
- Select the workspace in which to publish the Job and click Finish.
- Once the Job is published to Cloud, a message with its status is displayed.
- Log in to Talend Management Console and verify the Task.
- Expand the Advanced Parameters and validate the context values.
- Click Run Now and test the Task.
- Validate the Task logs by clicking View Logs.