
AWS - Integrating Talend Data Integration with S3 and Lambda

AWS S3 (Simple Storage Service) is the object storage service of Amazon Web Services and one of its most widely used services. Talend provides out-of-the-box connectivity with S3. AWS Lambda is another AWS service that lets you run code without provisioning or managing servers, a model known as serverless computing.

In this article, we demonstrate how to integrate Talend Data Integration with AWS S3 and AWS Lambda. We build an event-driven architecture in which an end user drops a file into S3, S3 notifies a Lambda function, and the Lambda function triggers the execution of a Talend Job that processes the S3 file.

Architecture

  1. A CSV file is uploaded into an S3 bucket.
  2. S3 sends a notification by invoking a Lambda function.
  3. The Lambda function triggers the execution of a Talend Job through the Talend Administration Center HTTP API (MetaServlet API).
  4. Talend Administration Center launches the Talend Job on a Talend JobServer.
  5. The Talend Job downloads the CSV file from S3, processes it, then uploads the result back to S3.
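Steps 2 and 3 can be sketched in Java, the language used for the Lambda function in this article. The helper below assumes the Lambda handler has already extracted the bucket name and raw object key from the S3 event. It builds a MetaServlet `runTask` request by Base64-encoding a JSON command into the URL query string. The class and method names, the context variable names `s3Bucket` and `s3Key`, the TAC URL, and the asynchronous mode are assumptions for illustration only. Check the JSON parameter names against the MetaServlet documentation for your Talend Administration Center version.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLDecoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;

// Hypothetical helper class: the class name and all method names are illustrative.
class TalendTaskTrigger {

    // S3 event notifications URL-encode the object key (a space arrives as '+'),
    // so decode it before passing it to the Talend Job.
    static String decodeS3Key(String rawKey) {
        return URLDecoder.decode(rawKey, StandardCharsets.UTF_8);
    }

    // Minimal JSON string escaping, sufficient for the values used here.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Builds the JSON command for the MetaServlet "runTask" action (assumed field
    // names). The context entries are meant to set context variables of the Job.
    static String runTaskJson(String user, String pass, int taskId,
                              Map<String, String> context) {
        StringBuilder ctx = new StringBuilder("{");
        for (Map.Entry<String, String> e : context.entrySet()) {
            if (ctx.length() > 1) {
                ctx.append(',');
            }
            ctx.append(jsonString(e.getKey())).append(':').append(jsonString(e.getValue()));
        }
        ctx.append('}');
        return "{\"actionName\":\"runTask\""
                + ",\"authUser\":" + jsonString(user)
                + ",\"authPass\":" + jsonString(pass)
                + ",\"taskId\":" + taskId
                + ",\"mode\":\"asynchronous\""
                + ",\"context\":" + ctx
                + "}";
    }

    // Appends the Base64-encoded JSON command as the query string of the MetaServlet URL.
    static String runTaskUrl(String tacBaseUrl, String user, String pass, int taskId,
                             Map<String, String> context) {
        String json = runTaskJson(user, pass, taskId, context);
        String encoded = Base64.getEncoder()
                .encodeToString(json.getBytes(StandardCharsets.UTF_8));
        return tacBaseUrl + "/metaServlet?" + encoded;
    }

    // Sends the GET request and returns the response body (Java 11+ HttpClient).
    static String send(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

Inside the handler, you might call `decodeS3Key` on each record's object key, put the bucket and decoded key into the context map, and pass `runTaskUrl("http://tac-host:8080/org.talend.administrator", user, pass, taskId, context)` to `send` (the host name is hypothetical). Read the TAC credentials and task ID from Lambda environment variables rather than hard-coding them. Base64 is an encoding, not encryption, so prefer HTTPS between Lambda and Talend Administration Center.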

Assumptions

  1. Amazon Web Services (AWS):
    • You should be familiar with the AWS platform, since this article does not go into the details of administering and managing AWS services. Refer to Amazon Web Services (AWS) - Getting Started to learn about all the AWS functionality that Talend provides.
    • You should also have full access to the main AWS services described in the Prerequisites section below.
  2. Talend

    You should be familiar with the installation and management of Talend Data Integration.

  3. Eclipse

    You should be familiar with Eclipse and Java, since we use the AWS Toolkit for Eclipse to develop a Lambda function.

Environment

This demonstration is based on AWS Cloud Platform and Talend Data Integration.

Prerequisites

  1. A valid AWS account with full access to the following services:
  2. Valid AWS access keys to programmatically access AWS services:

    Read the documentation at http://docs.aws.amazon.com/en_en/general/latest/gr/managing-aws-access-keys.html to learn how to create, manage, and use AWS access keys.

  3. AWS Toolkit for Eclipse

    Follow the online documentation at http://docs.aws.amazon.com/toolkit-for-eclipse/v1/user-guide/setup-install.html to install the AWS Toolkit for Eclipse on your machine. You will use it to develop the Lambda function in Java.

  4. Talend Data Integration (Commercial Edition) - https://www.talend.com/products/data-integration
    • Talend Studio
    • Talend Administration Center
    • Talend JobServer
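A common way to use the access keys from item 2 without hard-coding them in a Job or in Lambda code is to read them from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. The class below is a hypothetical sketch of that pattern (its name and methods are illustrative). It ignores `AWS_SESSION_TOKEN`, so it fits long-lived IAM user keys rather than the temporary credentials Lambda injects for its execution role. In practice, the AWS SDKs read the same variables through their default credential provider chain.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical value class for a long-lived AWS access key pair.
final class AwsAccessKeys {
    final String accessKeyId;
    final String secretAccessKey;

    AwsAccessKeys(String accessKeyId, String secretAccessKey) {
        this.accessKeyId = accessKeyId;
        this.secretAccessKey = secretAccessKey;
    }

    // Reads the key pair from the standard AWS environment variables.
    // The environment is passed in as a Map so it can be tested;
    // in real code, call fromEnvironment(System.getenv()).
    static Optional<AwsAccessKeys> fromEnvironment(Map<String, String> env) {
        String id = env.get("AWS_ACCESS_KEY_ID");
        String secret = env.get("AWS_SECRET_ACCESS_KEY");
        if (id == null || id.isBlank() || secret == null || secret.isBlank()) {
            return Optional.empty();
        }
        return Optional.of(new AwsAccessKeys(id.trim(), secret.trim()));
    }

    // Returns the key ID with all but the last four characters masked,
    // suitable for logs. The secret key is never included.
    String maskedKeyId() {
        int visible = Math.min(4, accessKeyId.length());
        return "*".repeat(accessKeyId.length() - visible)
                + accessKeyId.substring(accessKeyId.length() - visible);
    }

    // Keeps the secret out of logs if the object itself is printed.
    @Override
    public String toString() {
        return "AwsAccessKeys[" + maskedKeyId() + "]";
    }
}
```

Returning an `Optional` makes a missing key an explicit case for the caller to handle, for example by failing fast before any S3 call is attempted.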
