Processing leads on Amazon S3 and loading them into MySQL
This scenario helps you set up and use connectors in a pipeline. Adapt it to your own environment and use case.
Before you begin
- To reproduce this scenario, download and extract the s3_mysql-lead_campaign.zip file.
Procedure
- Click Connections > Add connection.
- In the panel that opens, select the type of connection you want to create.
  Example: S3
- Select your engine in the Engine list.
Note:
- It is recommended to use the Remote Engine Gen2 rather than the Cloud Engine for Design for advanced processing of data.
- If no Remote Engine Gen2 has been created from Talend Management Console, or if one exists but appears as unavailable (meaning it is not up and running), you cannot select a Connection type in the list or save the new connection.
- The list of available connection types depends on the engine you have selected.
- Select the type of connection you want to create. Here, select S3 connection.
- Fill in the connection properties to access your S3 account as described in Amazon S3 properties, check the connection and click Add dataset.
- In the Add a new dataset panel, name your dataset lead generation campaign.
- Select S3 in the connection list.
- Click Autodetect or manually fill in the required properties to access the file located in your S3 bucket (CSV format, space field delimiter, no header), then click View sample to see a preview of your dataset sample.
- Click Validate to save your dataset.
- Do the same to add the MySQL connection and MySQL table dataset that will be used as destination in your pipeline. Fill in the connection properties as described in MySQL properties.
- Click Add pipeline on the Pipelines page. Your new pipeline opens.
- Give the pipeline a meaningful name.
  Example: From S3 to MySQL - Process leads
- Click ADD SOURCE and select your source dataset, lead generation campaign, in the panel that opens.
- Add a Field selector processor to the pipeline to select specific fields and give them meaningful names. The configuration panel opens.
- Give a meaningful name to the processor.
  Example: select countries and revenues
- In the Simple view of the Configuration tab, click the icon to open the Select fields window:
- Select .field2 and click the icon to rename it country, as you want to select the fields corresponding to customer countries.
- Select .field7 and click the icon to rename it revenue, as you want to select the fields corresponding to customer revenues.
- Click Save to save your configuration.
- Add a Filter processor to the pipeline to filter the records and keep only the customers who provided their revenue during the marketing campaign. The configuration panel opens.
- Give a meaningful name to the processor.
  Example: remove empty revenues
- In the Filters area:
- Select .revenue in the Input list, as you want to process customer revenues.
- Select None in the Optionally select a function to apply list, as you do not want to apply a function while filtering records.
- Select != in the Operator list and type N/A in the Value field, as you want to keep only the customers who provided their revenue.
- Add a Type Converter processor to the pipeline to convert the revenue field, which is currently in String format. The configuration panel opens.
- Give a meaningful name to the processor.
  Example: convert revenue formats
- In the Converters area, select .revenue in the Field path list and Double in the Output type list, as you want to convert the String type field holding revenue information to a Double type field.
- Click Save to save your configuration.
- Add an Aggregate processor to the pipeline. The configuration panel opens.
- Give a meaningful name to the processor.
  Example: count average revenue by country
- In the Group by area, select the field you want to use for your aggregation set, here .country.
- In the Operations area:
- Select .revenue in the Field path list and Average in the Operation list.
- Name the generated field (Output field name), average_revenue for example.
- Click Save to save your configuration.
- (Optional) Look at the Aggregate processor to preview the calculated data after the aggregating operation: the average revenue per country.
- Click the ADD DESTINATION item on the pipeline to open the panel where you can select the dataset that will hold your output data (MySQL).
- Give a meaningful name to the destination, for example load in MySQL table.
- Click Save to save your configuration.
- On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
- Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
Results
Your pipeline is being executed. The lead information stored on S3 is cleaned, the revenues are averaged per country, and the output flow is sent to the MySQL target table you have defined.
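To make the processing logic concrete, the following Python sketch mimics what the four processors do to each record: field selection, filtering, type conversion, and aggregation. It is only an illustration and not how Talend Cloud Pipeline Designer runs the pipeline. The sample rows are made up, the function name average_revenue_by_country is hypothetical, and the sketch assumes that fields in a headerless file are numbered from zero, so that .field2 is the third column and .field7 the eighth.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for the S3 file
# (space field delimiter, no header). Not real campaign data.
SAMPLE = """\
1 Ann France x x x x 1000.0
2 Bob France x x x x 2000.0
3 Cy Spain x x x x N/A
4 Di Spain x x x x 500.0
"""


def average_revenue_by_country(text):
    """Mimic the four processors on space-delimited, headerless text."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.reader(io.StringIO(text), delimiter=" "):
        if not row:
            continue  # skip blank lines
        # Field selector: ASSUMPTION, fields are numbered from zero,
        # so .field2 is row[2] and .field7 is row[7].
        country, revenue = row[2], row[7]
        # Filter: keep only records whose revenue != "N/A".
        if revenue == "N/A":
            continue
        # Type converter: String -> Double.
        value = float(revenue)
        # Aggregate: group by country, operation Average.
        sums[country] += value
        counts[country] += 1
    return [
        {"country": c, "average_revenue": sums[c] / counts[c]}
        for c in sums
    ]


if __name__ == "__main__":
    for record in average_revenue_by_country(SAMPLE):
        print(record)
```

Each output record carries the group-by field and the generated average_revenue field, which corresponds to the rows the pipeline would write to the MySQL target table.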