
Replicating a list of leads and processing the two output flows differently

A pipeline with a source, a Replicate processor, a Filter processor, and two destinations.

Before you begin

  • You have previously created a connection to the system storing your source data.

    Here, a database connection.

  • You have previously added the dataset holding your source data.

    Download and extract the file: filter-python-customers.zip. It contains lead data including ID, name, revenue, etc.

  • You have also created the connection and the related dataset that will hold the processed data.

    Here, a file stored on Amazon S3 and a file stored on HDFS.

Procedure

  1. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Replicate and Process Leads
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here a list of leads.
  4. Select your dataset and click Select to add it to the pipeline.
    Rename it if needed.
  5. Click Plus and add a Replicate processor to the pipeline. The flow is duplicated and the configuration panel opens.
  6. Give a meaningful name to the processor.

    Example

    replicate leads
  7. Click the top ADD DESTINATION item on the pipeline to open the panel allowing you to select the dataset that will hold your data in the cloud (Amazon S3).
  8. Give a meaningful name to the Destination.

    Example

    store in cloud
  9. Click Plus next to the bottom ADD DESTINATION item on the pipeline and add a Filter processor.
  10. Give a meaningful name to the processor.

    Example

    filter on lead revenues
  11. In the Filters area:
    1. Select .Revenue in the Input list, as you want to filter leads based on this value.
    2. Select None in the Optionally select a function to apply list, as you do not want to apply a function while filtering records.
    3. Select >= in the Operator list and type 70000 in the Value field, as you want to keep leads with a revenue greater than or equal to 70000 dollars.
  12. Click Save to save your configuration.
  13. (Optional) Look at the Filter processor preview to see your data after the filtering operation.

    Example

    Preview of the Filter processor after filtering on leads with a revenue of 70000 dollars or more.
  14. Click the bottom ADD DESTINATION item on the pipeline to open the panel allowing you to select the dataset that will hold your data on premises (HDFS), and give it a meaningful name.

    Example

    store on premises
  15. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  16. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.

Results

Your pipeline runs: the records are duplicated and filtered, and the two output flows are sent to the target systems you indicated.
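To make the Replicate and Filter semantics concrete, here is a minimal stand-alone Python sketch of what the pipeline does with the lead records. The field names and sample values are hypothetical illustrations, not the actual dataset schema; Talend Cloud Pipeline Designer generates the equivalent logic for you.

```python
# Hypothetical lead records; the real dataset includes ID, name, revenue, etc.
leads = [
    {"id": 1, "name": "Acme", "revenue": 82000},
    {"id": 2, "name": "Globex", "revenue": 54000},
    {"id": 3, "name": "Initech", "revenue": 70000},
]

# Replicate processor: duplicate the flow into two identical branches.
branch_cloud = list(leads)     # sent as-is to the cloud destination (Amazon S3)
branch_filtered = list(leads)  # goes through the Filter processor

# Filter processor: keep records whose revenue is >= 70000,
# with no function applied to the input value.
high_revenue = [lead for lead in branch_filtered if lead["revenue"] >= 70000]

print(len(branch_cloud))                    # every lead reaches the cloud branch
print([lead["id"] for lead in high_revenue])  # only high-revenue leads remain
```

Note that the >= operator keeps a lead whose revenue is exactly 70000, which is why the filtered branch above retains the third record.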
