Filtering a list of customers based on their registration date and revenue

A complex pipeline including a source dataset, two Filter processors, and three destinations.

Before you begin

  • You have previously created a connection to the system storing your source data.

    In this example, the connection is to a database.

  • You have previously added the dataset holding your source data.

    Download and extract the file filter-python-customers.zip, which is attached to this document. It contains a list of customers with a registration date field.

  • You have also created the connection and the related dataset that will hold the processed data.

    Here the files are stored on HDFS.

Procedure

  1. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Filter on Registration and Revenue
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here a list of customers stored in a database.

    Example

    Preview of a data sample about customers.
  4. Select your dataset and click Select in order to add it to the pipeline.
    Rename it if needed.
  5. Click Plus and add a Filter processor to the pipeline. The Configuration panel opens.
  6. Give a meaningful name to the processor.

    Example

    customers registered in 2000
  7. In the Filters area:
    1. Select .RegistrationDate in the Input list, as you want to filter customers based on this value.
    2. Select None in the Optionally select a function to apply list, as you do not want to apply a function while filtering records.
    3. Select Contains in the Operator list and type 2000 in the Value list, as you want to keep the customers whose registration date contains the year 2000.

      You can use the avpath syntax in this area.

  8. Click Save to save your configuration.
  9. Click Plus and add another Filter processor to the pipeline. The Configuration panel opens.
  10. Give a meaningful name to the processor.

    Example

    customers with revenue > 90000
  11. In the Filters area:
    1. Select .Revenue in the Input list, as you want to filter customers based on this value.
    2. Select None in the Optionally select a function to apply list, as you do not want to apply a function while filtering records.
    3. Select > in the Operator list and type 90000 in the Value list, as you want to keep the customers whose revenue is greater than 90000.
  12. Click Save to save your configuration.
  13. Click the Doesn't match filter button next to the first Filter processor to add and select the dataset that will hold the data that does not match the filter criteria.
  14. Give a meaningful name to the Destination.

    Example

    other registration date
  15. Click the ADD DESTINATION item next to the second Filter processor and select the dataset that will hold the data that matches the filter criteria.
    Rename it if needed.
  16. Click the Doesn't match filter button next to the second Filter processor and select the dataset that will hold your rejected data.
  17. Give a meaningful name to the Destination.

    Example

    other customers
  18. (Optional) Look at the last Filter processor preview to see the data after the filtering operation.
    Preview of the Filter processor after applying the filter operation.
  19. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  20. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.

Results

Your pipeline is being executed. The data is filtered according to the conditions you have defined, and the output is sent to the target system you have indicated.
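
To make the routing of records easier to picture, the two Filter processors and their three destinations can be approximated outside the product with a few lines of Python. This is only a minimal sketch, not how Talend Cloud Pipeline Designer implements filtering. The sample records, the date format, the function name route_customers and the name of the list holding the matching records are assumptions made for illustration. Only the field names RegistrationDate and Revenue and the destination names "other registration date" and "other customers" come from the procedure.

```python
# Hypothetical sample records. The field names follow the Input list
# (.RegistrationDate, .Revenue); the values and the date format are invented.
customers = [
    {"Name": "A", "RegistrationDate": "2000-03-14", "Revenue": 95000},
    {"Name": "B", "RegistrationDate": "2000-11-02", "Revenue": 42000},
    {"Name": "C", "RegistrationDate": "1998-07-30", "Revenue": 120000},
]


def route_customers(records):
    """Send each record to one of three lists, mimicking the pipeline layout."""
    matching = []                  # passes both filters (ADD DESTINATION)
    other_registration_date = []   # rejected by the first Filter processor
    other_customers = []           # rejected by the second Filter processor

    for record in records:
        # First filter: Contains operator with the value 2000,
        # modeled here as a plain substring test on the date text.
        if "2000" not in str(record["RegistrationDate"]):
            other_registration_date.append(record)
        # Second filter: > operator with the value 90000.
        elif record["Revenue"] > 90000:
            matching.append(record)
        else:
            other_customers.append(record)

    return matching, other_registration_date, other_customers


matching, other_registration_date, other_customers = route_customers(customers)
print(len(matching), len(other_registration_date), len(other_customers))
```

Each record ends up in exactly one list. A record rejected by the first filter never reaches the second one, which is why customer C is routed to "other registration date" even though its revenue is above 90000.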
