
Processing strings about harvested crops

A pipeline with an S3 source, two Strings processors, and an S3 destination.

Before you begin

  • You have previously created a connection to the system storing your source data.

    Here, an Amazon S3 connection.

  • You have previously added the dataset holding your source data.

    Download the file: string-crops.csv. It contains a dataset about harvested crops in Mali, including crop types, value of production, harvested areas, and so on. A quick inspection sketch follows this list.

  • You have also created the connection and the related dataset that will hold the processed data.

    Here, a dataset stored in the same S3 bucket.
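If you want to inspect the source file before building the pipeline, a minimal pandas sketch such as the one below can help. The column names are assumptions inferred from the fields referenced later in this procedure, not values confirmed by the scenario.

```python
import pandas as pd

# Load the downloaded sample file; the path assumes it sits in the
# current working directory.
df = pd.read_csv("string-crops.csv")

# Quick look at the columns and the first records. Column names such as
# crop_parent, crop, and id are assumptions based on the fields used
# later in this procedure.
print(df.columns.tolist())
print(df.head())
```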

Procedure

  1. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Process strings about harvested crops
  3. Click ADD SOURCE to open the panel allowing you to select your source data, in this case data about harvested crops in Mali in 2005.

    Example

    Preview of a data sample with crop records.
  4. Select your dataset and click Select to add it to the pipeline.
    Rename it if needed.
  5. Click Plus and add a Strings processor to the pipeline. The configuration panel opens.
  6. Give a meaningful name to the processor.

    Example

    change crop types to upper case
  7. In the Configuration area:
    1. Select Change to upper case in the Function name list.
    2. Select .crop_parent in the Fields to process list, as you want to change the crop type values to upper case.
  8. Click Save to save your configuration.

    Look at the preview of the processor to compare your data before and after the operation. A pandas sketch of this operation follows the procedure.

    Preview of the Strings processor after changing the case of crop records to upper case.
  9. Click Plus and add another Strings processor to the pipeline. The configuration panel opens.
  10. Give a meaningful name to the processor.

    Example

    match crop IDs with IDs
  11. In the Configuration area:
    1. Select Match similar text in the Function name list.
    2. Select .crop in the Fields to process list.
    3. Select Other column in the Use with list and .id in the Column list, as you want to compare the crop name with the record ID.
    4. Enter 0 in the Fuzziness field as you want exact matches between the two field values.
  12. Click Save to save your configuration.

    Look at the preview of the processor to compare your data before and after the operation. You can see a new crop_matches column in which exact matches have a value of true and non-matching records have a value of false. A pandas sketch of this comparison also follows the procedure.

    Preview of the Strings processor after comparing crop records based on their IDs.
  13. Click ADD DESTINATION and select the dataset that will hold your processed data.
    Rename it if needed.
  14. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  15. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
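For reference, the first Strings processor (steps 5 to 8) can be pictured with the pandas sketch below. This is only an illustration of the logic, not the processor's actual implementation; the crop_parent column name is taken from the field referenced in the steps above.

```python
import pandas as pd

# Hypothetical local copy of the source dataset.
df = pd.read_csv("string-crops.csv")

# Equivalent of the "Change to upper case" function applied to the
# crop_parent field: the crop type values become upper case.
df["crop_parent"] = df["crop_parent"].str.upper()
print(df["crop_parent"].head())
```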
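The second Strings processor (steps 9 to 12) can be sketched the same way, assuming that a fuzziness of 0 amounts to an exact comparison between the two fields; the crop, id, and crop_matches names are taken from the steps and preview above.

```python
import pandas as pd

# Hypothetical local copy of the source dataset.
df = pd.read_csv("string-crops.csv")

# Equivalent of the "Match similar text" function with a fuzziness of 0:
# an exact comparison between the crop field and the id field, stored in
# a new boolean crop_matches column as shown in the processor preview.
df["crop_matches"] = df["crop"].astype(str).eq(df["id"].astype(str))
print(df[["crop", "id", "crop_matches"]].head())
```

With a fuzziness greater than 0, the processor also accepts approximate matches; this sketch only covers the exact case used in this scenario.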

Results

Your pipeline is being executed. The selected strings have been processed, and the output flow is sent to the S3 bucket you indicated.
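If you want to confirm the output from outside Talend Cloud Pipeline Designer, you can list the objects written to the bucket, for example with boto3. The bucket name and prefix below are placeholders, not values from this scenario.

```python
import boto3

# List the objects written by the pipeline run; replace the bucket name
# and prefix with the ones used by your destination dataset.
s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket="my-crops-bucket", Prefix="processed/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```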
