
Processing and moving files located on an FTP server

This scenario aims to help you set up and use connectors in a pipeline. Adapt it to your environment and use case.

Example of a pipeline created from the instructions below.

Procedure

  1. Click Connections > Add connection.
  2. In the panel that opens, select the type of connection you want to create.

    Example

    FTP
  3. Select your engine in the Engine list.
    Note:
    • It is recommended to use the Remote Engine Gen2 rather than the Cloud Engine for Design for advanced processing of data.
    • If no Remote Engine Gen2 has been created from Talend Management Console, or if one exists but appears as unavailable (meaning it is not up and running), you can neither select a connection type in the list nor save the new connection.
    • The list of available connection types depends on the engine you have selected.
  4. Select the type of connection you want to create.
    Here, select FTP.
  5. Fill in the connection properties to access your FTP server as described in FTP properties, check the connection and click Add dataset.
  6. In the Add a new dataset panel, fill in the required properties to point to the FTP directory in which your file is located and click View sample to see a preview of your dataset sample.
    Configuration of a new FTP dataset.
    Here, the file to be retrieved is a CSV file listing restaurants in Baltimore located in a Talend/Files folder:
    CSV file to retrieve from the Talend/Files folder
  7. Click Validate to save your dataset.
  8. On the same FTP connection, add another dataset that will be used as destination in your pipeline. Here you are pointing to a Talend/Out folder.
    CSV dataset file to use as destination for your pipeline in the Talend/Out folder.
  9. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  10. Give the pipeline a meaningful name.

    Example

    Processing and moving files on FTP server
  11. Click ADD SOURCE and, in the panel that opens, select your source dataset, here restaurant on FTP dir.
  12. Click add processor to add processors to the pipeline, for example an Aggregate processor to list all the restaurant addresses.
  13. Configure the processor. In the Operations area:
    1. Select .location in the Field path list.
    2. Select List in the Operation list.
    3. Enter a name in the Output field name field, here address.
    4. Save your configuration.
    In the Output data preview, the 50 addresses have become a single list of addresses.

    The restaurant addresses have been aggregated in one single record.

  14. Click add processor to add a Normalize processor to the pipeline in order to flatten the address record and split every entry into a separate record.
  15. Configure the processor. In the Operations area:
    1. Select .address in the Field path to normalize list.
    2. Enable the Is list option.
    3. Save your configuration.
    In the Output data preview, the horizontal list of addresses that counted as one record has become a vertical list of addresses that counts as 50 records.
  16. Click the ADD DESTINATION item on the pipeline to open the panel in which you select the FTP output directory where your output file will be uploaded.
  17. Give a meaningful name to the destination, for example addresses on FTP out dir.
  18. In the Configuration tab of the destination, check that the file you want to upload does not exceed the size limit.
  19. Click Save to save your configuration.
  20. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  21. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
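
The effect of the Aggregate (List operation) and Normalize (Is list option) processors configured in this procedure can be illustrated outside the product. The following is only a minimal Python sketch of the data transformation, not Talend's implementation: the function names, the plain dictionary keys used in place of the .location and .address field paths, and the three sample records are all hypothetical.

```python
def aggregate_list(records, field, output_field):
    """Collect the value of `field` from every record into one record holding a list."""
    return {output_field: [record[field] for record in records]}


def normalize_list(record, field):
    """Split the list stored under `field` into one record per entry."""
    return [{field: value} for value in record[field]]


# Hypothetical sample data standing in for the restaurant CSV file.
restaurants = [
    {"name": "Restaurant A", "location": "1 First Street"},
    {"name": "Restaurant B", "location": "2 Second Street"},
    {"name": "Restaurant C", "location": "3 Third Street"},
]

# Aggregate step: many records become one record with a list of addresses.
aggregated = aggregate_list(restaurants, "location", "address")

# Normalize step: the single list is flattened back into one record per address.
normalized = normalize_list(aggregated, "address")

print(aggregated)
print(len(normalized), "records after normalization")
```

With the scenario's dataset, the same two steps turn 50 restaurant records into one record holding 50 addresses, and then into 50 records holding one address each.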

Results

Your pipeline is being executed. The restaurant data that was stored in an FTP directory is processed, and the output file is uploaded to the FTP target directory you specified:
  • The FTP target directory with the new uploaded file:

    CSV dataset file with the new uploaded file from the pipeline in the Talend/Out folder.
  • The CSV output file with the list of restaurant addresses:
    CSV file with 50 addresses listed one under the other.
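
To confirm the upload from outside Talend Cloud Pipeline Designer, you can list the target directory with any FTP client. The following is a minimal sketch using Python's standard ftplib module. The helper name list_directory is hypothetical, and the host and credentials in the usage comment are placeholders, not values from this scenario. Only the Talend/Out path comes from the steps above.

```python
from ftplib import FTP


def list_directory(ftp, directory):
    """Return the sorted entry names that the server reports for `directory`.

    `ftp` is a connected and logged-in ftplib.FTP instance. Depending on the
    server, the names may be bare file names or paths including the directory.
    """
    return sorted(ftp.nlst(directory))


# Usage (placeholder host and credentials, replace with your own):
# with FTP("ftp.example.com") as ftp:
#     ftp.login(user="username", passwd="password")
#     for name in list_directory(ftp, "Talend/Out"):
#         print(name)
```

Some FTP servers answer a listing request on an empty directory with an error rather than an empty list, so a failure here does not necessarily mean the connection is wrong.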
