Normalizing complex records

A pipeline with an S3 dataset, a Normalize processor, and an S3 destination.

Before you begin

  • You have previously created a connection to the system storing your source data.

    Here, an Amazon S3 connection.

  • You have previously added the dataset holding your source data.

    Here, hierarchical data about actors including ID, name, country, etc. (a sample record is sketched after this list).

  • You also have created the connection and the related dataset that will hold the processed data.

    Here, a file stored on Amazon S3.
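
For illustration only, a record in such a hierarchical actor dataset might resemble the following sketch. The movie field and the exact actor attributes are hypothetical; only the Actors list column is assumed by the steps below.

    {
        "movie": "Example Title",
        "Actors": [
            {"id": 1, "name": "Actor A", "country": "US"},
            {"id": 2, "name": "Actor B", "country": "FR"}
        ]
    }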

Procedure

  1. Click Add pipeline on the Pipelines page. Your new pipeline opens.
  2. Give the pipeline a meaningful name.

    Example

    Normalize Actor Records
  3. Click ADD SOURCE to open the panel allowing you to select your source data, here the list of actors stored on Amazon S3.
  4. Select your dataset and click Select in order to add it to the pipeline.
    Rename it if needed.
  5. Click Plus and add a Normalize processor to the pipeline. The Configuration panel opens.
  6. Give a meaningful name to the processor.

    Example

    normalize actors structure
  7. In the Column to normalize field, type in Actors, as this column contains the hierarchical records you want to normalize.
  8. Enable the Is list option to flatten the data in the list (from an array structure to a record structure), and enable the Discard the trailing empty strings option to drop any trailing empty strings. The sketch after this procedure illustrates the effect.
  9. Click Save to save your configuration.
  10. Click ADD DESTINATION on the pipeline to open the panel allowing you to select the dataset that will hold your normalized data.
    Rename it if needed.
  11. (Optional) Look at the preview of the Normalize processor to compare your data before and after the normalizing operation.
    Preview of the Normalize processor after flattening the actor records.
  12. On the top toolbar of Talend Cloud Pipeline Designer, click the Run button to open the panel allowing you to select your run profile.
  13. Select your run profile in the list (for more information, see Run profiles), then click Run to run your pipeline.
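
To make the effect of step 8 concrete, here is a minimal sketch of what flattening a list column does: each element of the Actors list becomes its own output record, with the remaining fields of the parent record repeated. This is an illustration written in Python with hypothetical field names, not the Normalize processor's implementation.

    # Minimal sketch of the flattening described in step 8 (hypothetical data).
    record = {
        "movie": "Example Title",
        "Actors": [
            {"id": 1, "name": "Actor A", "country": "US"},
            {"id": 2, "name": "Actor B", "country": "FR"},
        ],
    }

    def normalize(rec, column):
        # Emit one output record per element of the list stored in `column`,
        # copying the other fields of the parent record into each output.
        for element in rec.get(column) or []:
            flat = {key: value for key, value in rec.items() if key != column}
            flat[column] = element
            yield flat

    for row in normalize(record, "Actors"):
        print(row)
    # {'movie': 'Example Title', 'Actors': {'id': 1, 'name': 'Actor A', 'country': 'US'}}
    # {'movie': 'Example Title', 'Actors': {'id': 2, 'name': 'Actor B', 'country': 'FR'}}

Enabling Discard the trailing empty strings would additionally drop empty elements at the end of the list before the records are emitted.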

Results

Your pipeline is being executed: the records are normalized and the output is sent to the target system you have indicated.
