Procedure

  1. Drop the following components from the Palette to the design workspace: tFileInputXML, tFileOutputXML and tLogRow.
    Right-click tFileInputXML and select Row > Main in the contextual menu and then click tFileOutputXML to connect the components together.
    Right-click tFileInputXML and select Row > Reject in the contextual menu and then click tLogRow to connect the components together using a reject link.
  2. Double-click tFileInputXML to display the Basic settings view and define the component properties.
  3. In the Property Type list, select Repository if you have already stored the metadata for the input file in the File xml node under the Metadata folder of the Repository tree view. Then click the [...] button next to the field to open the Repository Content dialog box and select that metadata; the fields that follow are filled in automatically with the fetched data. Otherwise, select Built-in and fill in the fields that follow manually.
    For more information about storing schema metadata in the Repository tree view, see Talend Studio User Guide.
  4. In the Schema Type list, select Repository if you have already stored the schema that describes the structure of the input file in the Repository tree view, then click the [...] button to open the dialog box and select it. Otherwise, select Built-in and click the [...] button next to Edit schema to open a dialog box where you can define the schema manually.
    The schema in this example consists of five columns: id, CustomerName, CustomerAddress, idState and id2.
  5. Click the [...] button next to the Filename field and browse to the XML file you want to process.
  6. In the Loop XPath query field, enter, between double quotation marks, the path of the XML node to loop on in order to retrieve data.
    In the Mapping table, Column is automatically populated with the defined schema.
    In the XPath query column, enter, between double quotation marks, the node of the XML file that holds the data to extract into the corresponding column.
  7. In the Limit field, enter the maximum number of lines to process. In this example, only the first 10 lines are processed.
  8. Double-click tFileOutputXML to display its Basic settings view and define the component properties.
  9. Click the [...] button next to the File Name field and browse to the output XML file in which you want to collect the data, customer_data.xml in this example.
    In the Row tag field, enter, between double quotation marks, the name you want to give to the tag that will hold the retrieved data.
    Click Edit schema to display the schema dialog box and make sure that the schema matches that of the preceding component. If not, click Sync columns to retrieve the schema from the preceding component.
  10. Double-click tLogRow to display its Basic settings view and define the component properties.
    Click Edit schema to open the schema dialog box and make sure that the schema matches that of the preceding component. If not, click Sync columns to retrieve the schema of the preceding component.
    In the Mode area, select the Vertical option.
  11. Save your Job and press F6 to execute it.
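
The way the Loop XPath query, the Mapping table and the Limit field work together in steps 6 and 7 can be illustrated outside the Studio. The following is a minimal Python sketch, not the code that Talend generates: the sample XML, its element names and the names LOOP_XPATH, MAPPING and extract_rows are all assumptions made for illustration. Python's xml.etree.ElementTree supports only a subset of XPath, so the loop path is written relative to the root element here rather than as an absolute path.

```python
import xml.etree.ElementTree as ET

# Hypothetical input file content; the element names are assumptions that
# simply mirror the five schema columns used in this example.
SAMPLE_XML = """
<Customers>
  <Customer><id>1</id><CustomerName>Customer A</CustomerName><CustomerAddress>1 First Street</CustomerAddress><idState>5</idState><id2>1</id2></Customer>
  <Customer><id>2</id><CustomerName>Customer B</CustomerName><CustomerAddress>2 Second Street</CustomerAddress><idState>7</idState><id2>2</id2></Customer>
  <Customer><id>3</id><CustomerName>Customer C</CustomerName><CustomerAddress>3 Third Street</CustomerAddress><idState>9</idState><id2>3</id2></Customer>
</Customers>
"""

# Counterpart of the Loop XPath query: the node on which to loop.
LOOP_XPATH = "./Customer"

# Counterpart of the Mapping table: schema column -> XPath relative to the loop node.
MAPPING = {
    "id": "id",
    "CustomerName": "CustomerName",
    "CustomerAddress": "CustomerAddress",
    "idState": "idState",
    "id2": "id2",
}


def extract_rows(xml_text, loop_xpath, mapping, limit):
    """Return at most `limit` rows, one dict per node matched by loop_xpath."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for node in root.findall(loop_xpath)[:limit]:
        # findtext returns None when the mapped node is missing.
        rows.append({column: node.findtext(xpath) for column, xpath in mapping.items()})
    return rows


# Process only the first 10 matching nodes, as in the Limit field of this example.
for row in extract_rows(SAMPLE_XML, LOOP_XPATH, MAPPING, limit=10):
    print(row)
```

Each node matched by the loop path yields one row, and each mapping entry is evaluated relative to that node, which is why the XPath queries in the Mapping table are usually shorter than the loop path.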

Results

The output file customer_data.xml, which holds the correct XML data, is created in the defined path, and the erroneous XML data is displayed on the console of the Run view.
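
The split between the main flow and the reject flow can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the component's actual implementation: the sample XML, the column types in SCHEMA, the errorMessage key and the split_rows function are all hypothetical, and a row is treated as erroneous here simply when one of its values cannot be converted to the assumed column type.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the third record has a non-numeric idState on purpose.
SAMPLE_XML = """
<Customers>
  <Customer><id>1</id><CustomerName>Customer A</CustomerName><CustomerAddress>1 First Street</CustomerAddress><idState>5</idState><id2>1</id2></Customer>
  <Customer><id>2</id><CustomerName>Customer B</CustomerName><CustomerAddress>2 Second Street</CustomerAddress><idState>7</idState><id2>2</id2></Customer>
  <Customer><id>3</id><CustomerName>Customer C</CustomerName><CustomerAddress>3 Third Street</CustomerAddress><idState>N/A</idState><id2>3</id2></Customer>
</Customers>
"""

# Assumed column types; the source only names the columns, not their types.
SCHEMA = {"id": int, "CustomerName": str, "CustomerAddress": str, "idState": int, "id2": int}


def split_rows(xml_text, loop_xpath, schema):
    """Return (main_rows, reject_rows) according to whether every value converts."""
    main_rows, reject_rows = [], []
    for node in ET.fromstring(xml_text.strip()).findall(loop_xpath):
        raw = {column: node.findtext(column) for column in schema}
        try:
            main_rows.append({column: cast(raw[column]) for column, cast in schema.items()})
        except (TypeError, ValueError) as exc:
            # Keep the raw values and attach the reason for the rejection.
            reject_rows.append({**raw, "errorMessage": str(exc)})
    return main_rows, reject_rows


main_rows, reject_rows = split_rows(SAMPLE_XML, "./Customer", SCHEMA)

# Main flow: rows that would be written to the output file.
print(len(main_rows), "row(s) in the main flow")

# Reject flow: printed one field per line, loosely imitating a vertical log view.
for row in reject_rows:
    for column, value in row.items():
        print(f"{column}: {value}")
```

In this sketch, the two well-formed records end up in the main flow and the record with the non-numeric idState ends up in the reject flow, together with a message explaining why it was rejected.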
