Drop the following components from the Palette onto the design workspace: tFileInputXML, tFileOutputXML,
and tLogRow.
Right-click tFileInputXML and select
Row > Main in the contextual menu and then click tFileOutputXML to connect the components together.
Right-click tFileInputXML and select
Row > Reject in the contextual menu and then click tLogRow to connect the components together using a
reject link.
Double-click tFileInputXML to display the
Basic settings view and define the
component properties.
In the Property Type list, select Repository and click the [...] button next to the
field to open the Repository Content
dialog box, where you can select the metadata for the input
file if you have already stored it under the File xml node in the Metadata folder of the Repository tree view. The fields that follow are then
filled in automatically with the fetched data. Otherwise, select Built-in and fill in the fields that follow manually.
For more information about storing schema metadata in the
Repository tree view, see the Talend Studio User
Guide.
In the Schema Type list, select Repository and click the [...] button to open the
dialog box where you can select the schema that describes the structure of the
input file if you have already stored it in the Repository tree view. Otherwise, select Built-in and click the [...] button next to Edit schema to open a dialog box where you can define
the schema manually.
The schema in this example consists of five columns: id,
CustomerName, CustomerAddress, idState and
id2.
Click the [...] button next to the Filename field and browse to the XML file you want to
process.
In the Loop XPath query field, enter, between
double quotation marks, the path of the XML node on which to loop in order to retrieve
data.
In the Mapping table, Column is automatically populated with the defined
schema.
In the XPath query column, enter, between
double quotation marks, the node of the XML file that holds the data you want to extract
for the corresponding column.
In the Limit field, enter the maximum number of rows
to be processed: 10 in this example, so that only the first 10 rows are read.
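To make the roles of the Loop XPath query, the Mapping table and the Limit field more concrete, here is a minimal Python sketch of the same idea. It is not Talend code and does not reproduce what the component generates. The sample XML, the loop path and the one-to-one mapping of column names to child nodes are all assumptions made for illustration, since the scenario does not show the input file.

```python
import xml.etree.ElementTree as ET
from itertools import islice

# Hypothetical input content; the scenario does not show the real XML file.
SAMPLE_XML = """
<Customers>
  <Customer>
    <id>1</id>
    <CustomerName>Example Co</CustomerName>
    <CustomerAddress>1 Sample Street</CustomerAddress>
    <idState>5</idState>
    <id2>1</id2>
  </Customer>
  <Customer>
    <id>2</id>
    <CustomerName>Demo Ltd</CustomerName>
    <CustomerAddress>2 Test Avenue</CustomerAddress>
    <idState>7</idState>
    <id2>2</id2>
  </Customer>
</Customers>
"""

# Loop node, written relative to the root element for ElementTree.
# In the component this would be an absolute path such as
# "/Customers/Customer" (assumed, not taken from the scenario).
LOOP_XPATH = "Customer"

# Schema column -> XPath relative to the loop node (assumed to match by name).
MAPPING = {
    "id": "id",
    "CustomerName": "CustomerName",
    "CustomerAddress": "CustomerAddress",
    "idState": "idState",
    "id2": "id2",
}

LIMIT = 10  # same value as the Limit field in this example


def read_rows(xml_text, loop_xpath, mapping, limit):
    """Return one dict per loop node, reading at most `limit` nodes."""
    root = ET.fromstring(xml_text.strip())
    nodes = islice(root.iterfind(loop_xpath), limit)
    return [
        {column: node.findtext(path) for column, path in mapping.items()}
        for node in nodes
    ]


rows = read_rows(SAMPLE_XML, LOOP_XPATH, MAPPING, LIMIT)
for row in rows:
    print(row)
```

Each node matched by the loop path yields one row, and each mapping entry is evaluated relative to that node, which is why the XPath queries in the Mapping table are usually short relative paths.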
Double-click tFileOutputXML to display its
Basic settings view and define the
component properties.
Click the [...] button next to the File
Name field and browse to the XML file in which you want to store the output
data, customer_data.xml in this example.
In the Row tag field, enter, between double quotation
marks, the name you want to give to the tag that will hold the retrieved
data.
Click Edit schema to display the schema
dialog box and make sure that the schema matches that of the preceding
component. If not, click Sync columns to
retrieve the schema from the preceding component.
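As a rough illustration of what the Row tag setting controls, the following Python sketch wraps each row in an element with the chosen name and writes each column as a child element. The root tag "customers", the row tag "customer" and the sample row are hypothetical values, and the exact layout written by tFileOutputXML may differ.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows, root_tag, row_tag):
    """Serialize rows as <root_tag><row_tag><column>value</column>...</row_tag></root_tag>."""
    root = ET.Element(root_tag)
    for row in rows:
        row_element = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # One child element per schema column.
            ET.SubElement(row_element, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical row and tag names, for illustration only.
xml_text = rows_to_xml(
    [{"id": 1, "CustomerName": "Example Co"}],
    root_tag="customers",
    row_tag="customer",
)
print(xml_text)
```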
Double-click tLogRow to display its Basic settings view and define the component
properties.
Click Edit schema to open the schema dialog
box and make sure that the schema matches that of the preceding component. If
not, click Sync columns to retrieve the schema
of the preceding component.
In the Mode area, select the Vertical option.
Save your Job and press F6 to execute
it.
Results
The output file customer_data.xml, which holds the valid XML data, is
created in the defined path, and the erroneous XML data is displayed on the console of the
Run view.
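The split between the main flow and the reject flow can be pictured with the following Python sketch. It assumes that a row is rejected when one of its values cannot be converted to the type declared in the schema. The column types, the sample rows and the extra "error" field are hypothetical, because the scenario does not list them.

```python
# Assumed column types; the scenario does not state them.
SCHEMA = {
    "id": int,
    "CustomerName": str,
    "CustomerAddress": str,
    "idState": int,
    "id2": int,
}


def split_rows(rows, schema):
    """Send rows that match the schema to `main`, the others to `rejects`."""
    main, rejects = [], []
    for row in rows:
        try:
            main.append({column: cast(row[column]) for column, cast in schema.items()})
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the raw values and attach the reason (hypothetical field name).
            rejects.append({**row, "error": str(exc)})
    return main, rejects


# Hypothetical rows: the second one has a non-numeric id.
rows = [
    {"id": "1", "CustomerName": "Example Co", "CustomerAddress": "1 Sample Street", "idState": "5", "id2": "1"},
    {"id": "two", "CustomerName": "Demo Ltd", "CustomerAddress": "2 Test Avenue", "idState": "7", "id2": "2"},
]

main, rejects = split_rows(rows, SCHEMA)

# Print rejected rows one field per line, loosely mimicking a vertical log view.
for rejected in rejects:
    for column, value in rejected.items():
        print(f"{column}: {value}")
```

Here `main` corresponds to the rows that would reach the output file, and `rejects` to the rows that would be shown on the console.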