Drop the following components from the Palette to the design workspace: tFileInputDelimited, tExtractXMLField, tFileOutputDelimited and tLogRow.
Connect the first three components using Row > Main links.
Connect tExtractXMLField to tLogRow using a Row > Reject link.
Double-click tFileInputDelimited to open its
Basic settings view and define the
component properties.
Select Built-in in the Schema list and fill in the file metadata manually in the
corresponding fields.
Click the [...] button next to Edit
schema to display a dialog box where you can define the structure
of your data.
Click the plus button to add as many columns as needed to your data structure.
In this example, we have one column in the schema:
xmlStr.
Click OK to validate your changes and close
the dialog box.
Note:
If you have already stored the schema in the Metadata folder under File
delimited, select Repository
from the Schema list and click the
[...] button next to the field to display the Repository Content dialog box where you can select the
relevant schema from the list. Click OK to
close the dialog box and have the fields automatically filled in with the
schema metadata.
In the File Name field, click the [...]
button and browse to the input delimited file you want to process,
CustomerDetails_Error in this example.
This delimited file holds a number of simple XML lines separated by a double
carriage return.
Set the row and field separators used in the input file in the corresponding
fields. In this example, the row separator is a double carriage return and the
field separator is left empty.
If needed, set Header, Footer and Limit. None of these is set
in this example.
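To illustrate what these separator settings mean for the data, here is a minimal Python sketch (not Talend code) that splits file content into rows on a double line break and treats each row as a single xmlStr value. The sample content and the split_records helper are hypothetical; the actual content of CustomerDetails_Error is not shown here.

```python
import re

def split_records(text):
    """Split raw file content into rows on a double line break."""
    # Accept both Unix (\n\n) and Windows (\r\n\r\n) line endings.
    parts = re.split(r"(?:\r?\n){2,}", text)
    # With an empty field separator, each non-empty row is one xmlStr value.
    return [part.strip() for part in parts if part.strip()]

# Hypothetical sample content, for illustration only.
sample = (
    "<customer><name>Ann</name></customer>\n\n"
    "<customer><name>Bob</name></customer>\n"
)
rows = split_records(sample)
```

Each element of rows corresponds to one row of the single-column schema defined above.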
In the design workspace, double-click tExtractXMLField to display its Basic
settings view and define the component properties.
Click Sync columns to retrieve the schema
from the preceding component. You can click the [...] button next to
Edit schema to view/modify the
schema.
The Column field in the Mapping table will be automatically populated with the defined
schema.
In the Xml field list, select the column from
which you want to extract the XML data. In this example, the field holding the
XML data is called xmlStr.
In the Loop XPath query field, enter the node
of the XML tree on which to loop to retrieve data.
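As a rough illustration of what looping on a node means, the following Python sketch (not Talend code) iterates over every node matched by a loop path and evaluates one relative query per output column. The XML sample, the loop path and the name and id columns are all hypothetical, and the sketch assumes each column is paired with a query evaluated relative to the loop node.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML value, for illustration only.
xml_str = (
    "<customers>"
    "<customer><name>Ann</name><id>1</id></customer>"
    "<customer><name>Bob</name><id>2</id></customer>"
    "</customers>"
)

root = ET.fromstring(xml_str)

# Hypothetical loop path: one output row is produced per matching node.
loop_xpath = "./customer"

# Hypothetical mapping: output column -> query relative to the loop node.
mapping = {"name": "name", "id": "id"}

extracted = []
for node in root.findall(loop_xpath):
    extracted.append({col: node.findtext(query) for col, query in mapping.items()})
```

Here extracted holds one dictionary per customer node, which mirrors how a loop query yields one output row per matched node.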
In the design workspace, double-click tFileOutputDelimited to open its Basic
settings view and display the component properties.
In the File Name field, define or browse to
the output file you want to write the correct data in,
CustomerNames_right.csv in this example.
Click Sync columns to retrieve the schema of
the preceding component. You can click the [...] button next to Edit schema to view/modify the schema.
In the design workspace, double-click tLogRow
to display its Basic settings view and define
the component properties.
Click Sync columns to retrieve the schema of
the preceding component. For more information on this component, see tLogRow.
Save your Job and press F6 to execute it.
Results
tExtractXMLField extracts the client information whose XML structure is
correct and writes it to the output delimited file, CustomerNames_right.csv.
The erroneous data is displayed on the console of the Run view.
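The split between correct and erroneous rows can be sketched in Python (not Talend code) as follows: rows whose XML parses cleanly go to a main list, and rows that fail to parse go to a reject list together with the parser error. The route_rows helper, the sample rows and the errorMessage key are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

def route_rows(rows):
    """Send well-formed XML rows to main and malformed rows to reject."""
    main, reject = [], []
    for xml_str in rows:
        try:
            ET.fromstring(xml_str)
        except ET.ParseError as err:
            # Keep the faulty row plus the parser message (hypothetical key name).
            reject.append({"xmlStr": xml_str, "errorMessage": str(err)})
        else:
            main.append(xml_str)
    return main, reject

# Hypothetical rows: the second one has a mismatched closing tag.
sample_rows = [
    "<customer><name>Ann</name></customer>",
    "<customer><name>Bob</customer>",
]
main, reject = route_rows(sample_rows)
```

In this sketch, main plays the role of the rows written to the output file and reject the role of the rows shown on the console.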