Creating a Job to divide the input text into tokens in CoNLL format
This Job uses tNLPPreprocessing to divide a text sample in XML
format into tokens. The tokens are then converted to CoNLL format using
tNormalize, which writes each token to its own row.
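The transformation this Job performs can be sketched outside Talend Studio as follows. This is only a conceptual illustration, not how tNLPPreprocessing or tNormalize are implemented: the function names (tokenize, to_token_column, normalize, to_conll) are hypothetical, the regular-expression tokenizer is a naive stand-in for the component's NLP library, and the tab separator and the token-only CoNLL layout (one token per line, a blank line between sentences) are assumptions.

```python
import re


def tokenize(text):
    # Naive stand-in for the tokenization step: runs of word characters,
    # or single punctuation marks, each become one token.
    return re.findall(r"\w+|[^\w\s]", text)


def to_token_column(text, sep="\t"):
    # Assumption: the tokens of one sentence are held in a single
    # delimited column before normalization.
    return sep.join(tokenize(text))


def normalize(token_column, sep="\t"):
    # Stand-in for the normalization step: split one delimited value
    # into one row per item, skipping empty items.
    return [tok for tok in token_column.split(sep) if tok]


def to_conll(sentences):
    # One token per line, with a blank line closing each sentence.
    lines = []
    for sentence in sentences:
        lines.extend(normalize(to_token_column(sentence)))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_conll(["The cat sat.", "Dogs bark!"]))
```

In the Job itself, these two steps are handled by tNLPPreprocessing and tNormalize respectively, so no code needs to be written.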
Procedure
- Drop the following components from the Palette onto the design workspace: tXMLFileInput, tNLPPreprocessing, tFilterColumns, tNormalize and tFileOutputDelimited.
- Connect the components using Row > Main connections.