
tNLPPreprocessing

Prepares a text sample and divides it into tokens, which can be words, numbers or punctuation marks.

tNLPPreprocessing outputs a column containing all the tokens of the input text, separated by tabs. You can convert this output to the CoNLL format and manually annotate the text. You can then use the annotated text to design features and train a model with the tNLPModel component.
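As a rough illustration of the conversion step, the following Python sketch assumes that the output column holds one sample's tokens joined by tab characters. It turns them into a two-column, CoNLL-style layout with one token per line followed by a label column to annotate by hand. The function name `tokens_to_conll`, the placeholder label `O`, and the two-column layout are assumptions made for illustration. Check the tNLPModel documentation for the exact CoNLL columns it expects.

```python
# Hypothetical sketch: convert one tab-separated token string (as described
# for the tNLPPreprocessing output column) into CoNLL-style lines.
# Assumptions: a two-column layout (token, label) and "O" as the placeholder
# label to be replaced during manual annotation. This is not Talend's spec.

def tokens_to_conll(tokenized: str, default_label: str = "O") -> str:
    """Return one 'token<TAB>label' line per token, then a blank line."""
    # Split on tabs and skip empty strings produced by doubled tabs.
    tokens = [t for t in tokenized.split("\t") if t]
    lines = [f"{token}\t{default_label}" for token in tokens]
    # A blank line conventionally marks the end of a sentence or sample.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    sample = "Talend\tprocesses\t2\tfiles\t."
    print(tokens_to_conll(sample), end="")
```

After the conversion, you would replace each placeholder label with the annotation that fits the token before passing the file on for model training.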

This component can run only with Spark 1.6 and 2.0.

For more technologies supported by Talend, see Talend components.
