Extracting named entities from text data

In this Job, the tNLPPredict component predicts named entities and automatically labels text data, using a classification model generated by the tNLPModel component.

Procedure

1. Double-click the tNLPPredict component to open its Basic settings view and define its properties.
   a. Click Sync columns to retrieve the schema from the previous component connected in the Job.
   b. From the Original text column list, select the column that holds the text to be labeled, which is text in this example.
   c. From the Token column list, select the column used for feature construction and prediction, which is tokens in this example.
   d. From the NLP Library list, select the same library you used to generate the model.
   e. If the named entity recognition model is stored in a single file, select the Use the model file check box.
   f. In the NLP model path field, specify the path to the model.
2. Double-click the tFilterColumns component to open its Basic settings view and define its properties.
   a. Click Sync columns to retrieve the schema from the previous component connected in the Job.
   b. Set Schema to Built-in and click Edit schema to keep only the columns that hold the original text, the labeled text and the labels.
3. Double-click the tFileOutputDelimited component to open its Basic settings view and define its properties.
   a. Click Sync columns to retrieve the schema from the previous component connected in the Job.
   b. In the Folder field, specify the path to the folder where you want to store the labeled text and the labels.
   c. Enter "\n" in the Row separator field and ";" in the Field separator field.
4. Press F6 to save and execute the Job.
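The tFilterColumns and tFileOutputDelimited steps only reshape and serialize rows. The plain Java sketch below imitates what those two components are configured to do in this Job: keep three columns, then write each row as ";"-separated fields terminated by "\n". It is not Talend's generated code. Only the text column name comes from this example; labeled_text and labels are hypothetical column names, and the row values are made-up placeholders, not real tNLPPredict output.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DelimitedOutputSketch {

    // Columns kept by the filter step. Only "text" comes from the example;
    // "labeled_text" and "labels" are hypothetical names.
    static final List<String> KEPT_COLUMNS = List.of("text", "labeled_text", "labels");

    // Separators entered in the tFileOutputDelimited settings.
    static final String FIELD_SEPARATOR = ";";
    static final String ROW_SEPARATOR = "\n";

    // Imitates the filter step: keep only the listed columns, in the listed order.
    // A missing column is written as an empty field.
    static List<String> filterColumns(Map<String, String> row, List<String> keep) {
        List<String> values = new ArrayList<>();
        for (String column : keep) {
            values.add(row.getOrDefault(column, ""));
        }
        return values;
    }

    // Imitates the delimited output: join the kept fields with ";" and end each row with "\n".
    // No quoting or escaping is done, so a field containing ";" or "\n" would break the layout.
    static String toDelimited(List<Map<String, String>> rows) {
        StringBuilder out = new StringBuilder();
        for (Map<String, String> row : rows) {
            out.append(String.join(FIELD_SEPARATOR, filterColumns(row, KEPT_COLUMNS)));
            out.append(ROW_SEPARATOR);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Placeholder row; the real tNLPPredict output format may differ.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("text", "Alice met Bob");
        row.put("tokens", "Alice met Bob"); // dropped by filterColumns
        row.put("labeled_text", "placeholder labeled text");
        row.put("labels", "placeholder labels");
        System.out.print(toDelimited(List.of(row)));
    }
}
```

Running main prints one line containing the three kept fields separated by semicolons. The tokens column is dropped, which mirrors how Edit schema in tFilterColumns removes every column except the original text, the labeled text and the labels.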

Results

The output files contain the original text, the labeled text and the labels. The person names extracted from the original text show that the named entity recognition task was performed correctly.
