The NLP Annotator index stage performs Natural Language Processing tasks.
The NLP Annotator supports the following tasks:
If choosing John Snow Labs (recommended for processing large datasets):
NER (Named Entity Recognition)
Fusion AI uses the pre-trained deep learning NER model that John Snow Labs provides. Currently, the pre-trained extraction model covers the following named entities:
This means that these are the only three types of entities Fusion will recognize in the source field.
POS (Part of Speech) Tagging
If choosing OpenNLP:
Shallow Parsing (Chunking)
Add an NLP Annotator index stage.
Choose the annotator type (OpenNLP or SparkNLP).
Configure the index pipeline stage:
Specify the model to use (enter the model ID as it appears in the blob store).
Specify the source, label pattern, and target (destination) fields:
source field: the raw text from which named entities are extracted.
label pattern: a regex pattern that matches the NER/POS labels. For example,
PER. will match extracted named entities with label
NN. will match tagged nouns.
target field: the field that receives the output of the extraction, tagging, and so on.
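As a rough illustration of how a label pattern selects annotations, here is a minimal Python sketch. It is not Fusion's implementation: the function select_by_label, the sample (text, label) pairs, and the choice to match the pattern from the start of each label are all assumptions made for the example.

```python
import re


def select_by_label(annotations, label_pattern):
    """Return the text of every annotation whose label matches the pattern.

    Assumption: the pattern is applied from the start of the label
    (re.match), which may differ from how the stage applies it.
    """
    pattern = re.compile(label_pattern)
    return [text for text, label in annotations if pattern.match(label)]


# Hypothetical annotator output as (text, label) pairs.
ner_output = [("Ada Lovelace", "PERSON"), ("London", "LOCATION")]
pos_output = [("engines", "NNS"), ("compute", "VB"), ("London", "NNP")]

# Keep only the annotations selected by each label pattern.
people = select_by_label(ner_output, "PER.")
nouns = select_by_label(pos_output, "NN.")

print(people)
print(nouns)
```

In this sketch the selected text is what would be written to the target field. Note that under these assumptions a pattern such as NN. requires at least one character after NN, so a bare NN label would not be selected.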
When entering configuration values in the UI, use unescaped characters, such as