Like the NLP Annotator index stage, the NLP Annotator query stage can be included in a query pipeline to perform Natural Language Processing tasks.
You can choose from three different NLP implementations: OpenNLP, SpaCy, and Spark NLP.
Setup and behavior differ depending on the implementation.
OpenNLP
The OpenNLP implementation is ready to use out-of-the-box. Simply specify "opennlp" as the "Model ID" property.
These annotation tasks are supported:
- NER
- Sentence detection
- POS tagging
- Shallow parsing (chunking)
SpaCy
The SpaCy implementation is ready to use out-of-the-box. The default SpaCy implementation uses the en_core_web_sm model. Specify "spacy" as the "Model ID" property.
These annotation tasks are supported:
- NER
- Sentence detection
- POS tagging
The label schemes used for each annotation task can be found at https://spacy.io/api/annotation.
Spark NLP
The Spark NLP implementation requires that you first download a Spark NLP model and upload it to Fusion:
- Download a model from https://nlp.johnsnowlabs.com/docs/en/pipelines. Note: Only the pre-trained NER model is supported. If choosing an NER model, download NerDLModel instead of NerCRFModel.
- Upload the model to Fusion using the following curl command:

curl -u [username]:[password] \
  -X POST \
  "https://[fusion host]/api/ai/ml-models?modelId=[desired model ID]&type=spark-nlp" \
  -F "file=@/path/to/model.zip"
For example, if you want to use the "Explain Document ML" model:
- Download the latest version of the "Explain Document ML" model (explain_document_ml_en_2.1.0_2.4_1563203154682.zip at the time of this writing).
- Upload the model to Fusion:

curl -u [username]:[password] \
  -X POST \
  "https://[fusion host]/api/ai/ml-models?modelId=explain_document_ml&type=spark-nlp" \
  -F "file=@/path/to/explain_document_ml_en_2.1.0_2.4_1563203154682.zip"

- When configuring this stage, specify "explain_document_ml" as the Model ID.
For Spark NLP, the supported annotation tasks depend on the model used.
- Add the NLP Annotator query stage to the query pipeline.
- Supply the Model ID ("opennlp", "spacy", or the model ID given to the uploaded Spark NLP model).
- Specify the input parameter, label pattern, and target parameter fields:
  - input parameter field: the Fusion query parameter to annotate, normally q, since we want to annotate the raw query string to understand the intent.
  - label pattern: a regex pattern that matches the NER/POS labels. For example, PER. will match extracted named entities with the label PERSON, while NN. will match tagged nouns.
  - target parameter field: where the outcome of the extraction/tagging is stored. For the query stage, the result is put in a new query parameter field.
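To get a feel for how a label pattern behaves, the following sketch matches the example patterns above against a hypothetical list of NER/POS labels. Here grep -E stands in for the stage's regex matching; the labels are illustrative, and Fusion's exact matching semantics may differ.

```shell
# Hypothetical labels a model might emit; the stage's label pattern
# is a regular expression tested against each label.
labels='PERSON
LOCATION
NNS
VB'

# "PER." matches PERSON: "PER" followed by at least one more character.
echo "$labels" | grep -E 'PER.'   # -> PERSON

# "NN." matches noun tags such as NNS.
echo "$labels" | grep -E 'NN.'    # -> NNS
```

Note that a pattern like NN. requires a character after NN, so it would not match a bare NN tag; widen the pattern (for example NN.*) if that is the intent.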
Configuration
Tip
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
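The escaping rule can be illustrated with any JSON parser (python3's json module is used here as a stand-in for the API's JSON handling): in a JSON request body, \t decodes to an actual tab character, so the two characters backslash + t that you would type unescaped in the UI must be written as \\t.

```shell
# "\t" in JSON decodes to a real tab; "\\t" decodes to the literal
# two characters backslash + t (what you type unescaped in the UI).
echo '{"tab": "\t", "literal": "\\t"}' \
  | python3 -c 'import json,sys; d=json.load(sys.stdin); print(repr(d["tab"]), repr(d["literal"]))'
# -> '\t' '\\t'
```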