Natural Language Processing

This topic describes Fusion AI’s Natural Language Processing (NLP) features, available in the legacy OpenNLP NER Extraction index pipeline stage and the newer NLP Annotator index and query pipeline stages.

OpenNLP NER Extraction pipeline stage

The OpenNLP NER Extraction index pipeline stage performs only Named Entity Recognition (NER). This stage is available in all versions of Fusion AI.

For additional NLP functionality, use the NLP Annotator pipeline stages, available in Fusion AI versions 4.2 and up. See below for details.

NLP Annotator pipeline stages (4.2.0 and above)

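Fusion AI 4.2 introduced the NLP Annotator as both an index pipeline stage and a query pipeline stage. The NLP Annotator performs a variety of fundamental NLP tasks, including the ones described in this topic: named entity recognition (NER), sentence extraction, part-of-speech (POS) tagging, and shallow parsing (chunking).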

When configured in an index pipeline, the NLP Annotator performs the selected NLP tasks on raw document content during indexing. When configured in a query pipeline, it performs the selected NLP tasks on the text of the query.

What can I do with Fusion AI’s NLP features?

Here are some real-world use cases for Fusion AI’s NLP features:

Named Entity Recognition (NER)

Named Entity Recognition is a widely used information extraction technique that identifies named entities in text, marks their boundaries, and classifies them under these predefined classes:

  • person

  • organization

  • location

For example:

[Figure: NER example]
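To make the idea concrete, here is a minimal sketch of what an NER step produces: spans of text, each labeled with one of the classes above. It uses a tiny hand-made lookup table (a gazetteer), which is a deliberate simplification and not how Fusion AI or OpenNLP recognizes entities. The names `GAZETTEER` and `tag_entities`, and the sample sentence, are hypothetical.

```python
import re

# Hypothetical gazetteer: a hand-made lookup table standing in for a trained
# NER model. Real NER uses statistical models, not a fixed list.
GAZETTEER = {
    "Ada Lovelace": "person",
    "Acme Corp": "organization",
    "San Francisco": "location",
}

def tag_entities(text):
    """Return (entity text, class, start offset) for each gazetteer match."""
    found = []
    for name, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.group(0), label, match.start()))
    # Report entities in the order they appear in the text.
    return sorted(found, key=lambda item: item[2])

sentence = "Ada Lovelace visited the Acme Corp office in San Francisco."
entities = tag_entities(sentence)
for entity_text, label, start in entities:
    print(f"{label:>12}: {entity_text} (offset {start})")
```

Each result pairs a text span with its class, which is the kind of output that downstream steps such as entity linking work from.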

Named entity recognition is widely used in text mining projects. When organizations store large volumes of business documents in Fusion AI, the natural next step is to turn that text-centric data into a knowledge base of the entities it mentions.

Take entity linking projects, for example: The client may want to link all relevant documents with an existing list of entities of interest. One way of doing this is to extract entities from all raw text documents, then perform fuzzy matching or another kind of text pattern matching to link relevant documents with a specific entity from the given list. This is more efficient than scanning the full text of every document for each entity name. In this scenario, NER extraction is an ideal tool.
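The linking step described above can be sketched as follows. This is only an illustration under stated assumptions: the entity mentions are taken to be already extracted per document (for example, by an NER stage at index time), the fuzzy matching uses Python's standard `difflib.SequenceMatcher`, and the 0.8 threshold is arbitrary. The document IDs, entity names, and function names are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical inputs: the client's entities of interest, and the entity
# mentions already extracted from each document by an NER step.
ENTITIES_OF_INTEREST = ["Acme Corporation", "Globex Industries"]
EXTRACTED = {
    "doc-1": ["Acme Corporaton", "Jane Doe"],        # misspelled mention
    "doc-2": ["Globex Industries Inc", "Paris"],
    "doc-3": ["John Smith", "Berlin"],
}

def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_documents(extracted, targets, threshold=0.8):
    """Map each target entity to the documents that mention something close to it."""
    links = {target: [] for target in targets}
    for doc_id, mentions in extracted.items():
        for target in targets:
            if any(similarity(m, target) >= threshold for m in mentions):
                links[target].append(doc_id)
    return links

links = link_documents(EXTRACTED, ENTITIES_OF_INTEREST)
for entity, doc_ids in links.items():
    print(entity, "->", doc_ids)
```

Because matching runs over a handful of extracted mentions per document rather than over the full text, the comparison cost stays small even for long documents.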

Fusion AI has integrated NER capability into its indexing and query pipelines to enable customers to perform knowledge discovery easily.

Sentence extraction

Part-of-Speech (POS) tagging

One of the most important uses of POS tagging is word sense disambiguation. For instance, when a user searches for the word "present" meaning a gift, having "present" tagged as a noun helps filter out content where "present" is a verb, meaning to show or introduce something to an audience.
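The filtering idea in the "present" example can be sketched as below. The sketch assumes documents have already been tagged as (token, tag) pairs using Penn Treebank style tags (`NN` for nouns, `VB` for verbs); the sample data and the `docs_with_sense` helper are hypothetical and do not reflect the NLP Annotator's actual output format.

```python
# Hypothetical pre-tagged documents: (token, POS tag) pairs such as a POS
# tagger might produce. Tags follow Penn Treebank conventions (assumed).
TAGGED_DOCS = {
    "doc-1": [("She", "PRP"), ("wrapped", "VBD"), ("the", "DT"),
              ("present", "NN")],
    "doc-2": [("They", "PRP"), ("will", "MD"), ("present", "VB"),
              ("the", "DT"), ("results", "NNS")],
}

def docs_with_sense(tagged_docs, word, tag_prefix):
    """Return IDs of documents where `word` occurs with a tag starting with `tag_prefix`."""
    return [
        doc_id
        for doc_id, tokens in tagged_docs.items()
        if any(tok.lower() == word and tag.startswith(tag_prefix)
               for tok, tag in tokens)
    ]

noun_hits = docs_with_sense(TAGGED_DOCS, "present", "NN")  # "gift" sense
verb_hits = docs_with_sense(TAGGED_DOCS, "present", "VB")  # "to show" sense
print("noun:", noun_hits)
print("verb:", verb_hits)
```

Matching on a tag prefix rather than an exact tag lets one filter cover related tags, such as singular and plural nouns, with a single check.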

Shallow parsing (chunking)