- Document transformation
- Document filtering and enrichment
- Field transformation
- Natural language processing
Raw content is parsed into one or more PipelineDocument objects.
Any number of intermediate stages operate on the document fields directly, or, in the case of specialized NLP tools, add annotations to a document.
Finally, the PipelineDocument is sent to Solr for indexing.
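The flow described above can be sketched in a few lines of Python. This is a conceptual illustration only, not Fusion's implementation: the `PipelineDocument` dataclass, the `parse` and `run_pipeline` functions, the two toy stages, and the `index` callback (standing in for the step that sends documents to Solr) are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineDocument:
    """Hypothetical stand-in for a pipeline document (not Fusion's class)."""
    id: str
    fields: Dict[str, str] = field(default_factory=dict)
    annotations: List[dict] = field(default_factory=list)


# A stage takes a document and returns the (possibly modified) document.
Stage = Callable[[PipelineDocument], PipelineDocument]


def parse(raw: str) -> List[PipelineDocument]:
    """Parse raw content into one document per non-empty line."""
    lines = [ln.strip() for ln in raw.splitlines() if ln.strip()]
    return [
        PipelineDocument(id=f"doc-{i}", fields={"body": ln})
        for i, ln in enumerate(lines)
    ]


def annotate_capitalized(doc: PipelineDocument) -> PipelineDocument:
    """Toy NLP-style stage: adds annotations instead of changing fields."""
    for token in doc.fields["body"].split():
        if token[:1].isupper():
            doc.annotations.append({"type": "capitalized", "text": token})
    return doc


def lowercase_body(doc: PipelineDocument) -> PipelineDocument:
    """Toy field-transformation stage: operates on a field directly."""
    doc.fields["body"] = doc.fields["body"].lower()
    return doc


def run_pipeline(raw: str, stages: List[Stage],
                 index: Callable[[PipelineDocument], None]) -> None:
    """Parse, run every stage in order, then hand each document to the indexer."""
    for doc in parse(raw):
        for stage in stages:
            doc = stage(doc)
        index(doc)


# Collect "indexed" documents in a list instead of sending them to Solr.
indexed: List[PipelineDocument] = []
run_pipeline("Hello Solr\n\nfusion Pipelines",
             [annotate_capitalized, lowercase_body],
             indexed.append)
```

Stage order matters in this sketch: the annotation stage runs before lowercasing, because it relies on the original capitalization.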
A pipeline stage definition associates a unique ID with a set of properties. These definitions are stored in ZooKeeper for reuse across pipelines and search applications. The Fusion UI provides stage-specific panels for defining and configuring each pipeline stage. Alternatively, the sequence of pipeline stages can be specified in JSON and registered via the Fusion REST API. Some stages require additional resources, e.g., text files that contain lists of names, synonyms, or places, or binary files that contain NLP language models. These resources can be uploaded via the Fusion UI or the REST API.
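As a rough illustration of the JSON approach, the sketch below builds a pipeline definition as a Python dictionary and serializes it. The property names (`id`, `type`, `stages`), the stage IDs, and the stage type strings are assumptions made for this example rather than a confirmed schema, and the `make_stage` helper is hypothetical; check the REST API reference for your Fusion version before registering anything.

```python
import json


def make_stage(stage_id: str, stage_type: str, **properties) -> dict:
    """Hypothetical helper: pair a unique stage ID with its properties."""
    return {"id": stage_id, "type": stage_type, **properties}


# Assumed shape: a pipeline ID plus an ordered list of stage definitions.
pipeline = {
    "id": "example-pipeline",
    "stages": [
        make_stage("map-fields", "field-mapping"),
        make_stage("index-to-solr", "solr-index"),
    ],
}

# Stage IDs must be unique, so verify that before serializing.
stage_ids = [stage["id"] for stage in pipeline["stages"]]
if len(stage_ids) != len(set(stage_ids)):
    raise ValueError("duplicate stage ID in pipeline definition")

# This JSON string is what would be sent in the body of a REST request.
payload = json.dumps(pipeline, indent=2)
print(payload)
```

The list order is the order in which stages run, so the indexing stage is placed last.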
Available index pipeline stages are listed below: