- Collection-specific Pipelines
- Pre-configured Pipelines
- General Purpose Pipelines
- Legacy Pipelines
- Internal Use Pipelines
Index pipelines transform incoming data into PipelineDocument objects for indexing by Fusion’s Solr core. An index pipeline consists of a series of configurable index pipeline stages, each performing a different transformation on the data before passing the result to the next stage in the pipeline. The final stage is the Solr Indexer stage, which transforms the PipelineDocument into a Solr document and submits it to Solr for indexing in a specific Collection.
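The stage-by-stage flow described above can be sketched in a few lines of Python. This is a conceptual model only, not Fusion's implementation: the `PipelineDocument` class, the stage functions, the `title` to `title_t` mapping rule, and the in-memory `sink` that stands in for Solr are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-in for Fusion's PipelineDocument: an id plus named fields.
@dataclass
class PipelineDocument:
    id: str
    fields: Dict[str, object] = field(default_factory=dict)


# A stage takes a document and returns the (possibly modified) document.
Stage = Callable[[PipelineDocument], PipelineDocument]


def field_mapping_stage(doc: PipelineDocument) -> PipelineDocument:
    """Illustrative mapping rule (an assumption): rename 'title' to 'title_t'."""
    if "title" in doc.fields:
        doc.fields["title_t"] = doc.fields.pop("title")
    return doc


def make_solr_indexer_stage(collection: str, sink: List[dict]) -> Stage:
    """Build a final stage that flattens the document into a Solr-style dict.

    Submission is simulated by appending to an in-memory list instead of
    calling Solr.
    """
    def stage(doc: PipelineDocument) -> PipelineDocument:
        solr_doc = {"id": doc.id, **doc.fields}
        sink.append({"collection": collection, "doc": solr_doc})
        return doc
    return stage


def run_pipeline(stages: List[Stage], doc: PipelineDocument) -> PipelineDocument:
    """Pass the document through each stage in order."""
    for stage in stages:
        doc = stage(doc)
    return doc
```

Running `run_pipeline([field_mapping_stage, make_solr_indexer_stage("products", sink)], PipelineDocument("doc1", {"title": "Hello"}))` leaves one entry in `sink`, addressed to the `products` collection, with the renamed field.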
Documents can also be submitted directly to an index pipeline via the REST API; see Pushing Documents to a Pipeline.
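As a rough illustration of such a REST submission, the sketch below only builds the HTTP request and does not send it. The URL layout (`/index-pipelines/<pipeline>/collections/<collection>/index`), the base URL, and the JSON content type are assumptions made for this example, not details stated on this page; check Pushing Documents to a Pipeline for the actual endpoint and payload format.

```python
import json
import urllib.request
from urllib.parse import quote


def build_index_request(base_url, pipeline_id, collection, docs):
    """Build (but do not send) a POST request carrying documents as JSON.

    The path layout below is an assumption for illustration only.
    """
    url = (
        f"{base_url}/index-pipelines/{quote(pipeline_id, safe='')}"
        f"/collections/{quote(collection, safe='')}/index"
    )
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, port, and API prefix; adjust to the actual deployment.
req = build_index_request(
    "http://localhost:8764/api/apollo",
    "default",
    "products",
    [{"id": "doc1", "title_t": "Hello"}],
)
```

Sending the request with `urllib.request.urlopen(req)` would additionally require a running server and valid credentials, which are left out of this sketch.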
A pipeline can be re-used across multiple collections. Fusion provides a set of built-in pipelines. You can use the Index Workbench or the REST API to develop custom index pipelines to suit any datasource or application.
When a Fusion collection is created using the Fusion UI, a pair of index and query pipelines is created for that collection, where each pipeline name is the collection name with the suffix "-default". The default index pipeline includes a Field Mapping index stage.
Although default pipelines are created when a Fusion collection is created, they are not deleted when the collection is deleted. Because pipelines can be used across collections, a named pipeline that was originally associated with one collection may be in use by several others.
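The naming rule and the lifecycle described above can be modeled with a small in-memory registry. This is a toy model under stated assumptions, not Fusion code: `PipelineRegistry` and its methods are hypothetical names.

```python
class PipelineRegistry:
    """Toy model of collections and their default pipelines (hypothetical)."""

    def __init__(self):
        self.collections = set()
        self.index_pipelines = set()
        self.query_pipelines = set()

    def create_collection(self, name):
        """Create a collection plus its '<name>-default' pipeline pair."""
        self.collections.add(name)
        default_id = f"{name}-default"
        self.index_pipelines.add(default_id)
        self.query_pipelines.add(default_id)
        return default_id

    def delete_collection(self, name):
        """Remove only the collection; pipelines stay, since other
        collections may be using them."""
        self.collections.discard(name)


registry = PipelineRegistry()
pipeline_id = registry.create_collection("products")
registry.delete_collection("products")
```

After these calls, `pipeline_id` is `products-default`, the collection is gone, and both default pipelines are still registered.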
Fusion includes several pre-configured pipelines which provide out-of-the-box processing capabilities and a starting point for customization. There is also a set of named pipelines which Fusion services use for logging, signal processing, and signal aggregation.
General Purpose Pipelines
CSV - a pipeline for handling tabular data from CSV files, using these stages:
Default_Data - a pipeline for processing general key-value data, i.e., data which has already been parsed into key-value pairs.
Discard (Fusion 2.0) / conn_noop - a pipeline with no defined stages, used for testing datasource configurations.
Documents_Parsing (Fusion 2.0) - a pipeline used to parse and index documents.
Documents_Parsing_debug_logging (Fusion 2.0) - an augmented version of the Documents_Parsing pipeline in which a logging stage is added before every processing stage.
JSON - a pipeline for handling JSON data.
Source_Code - a pipeline for extracting source code from Git and SVN repositories.
conn_solr - a pipeline used to parse and index documents. The initial stage is a Tika Parser index stage. The next stage is a Field Mapping index stage which has mapping rules for common document elements. The final stage is a Solr Indexer stage.
default - a pipeline which consists of just a Solr Indexer stage, used to push documents which have been completely parsed and have appropriately named fields to Solr for indexing.
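The debug-logging variant in the list above follows a simple pattern: place a logging stage in front of every processing stage. A minimal sketch of that transformation on a list of stage names is shown here. The function name and the label format are hypothetical, and the sample input reuses the conn_solr stage order purely as sample data; this page does not list the actual stages of Documents_Parsing.

```python
from typing import List


def with_debug_logging(stage_names: List[str]) -> List[str]:
    """Return a new stage list with a logging stage before each stage."""
    result = []
    for name in stage_names:
        result.append(f"Logging (before {name})")
        result.append(name)
    return result


# Sample data only: the stage order described for conn_solr.
debug_stages = with_debug_logging(["Tika Parser", "Field Mapping", "Solr Indexer"])
```

The resulting list is twice as long as the input, with each original stage preceded by its own logging entry.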
Internal Use Pipelines
_aggregation_default - a pipeline which consists of a single Solr Indexer stage which sends aggregations to Solr.
_aggregation_rollup - also a pipeline which consists of a single Solr Indexer stage which sends aggregations to Solr.
_signals_ingest - a pipeline used to index raw signal data. It has three stages: a Format Signals stage, a Field Mapping stage, and a Solr Indexer stage that indexes the raw signal events.
_system_metrics - a pipeline which consists of a single Solr Indexer stage which sends internal information to the Fusion system_metrics collection.