- Collection-specific Pipelines
- Pre-configured Pipelines
Index pipelines transform incoming data into PipelineDocument objects for indexing by Fusion’s Solr core. An index pipeline consists of a series of configurable index pipeline stages, each performing a different transformation on the data before passing the result to the next stage in the pipeline. The final stage is the Solr Indexer stage, which transforms the PipelineDocument into a Solr document and submits it to Solr for indexing in a specific Collection.
Alternatively, documents can be submitted directly to an Index Pipeline via the REST API; see Pushing Documents to a Pipeline.
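Submitting documents directly can be sketched as follows. This is a minimal sketch: the endpoint path (`/api/apollo/index-pipelines/{pipeline}/collections/{collection}/index`), host, port, pipeline, and collection names are assumptions for illustration, not values confirmed by this document.

```python
import json

# Hypothetical values for illustration only.
FUSION_HOST = "http://localhost:8764"
PIPELINE = "conn_solr"
COLLECTION = "docs"

def index_endpoint(host: str, pipeline: str, collection: str) -> str:
    """Build the (assumed) index-pipelines REST endpoint URL."""
    return (f"{host}/api/apollo/index-pipelines/{pipeline}"
            f"/collections/{collection}/index")

def make_documents(records):
    """Convert simple dicts into a PipelineDocument-style JSON shape:
    an id plus a list of field name/value pairs."""
    return [
        {
            "id": rec["id"],
            "fields": [
                {"name": name, "value": value}
                for name, value in rec.items() if name != "id"
            ],
        }
        for rec in records
    ]

docs = make_documents([{"id": "doc1", "title": "Hello", "body": "First document"}])
payload = json.dumps(docs)
url = index_endpoint(FUSION_HOST, PIPELINE, COLLECTION)
```

The `payload` would then be sent as the body of an HTTP POST to `url` with a JSON content type.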
A pipeline can be re-used across multiple collections. Fusion provides a set of built-in pipelines. You can use the Index Workbench or the REST API to develop custom index pipelines to suit any datasource or application.
Collection-specific Pipelines
When a Fusion collection is created using the Fusion UI, a pair of index and query pipelines is created for that collection. Each pipeline's name is the collection name with the suffix "-default". The default index pipeline consists of a Field Mapper index stage followed by the final Solr Indexer stage.
Although default pipelines are created along with a collection, they are not deleted when the collection is deleted. Because pipelines can be shared across collections, a pipeline originally associated with one collection may be in use by several others.
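A default-style pipeline of this shape can be sketched as a JSON definition. The following builds a two-stage pipeline (a Field Mapper stage followed by a Solr Indexer stage); the stage `type` identifiers used here are assumptions for illustration, not confirmed Fusion stage types.

```python
import json

def default_pipeline_definition(collection: str) -> dict:
    """Sketch of a collection's "-default" index pipeline: a Field Mapper
    stage followed by a Solr Indexer stage. The "type" values are
    hypothetical identifiers, not documented Fusion stage types."""
    return {
        "id": f"{collection}-default",
        "stages": [
            {"type": "field-mapping", "mappings": []},
            {"type": "solr-index"},
        ],
    }

pipeline = default_pipeline_definition("products")
body = json.dumps(pipeline, indent=2)
```

A definition like this would be the request body when creating a pipeline through the REST API.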
Pre-configured Pipelines
Fusion includes several pre-configured pipelines which provide out-of-the-box processing capabilities or a starting point for customization. There is also a set of named pipelines used by Fusion services for logging, signal processing, and signal aggregation.
General Purpose Pipelines
conn_solr - a pipeline used to parse and index documents. The initial stage is a Tika Parser index stage. The next stage is a Field Mapper index stage which has mapping rules for common document elements. The final stage is a Solr Indexer stage.
default - a pipeline which consists of just a Solr Indexer stage, used to push documents which are already fully parsed and have appropriately named fields to Solr for indexing.
conn_logging - a pipeline used to debug parsing and field-mapping stages; it does not include a final Solr Indexer stage. The initial stage is a Logging Index Stage, followed by a Tika Parser index stage, a second Logging Index Stage, a Field Mapper index stage, and a final Logging Index Stage.
conn_noop - a pipeline with no defined stages, used for testing datasource configurations.
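The conn_logging layout above follows a simple pattern: a Logging stage before, between, and after each transform stage. That pattern can be sketched generically as below; the stage `type` identifiers are made up for illustration and are not Fusion's actual stage type names.

```python
def with_logging(stages):
    """Interleave a Logging stage before, between, and after the given
    transform stages, mirroring the conn_logging layout. There is no
    final Solr Indexer stage, so nothing is actually sent to Solr."""
    out = [{"type": "logging"}]
    for stage in stages:
        out.append(stage)
        out.append({"type": "logging"})
    return out

# Reproduces conn_logging's five-stage layout: logging, parse, logging,
# field mapping, logging.
debug = with_logging([{"type": "tika-parser"}, {"type": "field-mapping"}])
```

Because the output of every transform stage is logged, this layout makes it easy to see exactly how each stage changes a document.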
Internal Use Pipelines
_aggregation_default - a pipeline which consists of a single Solr Indexer stage which sends aggregations to Solr.
aggr_default - a pipeline which consists of a single Solr Indexer stage which sends aggregations to Solr.
aggr_rollup - also a pipeline which consists of a single Solr Indexer stage which sends aggregations to Solr.
signals_ingest - a pipeline used to index raw signal data. It has three stages: a Format Signals stage, a Field Mapper index stage, and a Solr Indexer stage which indexes the raw signal events.
system_metrics.pipeline - a pipeline which consists of a single Solr Indexer stage which sends internal information to the Fusion system_metrics collection.