Index Pipelines

Index pipelines transform incoming data into PipelineDocument objects for indexing by Fusion’s Solr core. An index pipeline consists of a series of configurable index pipeline stages, each performing a different transformation on the data before passing the result to the next stage in the pipeline. The final stage is the Solr Indexer stage, which transforms the PipelineDocument into a Solr document and submits it to Solr for indexing in a specific Collection.
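The stage-by-stage flow described above can be sketched in a few lines of Python. This is only an illustrative model, not Fusion's implementation: the `PipelineDocument` class shown here, the stage functions, and the `run_pipeline` helper are hypothetical names, and the in-memory `index` list stands in for a Solr collection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineDocument:
    """Hypothetical stand-in for a PipelineDocument: an id plus named fields."""
    id: str
    fields: Dict[str, str] = field(default_factory=dict)


# A stage takes a document and returns the (possibly transformed) document.
Stage = Callable[[PipelineDocument], PipelineDocument]


def field_mapping_stage(doc: PipelineDocument) -> PipelineDocument:
    # Illustrative mapping rule: rename the "body" field to "body_t".
    if "body" in doc.fields:
        doc.fields["body_t"] = doc.fields.pop("body")
    return doc


def make_solr_indexer_stage(index: List[dict]) -> Stage:
    # The final stage converts the document to a flat record and "submits" it.
    # Here the submission is just an append to an in-memory list.
    def solr_indexer_stage(doc: PipelineDocument) -> PipelineDocument:
        index.append({"id": doc.id, **doc.fields})
        return doc
    return solr_indexer_stage


def run_pipeline(stages: List[Stage], doc: PipelineDocument) -> PipelineDocument:
    # Each stage receives the output of the previous stage.
    for stage in stages:
        doc = stage(doc)
    return doc


index: List[dict] = []
stages = [field_mapping_stage, make_solr_indexer_stage(index)]
run_pipeline(stages, PipelineDocument(id="doc-1", fields={"body": "hello"}))
print(index)
```

The indexer stage is built by a factory so that the target (here a list) is chosen when the pipeline is assembled, which loosely mirrors how a pipeline is pointed at a specific collection.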

Each configured datasource has an associated index pipeline. The datasource uses a connector to fetch data, which is parsed and then fed into the index pipeline.

Alternatively, documents can be submitted directly to an index pipeline via the REST API; see Pushing Documents to a Pipeline.
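As a rough sketch of what a direct submission might look like, the Python below builds (but does not send) an HTTP POST request. Everything specific in it is an assumption made for illustration: the host and port, the `/api/apollo/index-pipelines/<pipeline>/collections/<collection>/index` path layout, the JSON document shape, and the helper name `build_index_request`. Authentication is omitted. Check Pushing Documents to a Pipeline for the endpoint and payload format that apply to your Fusion version.

```python
import json
import urllib.parse
import urllib.request


def build_index_request(base_url, pipeline_id, collection, docs):
    """Build a POST request that submits documents to an index pipeline.

    The URL layout is an assumption for illustration, not a confirmed API path.
    """
    url = "{}/index-pipelines/{}/collections/{}/index".format(
        base_url.rstrip("/"),
        urllib.parse.quote(pipeline_id, safe=""),
        urllib.parse.quote(collection, safe=""),
    )
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical values: a local Fusion instance, the "default" pipeline,
# and a collection named "docs".
req = build_index_request(
    "http://localhost:8764/api/apollo",
    "default",
    "docs",
    [{"id": "doc-1", "fields": [{"name": "title_t", "value": "Hello"}]}],
)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```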

A pipeline can be reused across multiple collections. Fusion provides a set of built-in pipelines. You can use the Index Workbench or the REST API to develop custom index pipelines to suit any datasource or application.

Collection-specific Pipelines

When a Fusion collection is created using the Fusion UI, a pair of index and query pipelines is created for that collection; each pipeline is named after the collection, with the suffix "-default". The default index pipeline includes a Field Mapping index stage.

Although default pipelines are created together with a Fusion collection, they are not deleted when the collection is deleted. Because pipelines can be shared across collections, a named pipeline that was originally associated with one collection may be in use by several others.
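The naming rule and the lifecycle behavior described above can be modeled with a small sketch. The `PipelineRegistry` class and `default_pipeline_name` function are hypothetical and exist only to illustrate the behavior; they are not part of any Fusion API.

```python
def default_pipeline_name(collection: str) -> str:
    # Default pipelines are named after the collection plus "-default".
    return collection + "-default"


class PipelineRegistry:
    """Toy model of collections and the pipelines created alongside them."""

    def __init__(self):
        self.collections = set()
        self.index_pipelines = set()
        self.query_pipelines = set()

    def create_collection(self, name: str) -> None:
        # Creating a collection also creates a pair of default pipelines.
        self.collections.add(name)
        self.index_pipelines.add(default_pipeline_name(name))
        self.query_pipelines.add(default_pipeline_name(name))

    def delete_collection(self, name: str) -> None:
        # Only the collection is removed; pipelines may be shared by other
        # collections, so they are deliberately left in place.
        self.collections.discard(name)


registry = PipelineRegistry()
registry.create_collection("products")
registry.delete_collection("products")
print(sorted(registry.index_pipelines))
```

In this model, removing a pipeline that is no longer needed would be a separate, explicit step.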

Pre-configured Pipelines

Fusion includes several pre-configured pipelines which provide out-of-the-box processing capabilities and/or a starting point for customization. There is also a set of named pipelines which are used by Fusion services for logging, signal processing, and signal aggregation.

General Purpose Pipelines

Legacy Pipelines

  • conn_solr - a pipeline used to parse and index documents. The initial stage is a Tika Parser index stage. The next stage is a Field Mapping index stage which has mapping rules for common document elements. The final stage is a Solr Indexer stage.

  • default - a pipeline which consists of just a Solr Indexer stage, used to push documents which have been completely parsed and have appropriately named fields to Solr for indexing.

Internal Use Pipelines

  • _aggregation_default - a pipeline which consists of a single Solr Indexer stage which sends aggregations to Solr.

  • _aggregation_rollup - also a pipeline which consists of a single Solr Indexer stage which sends aggregations to Solr.

  • _signals_ingest - a pipeline used to index raw signal data. It has three stages: a Format Signals stage, a Field Mapping stage, and a Solr Indexer stage that indexes the raw signal events.

  • _system_metrics - a pipeline which consists of a single Solr Indexer stage which sends internal information to the Fusion system_metrics collection.
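For quick reference, the stage sequences listed above can be written down as plain data. The dictionary below is only a summary of this page expressed in Python, not a Fusion configuration format; the variable names are hypothetical.

```python
SOLR_INDEXER = "Solr Indexer"

# Stage order for each pre-configured pipeline, as described in the lists above.
PRECONFIGURED_STAGES = {
    "conn_solr": ["Tika Parser", "Field Mapping", SOLR_INDEXER],
    "default": [SOLR_INDEXER],
    "_aggregation_default": [SOLR_INDEXER],
    "_aggregation_rollup": [SOLR_INDEXER],
    "_signals_ingest": ["Format Signals", "Field Mapping", SOLR_INDEXER],
    "_system_metrics": [SOLR_INDEXER],
}

# Pipelines that consist of nothing but the Solr Indexer stage.
single_stage = sorted(
    name for name, stages in PRECONFIGURED_STAGES.items()
    if stages == [SOLR_INDEXER]
)
print(single_stage)
```

Every entry ends with the Solr Indexer stage, which matches the rule that it is the final stage of an index pipeline.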