Index Stage SDK

Overview

Lucidworks provides an Index Stage SDK in a public GitHub repository. It contains all the resources you need to develop custom index stages in Java.

Clone the repository to get started:

git clone https://github.com/lucidworks/index-stage-sdk

See the Gradle quickstart documentation for more information on Java projects.

Concepts

Index stage configuration

The index stage configuration file defines configuration options specific to the index stage instance. The options defined in this configuration file are available to the user in the Fusion UI and the API. The plugin configuration class extends the index stage configuration file and is annotated with @RootScheme.

Adding @Property and type annotations to your stage configuration interface methods defines metadata and type requirements for your plugin configuration fields. This is similar to Fusion’s connector configuration schema.

See an example on the Index Stage SDK reference page.

APIs

The Index Stage SDK includes several APIs for communication with other Fusion components via the Fusion object. This object is passed to the stage during initialization.

Plugins

A plugin is a .zip file that contains one or more index stage implementations. The file contains .jar files for stage definitions and additional dependencies. It also contains a manifest file that holds the metadata Fusion uses to run the plugin.

Plugins are uploaded to the Blob store:

  1. Navigate to System > Blobs

  2. Click Add

  3. Select Index Stage Plugin

  4. Click Browse… and select your plugin file

  5. Click Upload

Plugin stage classes must implement the com.lucidworks.indexing.api.IndexStage interface and be annotated with the com.lucidworks.indexing.api.Stage annotation. For convenience, a stage implementation can extend the com.lucidworks.indexing.api.IndexStageBase class, which already contains initialization logic and some helper methods.

See the Index Stage SDK reference page for examples of plugin generation.

Lifecycle

Creation and initialization

Fusion begins by creating an IndexStage instance. After the index stage is created, it is initialized using the init(T config, Fusion fusion) method. This allows for the creation of internal storage structures and the validation of the configuration.

Initialization occurs immediately after the stage configuration is saved in Fusion. The stage can be maintained and used by Fusion for extensive periods of time, even if no documents are being processed through the stage. This should be considered when making decisions on resource allocation.
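The initialization step can be sketched as follows. This is a minimal illustration, not SDK code: the Fusion interface, the StopwordConfig interface, and the StopwordStageSketch class are hypothetical stand-ins declared locally so the example compiles on its own. Only the shape of init(T config, Fusion fusion) is taken from the description above.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the SDK's Fusion object (assumption, not the real type).
interface Fusion {}

// Hypothetical stage configuration (assumption, for illustration only).
interface StopwordConfig {
    List<String> stopwords();
    int maxTokens();
}

class StopwordStageSketch {
    private Set<String> stopwords = Collections.emptySet();
    private int maxTokens;

    // Mirrors the init(T config, Fusion fusion) shape described above.
    public void init(StopwordConfig config, Fusion fusion) {
        // Validate the configuration so a bad value fails during initialization,
        // not while documents are being processed.
        if (config.maxTokens() <= 0) {
            throw new IllegalArgumentException("maxTokens must be positive");
        }
        // Build internal lookup structures once. Keep them small: the instance
        // may stay alive for a long time even when no documents are flowing.
        Set<String> words = new HashSet<>();
        for (String word : config.stopwords()) {
            words.add(word.toLowerCase(Locale.ROOT));
        }
        this.stopwords = Collections.unmodifiableSet(words);
        this.maxTokens = config.maxTokens();
    }

    boolean isStopword(String token) {
        return stopwords.contains(token.toLowerCase(Locale.ROOT));
    }

    int maxTokens() {
        return maxTokens;
    }
}
```

Because the stage can sit idle for long periods, the sketch allocates only a small immutable set during initialization and holds no open connections or large buffers.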

Document processing

Once the initialization process completes, Fusion calls the process method for each document the index pipeline processes.

In most use cases, index stages process a single input document and emit a single output document. For these cases, the process(Document document, Context context) method should be used.

In other cases, index stages process a single input document but emit multiple output documents. For these cases, the process(Document document, Context context, Consumer<Document> output) method should be used. The output documents are sent by calling output.accept(doc).
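The two process variants described above might look like the following sketch. Document and Context here are hypothetical stand-ins declared locally so the example is self-contained; the real SDK types have their own APIs. The return type of the two-argument method is also an assumption, since only the parameter lists are given above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-ins for the SDK's Document and Context types (assumptions).
class Document {
    final Map<String, Object> fields = new LinkedHashMap<>();

    Document with(String name, Object value) {
        fields.put(name, value);
        return this;
    }
}

interface Context {}

class ProcessSketch {
    // One document in, one document out.
    // Assumption: the method returns the (possibly modified) document.
    Document process(Document document, Context context) {
        Object title = document.fields.get("title");
        if (title != null) {
            document.fields.put("title_lc", title.toString().toLowerCase(Locale.ROOT));
        }
        return document;
    }

    // One document in, several out: emit one document per non-empty line of "body".
    void process(Document document, Context context, Consumer<Document> output) {
        String body = String.valueOf(document.fields.getOrDefault("body", ""));
        int part = 0;
        for (String line : body.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            output.accept(new Document()
                    .with("parent_id", document.fields.get("id"))
                    .with("part", part++)
                    .with("body", line.trim()));
        }
    }
}
```

In this sketch a caller collects the emitted documents by passing a consumer such as out::add, where out is a List<Document>.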

A single stage instance can be used to process multiple documents, and the process method can be called from multiple concurrently running threads. Additionally, Fusion can initialize and maintain multiple stage instances with the same configuration in separate indexing service nodes. Therefore, it’s important to ensure the plugin stage implementation is thread-safe and the processing logic is stateless.
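One way to meet the thread-safety requirement is sketched below using only the Java standard library. ThreadSafeStageSketch is a hypothetical class, not an SDK type, and its process method takes a plain string to keep the example short. Configuration-derived fields are final, per-document data lives only in local variables, and the single piece of shared mutable state is a LongAdder, which is safe under concurrent updates.

```java
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stage-like class (assumption, for illustration only).
class ThreadSafeStageSketch {
    // Set once and never mutated afterwards.
    private final String prefix;
    // The only shared mutable state is a concurrency-safe counter.
    private final LongAdder processed = new LongAdder();

    ThreadSafeStageSketch(String prefix) {
        this.prefix = prefix;
    }

    // No per-document data is stored in fields; everything lives in locals.
    String process(String value) {
        String result = prefix + value.toUpperCase(Locale.ROOT);
        processed.increment();
        return result;
    }

    long processedCount() {
        return processed.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadSafeStageSketch stage = new ThreadSafeStageSketch("doc-");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // Call process from several threads at once, as Fusion may do.
        for (int i = 0; i < 1000; i++) {
            final int n = i;
            pool.execute(() -> stage.process("id" + n));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("processed = " + stage.processedCount());
    }
}
```

A plain long or int field incremented with ++ would lose updates under the same load, which is why the sketch avoids it. Note that a counter like this is local to one instance and is not shared between instances on different indexing service nodes.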

Note
If the index stage throws an exception while processing a document, that document will not be processed further. It does not prevent other documents from being processed. Check the logs for information regarding the exception.
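The behavior in the note can be modeled with a toy runner. This is not Fusion's implementation, only a simplified illustration: FailureIsolationSketch, runAll, and requireNonEmpty are hypothetical names, and documents are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model of per-document failure isolation (assumption, not Fusion code).
class FailureIsolationSketch {
    // Applies the stage to every document. A failure drops only that document
    // and is recorded in errors; the remaining documents are still processed.
    static List<String> runAll(List<String> docs, UnaryOperator<String> stage, List<String> errors) {
        List<String> processed = new ArrayList<>();
        for (String doc : docs) {
            try {
                processed.add(stage.apply(doc));
            } catch (RuntimeException e) {
                errors.add("'" + doc + "': " + e.getMessage());
            }
        }
        return processed;
    }

    // Example stage logic that rejects a document by throwing.
    static String requireNonEmpty(String doc) {
        if (doc.trim().isEmpty()) {
            throw new IllegalArgumentException("empty document");
        }
        return doc.trim();
    }
}
```

In this model, passing the documents "a", "" and "b" through requireNonEmpty yields "a" and "b" as processed output and one recorded error for the empty document.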

Logging

The Index Stage SDK uses the SLF4J Reporter logging API. See an example on the Index Stage SDK reference page.