Lucidworks provides an Index Stage SDK in a public repository on GitHub with all the resources you need to develop custom index stages with Java.
Clone the repository to get started:
git clone https://github.com/lucidworks/index-stage-sdk
See the Gradle quickstart documentation for more information on Java projects.
Index stage configuration
The index stage configuration file defines configuration options specific to the index stage instance. The options defined in this configuration file are available to the user in the Fusion UI and the API. The plugin configuration class extends the index stage configuration. Adding @Property and type annotations to your stage configuration interface methods defines metadata and type requirements for your plugin configuration fields. This is similar to Fusion’s connector configuration schema.
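As a rough illustration of how annotations on configuration interface methods can carry field metadata, the sketch below declares a local stand-in for a Property annotation and reads it back through reflection. The annotation defined here, its title and required attributes, the SampleStageConfig interface, and the ConfigSchemaSketch helper are all hypothetical and are not the SDK's actual types; the SDK reference page shows the real annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the SDK's @Property annotation (assumption:
// the real annotation's attributes may differ).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Property {
    String title();
    boolean required() default false;
}

// Hypothetical configuration interface: each annotated method is one field.
interface SampleStageConfig {
    @Property(title = "Source field", required = true)
    String sourceField();

    @Property(title = "Maximum length")
    Integer maxLength();
}

class ConfigSchemaSketch {
    // Maps each annotated method name to "title (type, required|optional)".
    static Map<String, String> describe(Class<?> configType) {
        Map<String, String> fields = new TreeMap<>();
        for (Method m : configType.getMethods()) {
            Property p = m.getAnnotation(Property.class);
            if (p != null) {
                fields.put(m.getName(), p.title()
                        + " (" + m.getReturnType().getSimpleName()
                        + (p.required() ? ", required)" : ", optional)"));
            }
        }
        return fields;
    }
}
```

Calling ConfigSchemaSketch.describe(SampleStageConfig.class) returns one entry per annotated method, which is roughly the kind of information a UI or API needs to render and validate configuration fields.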
See an example on the Index Stage SDK reference page.
The Index Stage SDK includes several APIs for communication with other Fusion components via the Fusion object. This object is passed to the stage during initialization.
A plugin is a .zip file that contains one or more index stage implementations. The file contains .jar files for stage definitions and additional dependencies. It also contains a manifest file that holds the metadata Fusion uses to run the plugin.
Plugins are uploaded to the Blob store:
Navigate to System > Blobs
Select Index Stage Plugin
Click Browse… and select your plugin file
Plugin stage classes must implement the com.lucidworks.indexing.api.IndexStage interface and be annotated with the com.lucidworks.indexing.api.Stage annotation. For additional convenience, a stage implementation can extend the com.lucidworks.indexing.api.IndexStageBase class, which already contains initialization logic and some helpful methods.
See the Index Stage SDK reference page for examples of plugin generation.
Creation and initialization
Fusion begins by creating an IndexStage instance. After the index stage is created, it is initialized using the init(T config, Fusion fusion) method. This allows for the creation of internal storage structures and the validation of the configuration.
Initialization occurs immediately after the stage configuration is saved in Fusion. Fusion can keep the stage alive and in use for extended periods of time, even if no documents are being processed through the stage. Take this into account when deciding how to allocate resources.
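The initialization step described above can be sketched as follows. This is a minimal illustration, not the SDK's API: the Fusion and StopwordConfig interfaces are local stand-ins so the example compiles on its own, the StopwordStageSketch class is hypothetical, and rejecting a bad configuration with IllegalArgumentException is an assumption about how validation failures might be reported.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins so the sketch compiles without the SDK.
interface Fusion {}

interface StopwordConfig {
    String stopwords(); // comma-separated list, for example "a,an,the"
}

class StopwordStageSketch {
    private Set<String> stopwords = Collections.emptySet();

    // Mirrors the init(T config, Fusion fusion) step: validate the
    // configuration first, then build internal storage once.
    public void init(StopwordConfig config, Fusion fusion) {
        String raw = config.stopwords();
        if (raw == null || raw.trim().isEmpty()) {
            throw new IllegalArgumentException("stopwords must not be empty");
        }
        Set<String> parsed = new HashSet<>();
        for (String word : raw.split(",")) {
            String trimmed = word.trim().toLowerCase();
            if (!trimmed.isEmpty()) {
                parsed.add(trimmed);
            }
        }
        // Keep only a small read-only structure, because the instance
        // may sit idle for a long time after initialization.
        this.stopwords = Collections.unmodifiableSet(parsed);
    }

    public boolean isStopword(String token) {
        return stopwords.contains(token.toLowerCase());
    }
}
```

Building the lookup set once in init, rather than per document, keeps the per-document work cheap while holding only a modest amount of memory during idle periods.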
Once the initialization process completes, Fusion calls the process method for each document the index pipeline processes.
In most use cases, index stages process a single input document and emit a single output document. For these cases, the process(Document document, Context context) method should be used.
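A minimal sketch of the one-in, one-out case follows. The Document and Context types here are simplified local stand-ins rather than the SDK's classes, the field names are hypothetical, and returning the output document from process is an assumption about the method's return type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for the SDK's Document and Context types.
class Document {
    final Map<String, Object> fields = new LinkedHashMap<>();
}

interface Context {}

class UppercaseTitleStageSketch {
    // One document in, one document out (assumed: the output document
    // is the return value).
    public Document process(Document document, Context context) {
        Object title = document.fields.get("title");
        if (title instanceof String) {
            // Add a derived field and leave the original untouched.
            document.fields.put("title_upper", ((String) title).toUpperCase());
        }
        return document;
    }
}
```

Documents without a string title field pass through unchanged, so the stage never fails on input it does not recognize.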
In other cases, index stages process a single input document but emit multiple output documents. For these cases, the process(Document document, Context context, Consumer<Document> output) method should be used. The output documents are sent by calling the accept method on the output consumer.
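The one-in, many-out case might look like the sketch below, which splits a document body into one output document per non-blank line. It assumes that Consumer is java.util.function.Consumer, whose accept method hands each document downstream. The Document and Context types are local stand-ins, and the field names (id, body, parent_id, line_number) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-ins for the SDK's Document and Context types.
class Document {
    final Map<String, Object> fields = new LinkedHashMap<>();
}

interface Context {}

class SplitLinesStageSketch {
    // One document in, zero or more documents out through the consumer.
    public void process(Document document, Context context, Consumer<Document> output) {
        Object body = document.fields.get("body");
        if (!(body instanceof String)) {
            output.accept(document); // nothing to split, pass the input through
            return;
        }
        int index = 0;
        for (String line : ((String) body).split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            Document child = new Document();
            child.fields.put("parent_id", document.fields.get("id"));
            child.fields.put("line_number", index++);
            child.fields.put("body", line.trim());
            output.accept(child);
        }
    }
}
```

Because the stage pushes documents into the consumer instead of returning a collection, it can emit each child as soon as it is built without buffering the whole result.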
A single stage instance can be used to process multiple documents, and the process method can be called from multiple concurrently running threads. Additionally, Fusion can initialize and maintain multiple stage instances with the same configuration in separate indexing service nodes. Therefore, it’s important to ensure the plugin stage implementation is thread-safe and the processing logic is stateless.
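One way to satisfy the thread-safety requirement is sketched below. The method signatures are deliberately simplified to plain strings and do not match the SDK's interfaces, and the ThreadSafeStageSketch class is hypothetical. The idea is that configuration is written once during initialization and only read afterwards, per-document work uses local variables, and the only shared mutable state is a diagnostic counter backed by an atomic type.

```java
import java.util.concurrent.atomic.AtomicLong;

class ThreadSafeStageSketch {
    // Written once in init and only read afterwards; volatile makes the
    // value visible to every processing thread.
    private volatile String prefix = "";

    // Diagnostic counter only; AtomicLong keeps concurrent updates safe.
    private final AtomicLong processed = new AtomicLong();

    public void init(String prefix) {
        this.prefix = prefix;
    }

    // Uses only its argument, local variables, and read-only
    // configuration, so concurrent calls cannot interfere.
    public String process(String fieldValue) {
        String cleaned = fieldValue.trim();
        processed.incrementAndGet();
        return prefix + cleaned;
    }

    public long processedCount() {
        return processed.get();
    }
}
```

Avoid keeping per-document data in instance fields: with concurrent calls, one thread could overwrite a field while another is still reading it. Since several instances may also run on separate nodes, an in-memory counter like this one reflects only the local instance.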
If the index stage throws an exception while processing a document, that document will not be processed further. It does not prevent other documents from being processed. Check the logs for information regarding the exception.
The Index Stage SDK uses the SLF4J Reporter logging API. See an example on the Index Stage SDK reference page.