
Fusion 5.12

    Machine Learning index pipeline stage configuration specifications


    The Managed Fusion Machine Learning index stage uses a trained machine learning model to analyze one or more fields of a PipelineDocument and stores the results of the analysis in a new field of either the PipelineDocument or the Context object.

    To use the Machine Learning stage, you must first train a machine learning model. For more information about machine learning in Managed Fusion, see the machine learning documentation.

    Lucidworks offers free training to help you get started with Fusion. Check out the Intro to Machine Learning in Fusion course, which focuses on using machine learning to infer the goals of customers and users in order to deliver a more sophisticated search experience:

    Intro to Machine Learning in Fusion

    Visit the LucidAcademy to see the full training catalog.

    Configuration

    When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.

    Invokes a machine learning model to make a prediction on a document during indexing.

    skip - boolean

    Set to true to skip this stage.

    Default: false

    label - string

    A unique label for this stage.

    <= 255 characters

    condition - string

    Define a conditional script that must evaluate to true or false. This can be used to determine whether the stage should process the document.
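    For example, a condition script can restrict the stage to documents that actually contain the field the model reads. This is a minimal sketch that assumes the pipeline document is available to the condition as doc, as it is in the scripts on this page; the field name body_t is a hypothetical placeholder:

        // Run this stage only when the (hypothetical) body_t field has a value.
        // The last expression is the boolean result of the condition.
        doc.getFirstFieldValue("body_t") != null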

    modelId - string (required)

    The ID of the machine learning model to invoke.

    storeInContext - boolean (required)

    Flag to indicate whether the result should be stored in the Context object rather than in the pipeline document. If this is set to true, the Context Key field must be populated.

    Default: false

    contextKey - string

    Name of the context key under which to store the prediction.

    failOnError - boolean

    Flag to indicate whether this stage should throw an exception if an error occurs while generating a prediction for a document.

    Default: false

    inputScript - string (required)

    JavaScript code that returns a HashMap containing the fields and values to send to the ML model service. Refer to the examples in the default script.

    Default:

        /*
        This script must construct a HashMap containing fields and values to be sent to the ML model service.
        The field names and values will depend on the input schema of the model. Generally, you'll be reading
        fields and values from the request/context/response and placing them into a HashMap.

        Value types supported are:
         - String
         - Double
         - String[]
         - double[]
         - List<String>
         - List<Number>

        This script receives these objects and can be referenced in your script:
         - request
         - response
         - context
         - log (Logger useful for debugging)

        The last line of the script must be a reference to the HashMap object you created.

        Example 1: Single pipeline doc's field value to modelInput HashMap

            var modelInput = new java.util.HashMap()
            modelInput.put("input_1", doc.getFirstFieldValue("my_field"))
            modelInput

        Example 2: List of strings from pipeline doc's field to modelInput HashMap

            var modelInput = new java.util.HashMap()
            modelInput.put("input_1", doc.getFieldValues("my_field")) // doc.getValues returns a Collection
            modelInput

        Example 3: List of numeric values from the pipeline doc's fields to modelInput HashMap

            var modelInput = new java.util.HashMap()
            var list = new java.util.ArrayList()
            list.add(Double.parseDouble(doc.getFirstFieldValue("numeric_field_1")))
            list.add(Double.parseDouble(doc.getFirstFieldValue("numeric_field_2")))
            modelInput.put("input_1", list)
            modelInput

        Example 4: If you have created the model using Fusion ML Spark jobs, then use the following code

            var modelInput = new java.util.HashMap()
            modelInput.put("concatField", doc.getFieldValues("my_field"))
            modelInput
        */
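    Beyond the bundled examples, a model that expects several named inputs can be fed from multiple document fields. This is an illustrative sketch only; the input names text_input and price_input, and the document fields title_t and price_d, are hypothetical and must be replaced to match your model's input schema:

        // Build the HashMap expected by the ML model service.
        var modelInput = new java.util.HashMap()
        // Hypothetical input names and document fields -- adjust to your model's schema.
        modelInput.put("text_input", doc.getFirstFieldValue("title_t"))
        modelInput.put("price_input", Double.parseDouble(doc.getFirstFieldValue("price_d")))
        // The last line must be a reference to the HashMap itself.
        modelInput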

    outputScript - string

    JavaScript code that receives the output from the ML model service as a HashMap called "modelOutput". Most of the time this is used to place prediction results in the pipeline document or context. Refer to the examples in the default script.

    Default:

        /*
        This output script receives the output prediction from the ML model service as a HashMap called
        "modelOutput". Most of the time this is used to place prediction results in the request or context
        for downstream pipeline stages to consume.

        This script receives these objects and can be referenced in your script:
         - modelOutput (a HashMap containing fields/values returned from ML model service)
         - doc
         - context
         - log (Logger useful for debugging)

        Example: Add predictedLabel (string) into pipeline doc as a field

            doc.addField("sentiment", modelOutput.get("predictedLabel"))
        */
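    The output script can also post-process the prediction before writing it to the document. The sketch below assumes the model returns a predictedLabel string and a numeric score entry; both names are hypothetical, so check your model's output schema before using them:

        // Read the (hypothetical) label and score returned by the model.
        var label = modelOutput.get("predictedLabel")
        var score = modelOutput.get("score")
        // Only tag the document when the model is reasonably confident.
        if (score != null && score >= 0.5) {
          doc.addField("sentiment", label)
          doc.addField("sentiment_confidence_d", score)
        }
        // The log object is available for debugging.
        log.info("ML prediction: " + label + " (" + score + ")")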

    storePredictedFields - boolean

    Store any predictions as predicted_[predicted_field] in the response.

    Default: true