Machine Learning index pipeline stage configuration specifications
The Managed Fusion Machine Learning index stage uses a trained machine learning model to analyze one or more fields of a PipelineDocument and stores the results of the analysis in a new field of either the PipelineDocument or the Context object.
To use the Machine Learning stage, you must first train a machine learning model. For more information on machine learning in Managed Fusion, see:
Lucidworks offers free training to help you get started with Fusion. Check out the Intro to Machine Learning in Fusion course, which focuses on using machine learning to infer the goals of customers and users in order to deliver a more sophisticated search experience:
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
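For example, a tab delimiter entered as \t in the UI would appear escaped in the JSON body of an API request. This sketch is illustrative only; the `delimiter` property is a hypothetical example, not a field of this stage:

```json
{
  "delimiter": "\\t"
}
```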
Invokes a machine learning model to make a prediction on a document during indexing.
skip - boolean
Set to true to skip this stage.
Default: false
label - string
A unique label for this stage.
<= 255 characters
condition - string
Define a conditional script that must evaluate to true or false. This can be used to determine whether the stage should process the document.
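For instance, a condition can restrict the stage to documents that contain a particular field. This is a sketch, assuming the PipelineDocument `hasField` method; the field name `description_t` is illustrative:

```json
{
  "condition": "doc.hasField(\"description_t\")"
}
```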
modelId - string (required)
Model ID
storeInContext - boolean (required)
Flag to indicate whether the result should be stored in the Context object rather than in the pipeline document. If set to true, the Context Key field must be populated.
Default: false
contextKey - string
Name of the context key in which to store the prediction.
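When a prediction should be consumed by a later pipeline stage rather than written into the document, these two fields work together. A minimal sketch; the key name `sentimentPrediction` is illustrative:

```json
{
  "storeInContext": true,
  "contextKey": "sentimentPrediction"
}
```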
failOnError - boolean
Flag to indicate if this stage should throw an exception if an error occurs while generating a prediction for a document.
Default: false
inputScript - string (required)
JavaScript code that returns a HashMap containing the fields and values to send to the ML model service. Refer to the examples below.
Default: /*
This script must construct a HashMap containing fields and values to be sent to the ML model service.
The field names and values will depend on the input schema of the model.
Generally, you'll read fields and values from the doc/request/context/response and place them into a HashMap.
Value types supported are:
- String
- Double
- String[]
- double[]
- List<String>
- List<Number>
This script receives the following objects, which can be referenced in your script:
- doc
- request
- response
- context
- log (a Logger, useful for debugging)
The last line of the script must be a reference to the HashMap object you created.
Example 1: Single pipeline doc's field value to modelInput HashMap
var modelInput = new java.util.HashMap()
modelInput.put("input_1", doc.getFirstFieldValue("my_field"))
modelInput
Example 2: List of strings from pipeline doc's field to modelInput HashMap
var modelInput = new java.util.HashMap()
modelInput.put("input_1", doc.getFieldValues("my_field")) // doc.getFieldValues returns a Collection
modelInput
Example 3: List of numeric values from the pipeline doc's fields to modelInput HashMap
var modelInput = new java.util.HashMap()
var list = new java.util.ArrayList()
list.add(Double.parseDouble(doc.getFirstFieldValue("numeric_field_1")))
list.add(Double.parseDouble(doc.getFirstFieldValue("numeric_field_2")))
modelInput.put("input_1", list)
modelInput
Example 4: If you created the model using a Fusion ML Spark job, use the following code
var modelInput = new java.util.HashMap()
modelInput.put("concatField", doc.getFieldValues("my_field"))
modelInput
*/
outputScript - string
JavaScript code that receives the output from the ML service as a HashMap called "modelOutput". Most of the time this is used to place prediction results in the document or context. Refer to the examples below.
Default: /*
This output script receives the output prediction from the ML model service as a HashMap called "modelOutput".
Most of the time this is used to place prediction results in the document or context for downstream pipeline stages
to consume.
This script receives the following objects, which can be referenced in your script:
- modelOutput (a HashMap containing fields/values returned from ML model service)
- doc
- context
- log (Logger useful for debugging)
Example: Add predictedLabel (string) into pipeline doc as a field
doc.addField("sentiment", modelOutput.get("predictedLabel"))
*/
storePredictedFields - boolean
Store any predictions as predicted_[predicted_field] in the response.
Default: true
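Putting the pieces together, a minimal stage configuration might look like the following sketch. The `type` value, model ID, field names, and scripts are illustrative assumptions; the scripts follow Example 1 of the input script and the output script example above:

```json
{
  "type": "machine-learning",
  "skip": false,
  "label": "Predict sentiment",
  "modelId": "my-sentiment-model",
  "storeInContext": false,
  "failOnError": false,
  "inputScript": "var modelInput = new java.util.HashMap()\nmodelInput.put(\"input_1\", doc.getFirstFieldValue(\"body_t\"))\nmodelInput",
  "outputScript": "doc.addField(\"sentiment\", modelOutput.get(\"predictedLabel\"))",
  "storePredictedFields": true
}
```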