Machine Learning Index Stage

The Fusion machine learning indexing stage uses a trained machine learning model to analyze one or more fields of a PipelineDocument and stores the results of the analysis in a new field of either the PipelineDocument or the Context object.

In order to use the Machine Learning Stage, you must train a machine learning model. There are two different ways to train a model:

This stage requires that you use JavaScript to construct a model input object from the PipelineDocument and/or Context. This JavaScript is defined in the "Model input transformation script" property. This script must construct a HashMap containing fields and values to be sent to the model. The field names and values will depend on the input schema of the model.

Value types supported are:

  • String

  • Double

  • String[]

  • double[]

  • List<String>

  • List<Number>

The JavaScript interpreter that executes the script will have the following variables available in scope:

The last line of the script must be a reference to the HashMap object you created.

Example 1: Single pipeline doc’s field value to modelInput HashMap
var modelInput = new java.util.HashMap()
modelInput.put("input_1", doc.getFirstFieldValue("my_field"))
modelInput
Example 2: List of strings from pipeline doc’s field to modelInput HashMap
var modelInput = new java.util.HashMap()
modelInput.put("input_1", doc.getValues("my_field")) // doc.getValues returns a Collection
modelInput
Example 3: List of numeric values from the pipeline doc’s fields to modelInput HashMap
var modelInput = new java.util.HashMap()
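var list = new java.util.ArrayList()
// java.lang.Double is fully qualified so the class resolves without any imports
list.add(java.lang.Double.parseDouble(doc.getFirstFieldValue("numeric_field_1")))
list.add(java.lang.Double.parseDouble(doc.getFirstFieldValue("numeric_field_2")))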
modelInput.put("input_1", list)
modelInput
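
Example 3 assumes that both fields are present and hold parseable numbers. The following is a minimal sketch, not part of the stage itself, of one way to guard against missing or non-numeric values before building the list. The firstValuesAsNumbers helper, the default value of 0.0, and the stand-in doc object are assumptions made for illustration. Inside the stage, doc is supplied by the interpreter, and the resulting numbers would still need to be added to a java.util.ArrayList as in Example 3.

```javascript
// Stand-in for the pipeline document so the sketch runs outside Fusion.
// Only getFirstFieldValue, which the examples above use, is mimicked.
var doc = {
  fields: { numeric_field_1: "3.5", numeric_field_2: "oops" },
  getFirstFieldValue: function (name) {
    return this.fields.hasOwnProperty(name) ? this.fields[name] : null;
  }
};

// Hypothetical helper: read the first value of each named field and
// substitute defaultValue when the value is missing or does not start
// with a number (parseFloat accepts a leading numeric prefix).
function firstValuesAsNumbers(doc, fieldNames, defaultValue) {
  var result = [];
  for (var i = 0; i < fieldNames.length; i++) {
    var raw = doc.getFirstFieldValue(fieldNames[i]);
    var parsed = (raw === null || raw === undefined) ? NaN : parseFloat(String(raw));
    result.push(isNaN(parsed) ? defaultValue : parsed);
  }
  return result;
}

var numbers = firstValuesAsNumbers(
  doc,
  ["numeric_field_1", "numeric_field_2", "missing_field"],
  0.0
);
```

Whether a default value is appropriate depends on the model. For some models it may be better to skip the prediction entirely when an input is missing.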

Similarly, you must use JavaScript to copy the predictions from the model output object into the PipelineDocument and/or Context. The model output object is a HashMap containing the fields and values produced by the model.

The JavaScript interpreter that executes the script will have the following variables available in scope:

  • modelOutput (a HashMap containing the fields/values returned from the ML model service)

  • doc

  • context

  • log (Logger useful for debugging)

Example: Add predictedLabel (string) into pipeline doc as a field
doc.addField("sentiment", modelOutput.get("predictedLabel"))
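
A model may not return every key that the script expects. The sketch below shows one possible way to copy an output value only when it is present and to log a warning otherwise. The copyOutput helper, the "probability" key, and the "sentiment_probability" field name are hypothetical. The sketch also assumes that log exposes a warn method, as common Java loggers do. The stand-in objects exist only so that the sketch runs outside Fusion; inside the stage, modelOutput, doc, and log are supplied by the interpreter and should not be redefined.

```javascript
// Stand-ins for the variables the stage provides (illustration only).
var modelOutput = {
  values: { predictedLabel: "positive" },
  containsKey: function (key) { return this.values.hasOwnProperty(key); },
  get: function (key) { return this.containsKey(key) ? this.values[key] : null; }
};
var doc = {
  added: [],
  addField: function (name, value) { this.added.push([name, value]); }
};
var log = {
  messages: [],
  warn: function (msg) { this.messages.push(msg); }
};

// Hypothetical helper: copy one model output value into a document field.
// Returns true when a field was added, false when the value was absent.
function copyOutput(outputKey, fieldName) {
  if (modelOutput.containsKey(outputKey) && modelOutput.get(outputKey) !== null) {
    doc.addField(fieldName, modelOutput.get(outputKey));
    return true;
  }
  log.warn("Model output has no value for '" + outputKey + "'");
  return false;
}

var labelCopied = copyOutput("predictedLabel", "sentiment");
var probabilityCopied = copyOutput("probability", "sentiment_probability");
```

With the stand-in data, only the "sentiment" field is added, and one warning is recorded for the missing "probability" key.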

Configuration