Machine Learning Index Stage

The Fusion machine learning indexing stage uses a trained machine learning model to analyze one or more fields of a PipelineDocument and stores the results in a new field of either the PipelineDocument or the Context object.

To use this stage, you must first train a machine learning model. There are two different ways to train a model:

This stage requires that you use JavaScript to construct a model input object from the PipelineDocument and/or Context. This JavaScript is defined in the "Model input transformation script" property. This script must construct a HashMap containing fields and values to be sent to the model. The field names and values will depend on the input schema of the model.

Value types supported are:

  • String

  • Double

  • String[]

  • double[]

  • List<String>

  • List<Number>

The JavaScript interpreter that executes the script will have the following variables available in scope:

The last line of the script must be a reference to the HashMap object you created.

Example 1: Single pipeline doc’s field value to modelInput HashMap
var modelInput = new java.util.HashMap()
modelInput.put("input_1", doc.getFirstFieldValue("my_field"))
modelInput
Example 2: List of strings from pipeline doc’s field to modelInput HashMap
var modelInput = new java.util.HashMap()
modelInput.put("input_1", doc.getValues("my_field")) // doc.getValues returns a Collection
modelInput
Example 3: List of numeric values from the pipeline doc’s fields to modelInput HashMap
var modelInput = new java.util.HashMap()
// Parse each field value to a double before adding it to the list
var list = new java.util.ArrayList()
list.add(java.lang.Double.parseDouble(doc.getFirstFieldValue("numeric_field_1")))
list.add(java.lang.Double.parseDouble(doc.getFirstFieldValue("numeric_field_2")))
modelInput.put("input_1", list)
modelInput
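
The examples above assume that every field is present and parseable. As a slightly more defensive sketch, the script below combines a string input with a fixed-length numeric list and substitutes a default of 0.0 when a numeric field is missing or not a number. The input names `text_input` and `numeric_input`, the field names, the default value, and the stand-in objects at the top are all hypothetical; the stand-ins exist only so the sketch can be run outside Fusion (for example in Node.js) and would be dropped when pasting the script body into the stage, where `doc` and the `java` packages are assumed to be supplied by the interpreter.

```javascript
// --- Hypothetical stand-ins, only for running this sketch outside Fusion ---
var java = {
  util: {
    HashMap: function () {
      var entries = {};
      this.put = function (key, value) { entries[key] = value; };
      this.get = function (key) { return entries[key]; };
    },
    ArrayList: function () {
      var items = [];
      this.add = function (value) { items.push(value); };
      this.get = function (index) { return items[index]; };
      this.size = function () { return items.length; };
    }
  }
};
var doc = {
  fields: { title: "great product", price: "19.99" }, // no "rating" field
  getFirstFieldValue: function (name) {
    return this.fields.hasOwnProperty(name) ? this.fields[name] : null;
  }
};

// --- Script body: this is the part that would go into the stage ---
var modelInput = new java.util.HashMap();

// String input: only sent when the field exists.
var title = doc.getFirstFieldValue("title");
if (title != null) {
  modelInput.put("text_input", title);
}

// Numeric list input: keep the list length fixed by falling back
// to an assumed default of 0.0 for missing or unparseable values.
var numbers = new java.util.ArrayList();
var numericFields = ["price", "rating"];
for (var i = 0; i < numericFields.length; i++) {
  var raw = doc.getFirstFieldValue(numericFields[i]);
  var parsed = raw == null ? NaN : parseFloat(raw);
  numbers.add(isNaN(parsed) ? 0.0 : parsed);
}
modelInput.put("numeric_input", numbers);

// The last line must be a reference to the map.
modelInput;
```

Whether a default value, a skipped entry, or a rejected document is the right response to a missing field depends on the input schema of the model.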

Similarly, you must use JavaScript to store the predictions from the model output object in the PipelineDocument and/or Context. The model output object is a HashMap containing the fields and values produced by the model.

The JavaScript interpreter that executes the script will have the following variables available in scope:

  • modelOutput (a HashMap containing fields/values returned from ML model service)

  • doc

  • context

  • log (Logger useful for debugging)

Example: Add predictedLabel (string) into pipeline doc as a field
doc.addField("sentiment", modelOutput.get("predictedLabel"))
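
A slightly more defensive variant of the example above might write the field only when the model actually returned a label and log a warning otherwise. This is a sketch under stated assumptions: the stand-in `modelOutput`, `doc`, and `log` objects are hypothetical and exist only so the snippet can run outside Fusion, and the `log.warn` call assumes the logger exposes SLF4J-style methods.

```javascript
// --- Hypothetical stand-ins, only for running this sketch outside Fusion ---
var modelOutput = {
  values: { predictedLabel: "positive" },
  get: function (key) {
    return this.values.hasOwnProperty(key) ? this.values[key] : null;
  }
};
var doc = {
  fields: {},
  addField: function (name, value) {
    (this.fields[name] = this.fields[name] || []).push(value);
  }
};
var log = { warn: function (message) { console.log("WARN " + message); } };

// --- Script body: this is the part that would go into the stage ---
var label = modelOutput.get("predictedLabel");
if (label != null) {
  doc.addField("sentiment", label);
} else {
  // Assumes an SLF4J-style logger; adjust if the method name differs.
  log.warn("Model returned no predictedLabel; document left unchanged");
}
```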
Note
Although this stage is available without a Fusion AI license, it is only effective after running the Fusion AI jobs mentioned above.

Configuration

Tip
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
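
One way to see why the API form needs the extra backslash is to decode a JSON request body by hand. The sketch below is illustrative only: the property name `delimiter` is hypothetical and is not a documented property of this stage.

```javascript
// Request body as it would be written for the API: the backslash is escaped.
// String.raw keeps the text exactly as typed, without JavaScript's own escaping.
var apiBody = String.raw`{"delimiter": "\\t"}`;

// After JSON decoding, the value is the two characters backslash + t,
// which is the same text that would be typed directly into the UI field.
var decoded = JSON.parse(apiBody).delimiter;
var typedInUi = String.raw`\t`;

console.log(decoded === typedInUi); // prints true
console.log(decoded.length);        // prints 2
```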