The Fusion machine learning indexing stage uses a trained machine learning model to analyze a field or fields of a PipelineDocument and stores the results of analysis in a new field of either the PipelineDocument or Context object.
To use the Machine Learning stage, you must first train a machine learning model. There are two ways to train a model:
- Use a Fusion AI job that trains a model, like Logistic Regression or Random Forest.
- Train a native Python model and deploy it to Fusion using the Data Science Toolkit Integration (DSTI).
This stage requires that you use JavaScript to construct a model input object from the PipelineDocument and/or Context. This JavaScript is defined in the "Model input transformation script" property. This script must construct a HashMap containing fields and values to be sent to the model. The field names and values will depend on the input schema of the model.
Value types supported are:
- String
- Double
- String[]
- double[]
- List<String>
- List<Number>
The JavaScript interpreter that executes the script will have the following variables available in scope:
The last line of the script must be a reference to the HashMap object you created.
// Single value: send the first value of "my_field" as "input_1"
var modelInput = new java.util.HashMap()
modelInput.put("input_1", doc.getFirstFieldValue("my_field"))
modelInput

// Multiple values: send every value of "my_field" as "input_1"
var modelInput = new java.util.HashMap()
modelInput.put("input_1", doc.getValues("my_field")) // doc.getValues returns a Collection
modelInput

// Numeric list: parse two fields as doubles and send them together as "input_1"
var modelInput = new java.util.HashMap()
var list = new java.util.ArrayList()
list.add(java.lang.Double.parseDouble(doc.getFirstFieldValue("numeric_field_1")))
list.add(java.lang.Double.parseDouble(doc.getFirstFieldValue("numeric_field_2")))
modelInput.put("input_1", list)
modelInput
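
The numeric example above assumes every field is present and parses as a number. As a minimal sketch only, the variant below substitutes a default value for missing or non-numeric fields so the list keeps a fixed length. The `java` and `doc` objects defined at the top are hypothetical stand-ins so the logic can be tried outside Fusion. Inside the stage, the interpreter supplies the real `doc` and Java classes, so only the part under "Script body" would go in the script property. The field names, `input_1`, and the default of `0.0` are placeholders, not values required by any model.

```javascript
// --- Stand-ins for running outside Fusion (hypothetical, not Fusion APIs) ---
function FakeHashMap() { this.entries = {}; }
FakeHashMap.prototype.put = function (key, value) { this.entries[key] = value; };
FakeHashMap.prototype.get = function (key) { return this.entries[key]; };

function FakeArrayList() { this.items = []; }
FakeArrayList.prototype.add = function (value) { this.items.push(value); return true; };
FakeArrayList.prototype.get = function (index) { return this.items[index]; };
FakeArrayList.prototype.size = function () { return this.items.length; };

var java = { util: { HashMap: FakeHashMap, ArrayList: FakeArrayList } };
var doc = {
  // numeric_field_3 is deliberately absent, numeric_field_2 is not a number
  fields: { numeric_field_1: "1.5", numeric_field_2: "abc" },
  getFirstFieldValue: function (name) {
    return this.fields.hasOwnProperty(name) ? this.fields[name] : null;
  }
};

// --- Script body: the part that would go in the transformation script ---
var DEFAULT_VALUE = 0.0; // assumption: the model tolerates 0.0 for missing data
var fieldNames = ["numeric_field_1", "numeric_field_2", "numeric_field_3"];

var modelInput = new java.util.HashMap();
var list = new java.util.ArrayList();
for (var i = 0; i < fieldNames.length; i++) {
  var raw = doc.getFirstFieldValue(fieldNames[i]);
  var parsed = raw == null ? NaN : parseFloat(raw);
  // Fall back to the default when the field is missing or not numeric
  list.add(isNaN(parsed) ? DEFAULT_VALUE : parsed);
}
modelInput.put("input_1", list);
modelInput;
```

Whether a default is appropriate depends on how the model was trained. Skipping the document or failing loudly may be the better choice for some models.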
Similarly, you must use JavaScript to store the model's predictions in the PipelineDocument and/or Context. The model output object is a HashMap containing the fields and values produced by the model.
The JavaScript interpreter that executes the script will have the following variables available in scope:
doc.addField("sentiment", modelOutput.get("predictedLabel"))
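
The one-line example above writes whatever the model returned, including nothing at all if the key is absent. The sketch below, offered as an illustration rather than a required pattern, checks for the prediction first and writes a fallback otherwise. The `modelOutput` and `doc` objects at the top are hypothetical stand-ins for trying the logic outside Fusion, and the `"unknown"` fallback value is an assumption. Inside the stage, only the part under "Script body" would be used.

```javascript
// --- Stand-ins for running outside Fusion (hypothetical, not Fusion APIs) ---
var modelOutput = new Map([["predictedLabel", "positive"]]);
var doc = {
  fields: {},
  addField: function (name, value) {
    if (!this.fields[name]) { this.fields[name] = []; }
    this.fields[name].push(value);
  }
};

// --- Script body: the part that would go in the output script ---
var label = modelOutput.get("predictedLabel");
if (label != null) {
  doc.addField("sentiment", label);
} else {
  // Assumption: a placeholder is preferable to leaving the field unset
  doc.addField("sentiment", "unknown");
}
```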
Note: Although this stage is available without a Fusion AI license, it is only effective after running the Fusion AI jobs mentioned above.
Configuration
Tip: When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
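
To illustrate the escaping tip, the sketch below assumes the API request body is JSON and uses a hypothetical property named `delimiter`. It shows that `\\t` in a JSON body decodes to the same two characters (a backslash followed by `t`) that would be typed as `\t` in a UI field.

```javascript
// Hypothetical API request body; "delimiter" is an illustrative property name.
// String.raw keeps the backslashes exactly as they would appear on the wire.
var requestBody = String.raw`{"delimiter": "\\t"}`;
var config = JSON.parse(requestBody);

// What would be typed into the UI field: a backslash followed by the letter t
var uiValue = String.raw`\t`;

console.log(config.delimiter === uiValue); // prints true
console.log(config.delimiter.length);      // prints 2
```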