Machine Learning Index Stage

The Fusion machine learning indexing stage uses a compiled machine learning model to analyze one or more fields of a PipelineDocument, and stores the results of that analysis in a new field of either the PipelineDocument or the PipelineContext object. This stage was introduced in Fusion version 2.4.
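The data flow of the stage can be pictured with a small Python sketch. It is an illustration only, not Fusion's implementation: plain dictionaries stand in for the PipelineDocument and PipelineContext objects, a toy function stands in for the compiled model, and the names `run_ml_stage`, `predict`, `output_field`, and `write_to_context` are hypothetical.

```python
from typing import Any, Callable, Dict, List


def run_ml_stage(
    doc: Dict[str, Any],
    context: Dict[str, Any],
    feature_fields: List[str],
    predict: Callable[[List[Any]], Any],
    output_field: str,
    write_to_context: bool = False,
) -> None:
    """Hypothetical sketch: read input fields, run the model, store the result."""
    # Gather the values of the fields the model analyzes.
    inputs = [doc[name] for name in feature_fields if name in doc]
    # Run the stand-in model on those values.
    result = predict(inputs)
    # Store the result in a new field of either the document or the context.
    target = context if write_to_context else doc
    target[output_field] = result


def toy_model(values: List[Any]) -> str:
    # Stand-in "model": labels the combined text by its length.
    return "long" if len(" ".join(values)) > 20 else "short"


doc = {"id": "1", "body": "short text"}
ctx: Dict[str, Any] = {}
run_ml_stage(doc, ctx, ["body"], toy_model, "body_label")
print(doc["body_label"])  # short
run_ml_stage(doc, ctx, ["body"], toy_model, "body_label", write_to_context=True)
print(ctx["body_label"])  # short
```

Writing to the context rather than the document keeps the prediction available to later stages without indexing it as part of the document.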

You must use Spark’s MLlib API to create a supervised machine learning model and then upload that model to Fusion’s blob store collection. Complete details are available in the section Machine Learning Models in Fusion.

Successful use of this stage requires a proper understanding of both the model and your data. The machine learning model is described by its spark-mllib.json file, which contains the model specification as a JSON object. This object contains the attribute "featureFields", whose value is a list of one or more field names. The contents of these fields are processed into the feature vector on which the model operates. If these fields aren’t present in the document being analyzed, the result is either an empty prediction or a configurable default value. If the contents of these fields differ greatly from the data used to train the model, the model's predictions will be unreliable.
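The role of "featureFields" and the fallback for missing fields can be sketched in Python. This is a minimal sketch under stated assumptions: the spec string is a trimmed, hypothetical spark-mllib.json that contains only the "featureFields" attribute, the functions `feature_values` and `predict_or_default` are hypothetical, and the sketch treats a document as unanalyzable when any listed field is missing (whether Fusion requires all or only some of the fields is an assumption here).

```python
import json
from typing import Any, Callable, Dict, List, Optional

# Hypothetical, trimmed model specification: only "featureFields" is shown.
SPEC_JSON = '{"featureFields": ["title", "body"]}'


def feature_values(doc: Dict[str, Any], spec: Dict[str, Any]) -> Optional[List[Any]]:
    """Return the values of the spec's feature fields, or None if any is missing."""
    fields = spec["featureFields"]
    if any(name not in doc for name in fields):
        return None
    return [doc[name] for name in fields]


def predict_or_default(
    doc: Dict[str, Any],
    spec: Dict[str, Any],
    predict: Callable[[List[Any]], Any],
    default: Any = None,
) -> Any:
    values = feature_values(doc, spec)
    if values is None:
        # Missing feature fields: an empty prediction (None) or the configured default.
        return default
    return predict(values)


def toy_model(values: List[Any]) -> int:
    # Stand-in "model": returns the length of the combined feature text.
    return len(" ".join(values))


spec = json.loads(SPEC_JSON)
print(predict_or_default({"title": "Hi", "body": "there"}, spec, toy_model))  # 8
print(predict_or_default({"title": "Hi"}, spec, toy_model, default="unknown"))  # unknown
```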

Configuration

Tip
When entering configuration values in the UI, use unescaped characters, such as \t for the tab character. When entering configuration values in the API, use escaped characters, such as \\t for the tab character.
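The extra backslash in the API form comes from JSON string escaping, which the following Python sketch demonstrates with the standard `json` module. It assumes the API value is sent in a JSON request body, and the field name `delimiter` is hypothetical. How the stage then interprets the two-character sequence `\t` is up to Fusion; the sketch covers only the JSON layer.

```python
import json

# What you type in the UI: a backslash followed by "t" (two characters).
ui_value = r"\t"

# In a JSON request body the backslash itself must be escaped, so \\t
# decodes to the same two characters the UI stores.
api_value = json.loads(r'{"delimiter": "\\t"}')["delimiter"]

# Writing \t in JSON instead decodes to a single, literal tab character.
raw_tab = json.loads(r'{"delimiter": "\t"}')["delimiter"]

print(api_value == ui_value)  # True
print(raw_tab == "\t")  # True
print(len(ui_value), len(raw_tab))  # 2 1

# Serializing the UI value produces the escaped form used in the API.
print(json.dumps({"delimiter": ui_value}))  # {"delimiter": "\\t"}
```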