Machine Learning Stage

The Machine Learning query pipeline stage uses a compiled machine learning model to analyze one or more fields of a Query Request object and stores the results of the analysis in a new field added to either the Request or the PipelineContext object. You must use Spark’s MLlib API to create a supervised machine learning model and upload this model into Fusion’s blob store collection. Complete details are available in the section Machine Learning Models in Fusion.

Successful use of this stage requires a proper understanding of both the model and your data. The machine learning model is described by its spark-mllib.json file, which contains the model specification as a JSON object. This object contains the attribute "featureFields", whose value is a list of one or more field names. The contents of these fields are processed into the feature vector on which the model operates. If these fields aren’t present in the request, then the result is either an empty prediction or a configurable default value. If the contents of these fields differ greatly from the data used to compile the model, the predictions made by the model will be unreliable.
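
The relationship between "featureFields" and the request can be illustrated with a minimal Python sketch. This is not Fusion code: the specification content, the field names "q" and "fq", and the helper functions are all hypothetical, and only the "featureFields" attribute comes from the description above. The sketch also assumes that a request counts as incomplete when any one feature field is missing, which may differ from how the stage itself behaves.

```python
import json

# Hypothetical specification: only "featureFields" is documented above;
# the field names "q" and "fq" are illustrative assumptions.
SPEC_JSON = '{"featureFields": ["q", "fq"]}'


def missing_feature_fields(spec_json, request_params):
    """Return the feature fields named in the spec that the request lacks."""
    spec = json.loads(spec_json)
    return [name for name in spec["featureFields"]
            if not request_params.get(name)]


def predict_or_default(spec_json, request_params, predict, default=None):
    """Call predict on the feature values, or return default if any are missing.

    `predict` stands in for the model; it is a hypothetical callable that
    takes the list of field values in "featureFields" order.
    """
    if missing_feature_fields(spec_json, request_params):
        return default
    spec = json.loads(spec_json)
    values = [request_params[name] for name in spec["featureFields"]]
    return predict(values)
```

Running a check like missing_feature_fields against sample requests before enabling the stage can reveal whether typical queries actually carry the fields the model expects.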