Using the DSTI

This topic describes the high-level process of deploying trained Python models to Fusion with the Data Science Toolkit Integration (DSTI). Install the DSTI first.

Once you’re familiar with how to use the DSTI, you can optionally build a custom ML service image in order to use Python libraries that are not included in the default ML service image.

These are the high-level steps, explained in detail later in this topic:

How to use the DSTI

1. Train a custom model

Train a machine learning model in any tool of your choice, using the virtual environment. This can be any kind of model, including one built with Keras, TensorFlow, or scikit-learn.

After training your model, serialize the model and stateful preprocessing pipelines to files. Refer to the documentation of the modeling library you’re using for instructions on how to do this.
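
For example, with scikit-learn you might fit a small pipeline and pickle it so that predict.py can load it later in init(). This is a minimal, illustrative sketch; the training data is a placeholder, and model.pkl is an assumed file name (it matches the placeholder used in the docstrings later in this topic):

import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, for illustration only; use your own corpus and labels.
texts = ["great product", "terrible service", "loved it", "not good"]
labels = [1, 0, 1, 0]

# Fit the model together with its stateful preprocessing (the vectorizer).
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

# Serialize the fitted pipeline to a file that will go into the model bundle.
with open("model.pkl", "wb") as f:
    pickle.dump(pipeline, f)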

The requirements.txt included with this toolkit lists all the libraries that are available in the runtime environment of Fusion’s Machine Learning Service. This environment includes the most popular libraries used by data scientists today (such as scikit-learn, TensorFlow, XGBoost, and so on). Restricting your training and inference code to these libraries ensures that your model will work with Fusion without any additional setup steps.

If you need a library that is not available, you’ll need to create a custom Dockerfile that extends the ML service’s base image and adds the commands necessary to install the extra pip dependencies. See Building a Custom ML Service Image.

2. Create and test the model plugin

To create a plugin, create a file called predict.py that contains these two functions:

  • def init(bundle_path: str)

    • This function is called by the ML service when the model is invoked for the first time. Place one-time initialization here, such as loading a serialized model from disk.

    • bundle_path is the path to the unzipped bundle.

  • def predict(model_input: dict) -> dict

    • This function contains the code necessary to generate a prediction from a single input.

    • The single input parameter, model_input, is a dict representing the input to your model.

    • Returns a dict of (key, value) pairs representing model output. Dictionary keys must be str, and dictionary values must be one of the following types:

      • numbers.Number (float, int, and so on)

      • str

      • list or ndarray of str

      • list or ndarray of numbers.Number

Here’s a basic example of a model that simply outputs the input:

def init(bundle_path: str):
    """
    One-time initialization here. For example, loading a serialized model from disk.
    Any objects created here will need to be made module-global in order for them to be
    accessible by predict(), e.g. global my_keras_model

    :param bundle_path: Path to unzipped bundle, used to construct file paths to bundle
    contents, e.g. os.path.join(bundle_path, "model.pkl")
    """
    print("Initializing the model!")

def predict(model_input: dict) -> dict:
    """
    Generate prediction.

    Return value is a dict where keys must be `str`, and values must be one of the following types:
      - `numbers.Number` (`float`, `int`, etc.)
      - `str`
      - `list` or `ndarray` of `str`
      - `list` or `ndarray` of `numbers.Number`

    :param model_input: a dict containing model input
    :return: model output dict.
    """
    if 'input' not in model_input:
        raise ValueError("Input must contain the key 'input'")

    return {
        "output": model_input['input']
    }

Here’s an example of wrapping a simple sentiment analysis Keras model:

import os
import pickle
from keras.models import load_model
from keras import preprocessing
import keras

INPUT_LENGTH = 500

def init(bundle_path: str):
    """
    One-time initialization here. For example, loading a serialized model from disk.
    Any objects created here will need to be made module-global in order for them to be
    accessible by predict(), e.g. global my_keras_model

    :param bundle_path: Path to unzipped bundle, used to construct file paths to bundle
    contents, e.g. os.path.join(bundle_path, "model.pkl")
    """
    global tokenizer
    keras.backend.clear_session()
    with open(os.path.join(bundle_path, 'tokenizer.pickle'), 'rb') as f:
        tokenizer = pickle.load(f)

    global model
    model = load_model(os.path.join(bundle_path, 'sentiment.h5'))

def predict(model_input: dict) -> dict:
    """
    Generate prediction.

    Return value is a dict where keys must be `str`, and values must be one of the following types:
      - `numbers.Number` (`float`, `int`, etc.)
      - `str`
      - `list` or `ndarray` of `str`
      - `list` or `ndarray` of `numbers.Number`

    :param model_input: a dict containing model input
    :return: model output dict.
    """
    if 'input' not in model_input:
        raise ValueError("Required field 'input' not defined.")

    samples = [ model_input['input'] ]
    idx_sequence = tokenizer.texts_to_sequences(samples)
    padded_idx_sequence = preprocessing.sequence.pad_sequences(idx_sequence, maxlen=INPUT_LENGTH)

    y = model.predict(padded_idx_sequence)
    label = "positive" if y[0][0] > 0.5 else "negative"

    return {
        "sentiment": label,
        "score": y[0][0]
    }

3. Bundle the plugin

To create a model bundle, simply create a zip file containing predict.py and any dependent serialized objects. For example, suppose your current working directory contains:

predict.py
tokenizer.pickle
sentiment.h5

On a Mac, you can create a zip using:

zip -r /path/to/model.zip .
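
Alternatively, if you prefer to create the bundle from Python (for example, from a notebook), the standard library can produce an equivalent archive. This is a minimal sketch; the destination path is a placeholder:

import shutil

# Archive the current working directory (predict.py plus serialized objects)
# into /path/to/model.zip, equivalent to the zip command above.
shutil.make_archive("/path/to/model", "zip", root_dir=".")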

4. Test the model plugin locally

The following requires the fusion-machine-learning-client library to be installed in your virtualenv.

The client libraries contain the LocalBundleRunner, which runs your plugin in your local interpreter. This allows you to quickly test and debug your plugin locally without needing to interact with Fusion.

Usage:

from lucidworks.ml.sdk import LocalBundleRunner

runner = LocalBundleRunner("/path/to/model.zip")
output = runner.predict({
	"input": "Hello World!"
})
  • class LocalBundleRunner(bundle_zip) - Loads the bundle zip file and invokes predict.py#init()

  • LocalBundleRunner.predict(model_input) - Calls predict.py#predict() and returns the model output dict, ensuring that the model output satisfies the datatype requirements.
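
For example, a quick sanity check for the echo model shown earlier might look like this (the expected output is illustrative):

from lucidworks.ml.sdk import LocalBundleRunner

# Run the echo plugin locally and verify that it returns its input unchanged.
runner = LocalBundleRunner("/path/to/model.zip")
output = runner.predict({"input": "Hello World!"})
assert output["output"] == "Hello World!", f"Unexpected output: {output}"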

If your model produces the expected output without any errors, you are ready to deploy it to Fusion.

5. Deploy the model bundle to Fusion

Use MLServiceSDKFusionClient to deploy and test your model in Fusion. The model_id is simply a unique ID that you assign to your model and use to reference it whenever you interact with the model.

from lucidworks.ml.sdk import MLServiceSDKFusionClient
from requests.auth import HTTPBasicAuth

model_id = 'echo'
app_name = '<Fusion App Name>'
fusion_api_url = '<Fusion API URL>'

client = MLServiceSDKFusionClient(fusion_api_url,
                                  app_name,
                                  auth=HTTPBasicAuth('<Username>', '<Password>'))

To upload your model to Fusion:

client.upload_model('/path/to/model.zip', model_id)

To test your uploaded model using Fusion:

output = client.predict({
	"input": "Hello World!"
})
Important
The MLServiceSDKFusionClient.predict() function is intended only for development and testing purposes, not for production use. To use your model in production, use it in conjunction with query and index pipelines.

If your model produces the expected output without any errors, congratulations, you have successfully deployed your model to Fusion! You can integrate your model with query and index pipelines using the Machine Learning stages.

Examples

Refer to these notebooks for examples of creating and deploying models in Fusion: