Develop and Deploy a Machine Learning Model

This topic describes the high-level process of deploying trained models to Fusion 5.1 and above using Seldon Core. We will work through a Python-based example model, but see the Seldon Core documentation for details on how to wrap models in R, Java, JavaScript, or Go. Seldon Core deploys your model as a Docker image in Kubernetes, which you can scale up or down like other Fusion services.

1. Install Seldon Core

Install the Seldon Core Python package using pip or another Python package manager (such as conda):

pip install seldon-core

There are no restrictions on other libraries or frameworks, as your environment will be wrapped inside a Docker container for deployment.
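
You can quickly verify the installation by importing the package (the distribution is named seldon-core, but the Python module is seldon_core):

python -c "import seldon_core"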

2. Create an example model: sentiment analysis with PyTorch

As an example of using Seldon Core with Fusion, we will create a simple sentiment analysis model using a transformer-based architecture with PyTorch and Hugging Face's transformers library, leveraging a pre-trained DistilBERT model. However, there are no restrictions on what you use for your models; Keras, TensorFlow, JAX, scikit-learn, or any other Python libraries are supported.
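
For reference, here is a minimal sketch of how such a model's parameters can be saved for packaging. The checkpoint name below (the standard Hugging Face DistilBERT checkpoint fine-tuned on SST-2) is an assumption for this example; a model you fine-tuned yourself would be saved the same way:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed checkpoint: pre-trained DistilBERT fine-tuned for sentiment analysis
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"

model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Write the weights, config, and tokenizer files into the parameters
# directory that the inference class (next section) loads from.
model.save_pretrained("./parameters")
tokenizer.save_pretrained("./parameters")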

3. Create Inference Class

We use Seldon Core to create an inference class wrapper around models for deployment into Fusion. This requires a class with at least two methods, __init__() and predict(), which are used by Seldon Core when deploying the model and serving predictions.

  • __init__() is called by Seldon Core when the model’s Docker container starts. This is where you should initialize your model and anything else you need for inference. At this time, we recommend including your model’s trained parameters directly in the Docker container rather than reaching out to external storage inside __init__().

  • predict() is executed whenever the model is called to give a prediction. It receives up to three parameters: X, a numpy array containing the input to the model; an iterable of column names; and an optional dict of metadata. In Fusion 5.1, only the first two are used. The method must return a nested list of values, each of which must be a number type or a string.

Here’s the complete code for our sentiment analysis model’s wrapper class. Note that this inference class can be easily unit-tested with any Python testing framework and requires no Fusion-specific libraries.

from typing import Any, Iterable, List

import numpy as np
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    pipeline,
)

INPUT_COLUMN = "text"

class SentimentExampleModel:
    def __init__(self):
        # Load the saved model and tokenizer from the local parameters
        # directory baked into the Docker image.
        model = AutoModelForSequenceClassification.from_pretrained("./parameters")
        tokenizer = AutoTokenizer.from_pretrained("./parameters")
        self.classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

    def predict(self, X: np.ndarray, names: Iterable[str]) -> List[Any]:
        # Pair each column name with its corresponding input value.
        model_input = dict(zip(names, X))

        if INPUT_COLUMN not in model_input:
            raise ValueError(f"Input must contain the {INPUT_COLUMN} column")

        # The pipeline returns a list of {"label": ..., "score": ...} dicts.
        classification = self.classifier(model_input[INPUT_COLUMN])

        # One inner list per output column (see class_names below).
        return [[classification[0]["label"]], [classification[0]["score"]]]

    def class_names(self) -> Iterable[str]:
        return ["class", "score"]

The __init__() method loads saved parameters from the parameters directory (these were saved previously from a Jupyter notebook, as sketched in step 2 above). You’ll find all but one of the required files in the supplied zip file. However, because the saved model weights are rather large, please download them from here and place the pytorch_model.bin file in the parameters directory.
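
As noted above, the class is straightforward to unit-test. Here is a minimal sketch using pytest; the module name model and the POSITIVE/NEGATIVE label set (standard for the SST-2 DistilBERT checkpoint) are assumptions for this example:

import numpy as np

from model import SentimentExampleModel  # assumed module name for the class above

def test_predict_returns_label_and_score():
    model = SentimentExampleModel()
    # Seldon Core passes the input values and the column names separately.
    label_col, score_col = model.predict(np.array(["I love this product"]), ["text"])
    assert label_col[0] in ("POSITIVE", "NEGATIVE")  # assumed SST-2 label set
    assert 0.0 <= score_col[0] <= 1.0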

4. Create Model Image

Now that we have an inference class for our model, the next step is to package it as a Docker image ready for deployment. While you can write a Dockerfile and build the image manually, we recommend using s2i (source-to-image) to build the image with the appropriate libraries and configuration for use with Seldon Core.

  1. Download s2i from its GitHub page or use a package manager.

    For example, on macOS you can use brew install s2i to make it available on your system.

  2. Create a standard Python requirements.txt that contains a list of dependencies needed for your code to run.

    You can do this manually or use pip freeze:

    pip freeze > requirements.txt
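
    For our example, the file would at minimum include something like the following (a hedged illustration; pip freeze will also pin exact versions and transitive dependencies):

    seldon-core
    transformers
    torch
    numpy
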
  3. Create an .s2i/environment file in the build directory of your inference code.

    It should look like this:

    MODEL_NAME=CLASS_NAME
    API_TYPE=GRPC
    SERVICE_TYPE=MODEL
    PERSISTENCE=0

    For our example, the .s2i/environment file is:

    MODEL_NAME=SentimentExampleModel
    API_TYPE=GRPC
    SERVICE_TYPE=MODEL
    PERSISTENCE=0

    Note
    While Seldon Core does support models using REST communication, Fusion exclusively uses gRPC for higher throughput and lower latency. The SERVICE_TYPE and PERSISTENCE values shown above are the only values currently supported in Fusion. See the Seldon Core docs for further details.

  4. Build the model using s2i:

    s2i build . seldonio/seldon-core-s2i-python3:0.18 sentiment-example-model

    The seldonio/seldon-core-s2i-python3:0.18 part of the command tells s2i which build image to use as a basis for making the container, and the sentiment-example-model is the Docker tag for the created image.

    Once s2i finishes, you should be able to see the image in docker images under sentiment-example-model.

  5. Push the new image to either a private registry or Docker Hub.

    You can deploy your model from either a private registry or Docker Hub. Here’s how we push to Docker Hub:

    docker login
    docker tag sentiment-example-model [DOCKERHUB USERNAME]/sentiment-example-model
    docker push [DOCKERHUB USERNAME]/sentiment-example-model
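
    If you push to a private registry instead, the deployment job described in the next section will need a Kubernetes image-pull secret so the cluster can pull your image. Here is a sketch of creating one with kubectl; the secret name and the bracketed values are placeholders, and the name you choose is what you supply in the Kubernetes secret field of the deployment job:

    kubectl create secret docker-registry seldon-model-registry-secret \
      --docker-server=[REGISTRY URL] \
      --docker-username=[USERNAME] \
      --docker-password=[PASSWORD] \
      --namespace=[FUSION NAMESPACE]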

5. Deploy To Fusion

Now that your model is tested and Dockerized, you are ready to deploy it within Fusion.

  1. In the Fusion UI, navigate to Collections > Jobs.

  2. Select New > Create Seldon Core Model Deployment.

    The job configuration panel opens.

  3. Configure the following parameters:

    • Job ID — a string used by the Fusion API to reference the job after its creation.

    • Model name — a name for the deployed model.

      This will be used to generate the deployment name in Seldon Core, and will also be the name that you reference as a model-id when making predictions with the ML Service.

    • Model replicas — how many load-balanced replicas of the model to deploy.

    • Docker Repository — the repository (public or private) where the Docker image is located.

      If using Docker Hub, simply fill in the Docker Hub username here.

    • Image name — the name of the image with an optional tag (if no tag is supplied, latest will be used).

    • Kubernetes secret — if using a private repository, supply the name of the Kubernetes secret that will be used for access.

    • Output columns — a list of the column names that the model’s predict method returns (for our example, class and score).

  4. Click Run > Start to run the model deployment job.

Once the job reports success, you can reference your model name in the Machine Learning index pipeline stage and the Machine Learning query pipeline stage.

Tip
Since your model is deployed as a Docker image in Kubernetes, you can scale it up or down by configuring the number of replicas and the auto-scaling options.