Fusion 5.12

    Develop and Deploy a Machine Learning Model

    This article describes the high-level process to deploy models to Fusion 5.x.x releases using Seldon Core, and replaces the Source-to-Image (s2i) model deployment method. Seldon Core deploys your model as a Docker image in Kubernetes which you can scale up or down like other Fusion services.

    The procedure detailed in this topic deploys a pre-trained, Python-based example model that generates text embeddings by calling OpenAI's API.

    For information about how to wrap models in R, Java, JavaScript, or Go, see the Seldon Core documentation.

    Install Seldon Core

    Install the Seldon Core Python package using pip or another Python package manager, such as conda:

    pip install seldon-core

    There are no restrictions on other libraries or frameworks, as your environment is wrapped inside a Docker container for deployment.

    Create an example model: semantic vector search with OpenAI

    As an example of using Seldon Core with Fusion, we will create a simple embedding model using a REST API call to OpenAI’s API. However, there are no restrictions on what you use for your models; Keras, TensorFlow, JAX, scikit-learn, or any other Python libraries are supported.

    Create inference class

    Use Seldon Core to create an inference class wrapper around models for deployment into Fusion. This requires a class with at least two methods, __init__() and predict(), which are used by Seldon Core when deploying the model and serving predictions.

    The method __init__() is called by Seldon Core when the model’s Docker container starts. This is where you should initialize your model and any other associated details you need for inference. We recommend bundling your model’s trained parameters directly in the Docker container rather than fetching them from external storage inside __init__().

    The method predict() is executed whenever the model is called to give a prediction. It receives three parameters:

    Parameter Description

    X

    A numpy array containing the input to the model.

    names

    An iterable set of column names.

    meta

    An optional dictionary of metadata.

    In Fusion, only the first two parameters are used. Because of the way Fusion sends input to the Seldon Core model, you should zip the X and names parameters together, then read your inputs from the resulting dict using the keys you placed in the modelInput HashMap in the Machine Learning stage. We also recommend raising a ValueError if a required key is not found in the input, as this helps with debugging.

    Here is the complete code for our OpenAI embedding model’s wrapper class. Note that this inference class can be easily unit-tested with any Python testing framework and requires no Fusion-specific libraries.

    import logging
    import os
    import sys
    from typing import Any, List, Iterable
    import numpy as np
    import openai
    INPUT_COLUMN = "text"
    log = logging.getLogger()
    # NOTE!  Please add
    # export DOCKER_DEFAULT_PLATFORM=linux/amd64
    # to your ~/.zshrc
    # Otherwise, the image may be built for the architecture you're currently on
    class OpenAIModel():
        def __init__(self):
            log.info("env: %s", str(os.environ))
            openai.api_key = os.getenv("OPENAI_API_KEY", "api key not set")
        def get_embedding(self, text, engine="text-similarity-ada-001"):
            # replace newlines, which can negatively affect performance.
            text = text.replace("\n", " ")
            return openai.Embedding.create(input=[text], engine=engine)[
                "data"
            ][0]["embedding"]
        def predict(self, X: np.ndarray, names: Iterable[str]) -> List[Any]:
            log.info("in predict")
            model_input = dict(zip(names, X))
            log.info("input fields: %s", model_input)
            if INPUT_COLUMN not in model_input:
                raise ValueError("required key '%s' not found in input" % INPUT_COLUMN)
            engine = model_input.get("engine", "text-similarity-ada-001")
            # Initialize embedding so we know when the API call failed
            embedding = [-1]
            text = model_input[INPUT_COLUMN]
            if len(text) > 2000:
                log.warning("Input text too long, truncating to 2000 characters")
                text = text[0:2000]
            try:
                embedding = self.get_embedding(text, engine=engine)
            except Exception as e:
                log.error("Failed calling API: %s", str(e))
            return [embedding]
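    As noted above, the inference class can be unit-tested without Fusion. Below is a minimal offline sketch: it uses a hypothetical stand-in class that mirrors the wrapper's predict() flow with the OpenAI call stubbed out. In a real test you would import OpenAIModel from your module and mock get_embedding instead.

```python
import numpy as np

class StubOpenAIModel:
    """Stand-in mirroring the wrapper's predict() flow, minus the API call."""

    def get_embedding(self, text, engine="text-similarity-ada-001"):
        # Canned vector instead of calling openai.Embedding.create().
        return [0.1, 0.2, 0.3]

    def predict(self, X, names):
        model_input = dict(zip(names, X))
        if "text" not in model_input:
            raise ValueError("required key 'text' not found in input")
        return [self.get_embedding(model_input["text"])]

def test_predict_returns_embedding():
    model = StubOpenAIModel()
    assert model.predict(np.array(["hello world"]), ["text"]) == [[0.1, 0.2, 0.3]]

def test_predict_missing_key_raises():
    model = StubOpenAIModel()
    try:
        model.predict(np.array(["hello world"]), ["not_text"])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for missing 'text' key")

test_predict_returns_embedding()
test_predict_missing_key_raises()
```

    The same two tests (happy path, missing required key) apply to the real class once get_embedding is mocked.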

    Create model image

    Now that we have a class for our model’s inference, the next step is to create a Docker image so the model is ready for deployment. We recommend packaging the Python model by manually building a Docker image for it, as shown below.

    Build image

    DOCKER_DEFAULT_PLATFORM=linux/amd64 docker build . -t [DOCKERHUB USERNAME]/fusion-seldon-openai:latest

    Alternatively, DOCKER_DEFAULT_PLATFORM may be exported from .zshrc.

    Push image

    You can deploy your model from either a private registry or Docker Hub. Here is how we push to Docker Hub:

    docker push [DOCKERHUB USERNAME]/fusion-seldon-openai:latest
    Replace the Docker Hub repository, version, and other relevant fields as needed. If you are using a private Docker Hub repository, you must obtain the Kubernetes secret and supply it to the Seldon Core model deployment job.
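    For a private registry, the Kubernetes secret can be created with kubectl. This is a sketch; the secret name, namespace, and registry URL below are examples to adapt to your cluster:

```shell
# Create a docker-registry secret in the namespace where Fusion runs.
# Substitute your own secret name, namespace, and credentials.
kubectl create secret docker-registry my-docker-secret \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=[DOCKERHUB USERNAME] \
  --docker-password=[DOCKERHUB PASSWORD] \
  -n yournamespace
```

    You then reference this secret name in the deployment job's Kubernetes secret field.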

    Deploy to Fusion

    Now that your model is tested and Dockerized, you are ready to deploy it within Fusion.

    1. In the Fusion UI, navigate to Collections > Jobs.

    2. Select Add > Create Seldon Core Model Deployment.


    3. Configure the following parameters in the job configuration panel:

      Parameter Description

      Job ID

      A string used by the Fusion API to reference the job after its creation.

      Model name

      A name for the deployed model. This is used to generate the deployment name in Seldon Core. It is also the name that you reference as a model-id when making predictions with the ML Service.

      Model replicas

      The number of load-balanced replicas of the model to deploy.

      Docker Repository

      The public or private repository where the Docker image is located. If you’re using Docker Hub, fill in the Docker Hub username here.

      Image name

      The name of the image with an optional tag. If no tag is given, latest is used.

      Kubernetes secret

      If you’re using a private repository, supply the name of the Kubernetes secret used for access.

      Output columns

      A list of column names that the model’s predict method returns.

    4. Click Run > Start to run the model deployment job.

    Once the job reports success, you can reference your model name in the Machine Learning index pipeline stage.
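    You can also smoke-test the deployed model directly over Seldon Core's REST protocol. This sketch assumes you have port-forwarded the model's service to localhost; the service name and port in the comment are assumptions based on the example deployment above.

```python
import json
from urllib import request as urlrequest

def build_payload(text, engine="text-similarity-ada-001"):
    # Seldon Core's "data" wire format: each entry in names pairs up with
    # an entry in ndarray, matching dict(zip(names, X)) in predict().
    return {"data": {"names": ["text", "engine"], "ndarray": [text, engine]}}

def predict(url, text):
    req = urlrequest.Request(
        url,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return json.loads(resp.read())

# Example, assuming: kubectl port-forward svc/openai-openai 9000:9000
# result = predict("http://localhost:9000/api/v1.0/predictions", "hello world")
```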

    After deployment

    1. After deploying your model, create and modify an openai_sdep.yaml file. The first command, kubectl get sdep, gets the details for the currently running Seldon Deployment and saves them to a YAML file. After you add the OpenAI key, kubectl apply -f openai_sdep.yaml applies it to the Seldon Deployment the next time it launches.

      kubectl get sdep openai -o yaml > openai_sdep.yaml
      # Modify openai_sdep.yaml to add
              - env:
                - name: OPENAI_API_KEY
                  value: "your-openai-api-key-here"
      kubectl apply -f openai_sdep.yaml
    2. Delete the sdep before redeploying the model. The currently running Seldon Deployment does not have the OpenAI key applied to it; deleting it ensures the redeployed job picks up the key.

      kubectl delete sdep openai
    3. Lastly, you can use the deployed model to encode vectors into Milvus.



    Copy and paste the following into a file called Dockerfile:



    FROM python:3.7-slim
    WORKDIR /app
    # Install python packages
    COPY requirements.txt requirements.txt
    RUN pip install -r requirements.txt
    # Copy source code
    COPY . .
    # Port for GRPC
    EXPOSE 5000
    # Port for REST
    EXPOSE 9000
    # Define environment variables
    ENV MODEL_NAME OpenAIModel
    ENV SERVICE_TYPE MODEL
    # Changing folder to default user
    RUN chown -R 8888 /app
    CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE
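    The Dockerfile above installs dependencies from a requirements.txt file in the same directory. A minimal version for the wrapper class in this article might look like the following; pin versions as appropriate for your environment:

```text
numpy
openai
seldon-core
```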


    Here is an example JavaScript query pipeline stage script that runs a Solr k-nearest-neighbor (knn) query using a vector from the pipeline context:

    /* globals Java, logger */
    (function () {
        "use strict";
        var isDebug = false; // turn off or on debug statements for this file's code
        function logIfDebug(m) { if (isDebug && m) { logger.info(m); } }
        return function main(request, response, ctx, collection, solrServer, solrServerFactory) {
            var vector = ctx.getOrDefault("Test_Vector", []);
            // &q={!knn f=vector_v topK=10}[1.0, 2.0, 3.0, 4.0...]
            var q = "{!knn f=vector_v topK=10}" + JSON.stringify(vector);
            request.putSingleParam("q", q);
        };
    })();


    Here is an example of the saved Seldon Deployment manifest (openai_sdep.yaml) after the OPENAI_API_KEY environment variable has been added:

    apiVersion: machinelearning.seldon.io/v1
    kind: SeldonDeployment
    metadata:
      creationTimestamp: "2022-10-18T18:32:29Z"
      generation: 2
      name: openai
      namespace: yournamespace
      resourceVersion: "1485955230"
      uid: 8d79389d-be76-4a4d-89db-3233d2f12b72
    spec:
      name: openai
      predictors:
      - componentSpecs:
        - spec:
            containers:
            - env:
              - name: OPENAI_API_KEY
                value: "your-openai-api-key-here"
              image: yourimage/fusion-seldon-openai:0.0.9
              imagePullPolicy: IfNotPresent
              name: openai
              resources: {}
              volumeMounts:
              - mountPath: /etc/secrets
                name: my-secret
                readOnly: true
            imagePullSecrets:
            - name: '{{MODEL_DOCKER_SECRET}}'
            nodeSelector: {}
            tolerations: []
            volumes:
            - name: my-secret
              projected:
                sources:
                - serviceAccountToken:
                    expirationSeconds: 3600
                    path: service-account-key
                - secret:
                    items:
                    - key: sa
                      path: service-account-key
                    name: service-account-key
        graph:
          endpoint:
            type: GRPC
          name: openai
          type: MODEL
        labels:
          version: v1666118120
        name: openai
        replicas: 1
    status:
      address:
        url: http://openai-openai.yourinstancehere.svc.cluster.local:8000/api/v1.0/predictions
      conditions:
      - lastTransitionTime: "2022-10-18T18:32:55Z"
        message: Deployment has minimum availability.
        reason: MinimumReplicasAvailable
        status: "True"
        type: DeploymentsReady
      - lastTransitionTime: "2022-10-18T18:32:30Z"
        reason: No HPAs defined
        status: "True"
        type: HpasReady
      - lastTransitionTime: "2022-10-18T18:32:30Z"
        reason: No KEDA resources defined
        status: "True"
        type: KedaReady
      - lastTransitionTime: "2022-10-18T18:32:30Z"
        reason: No PDBs defined
        status: "True"
        type: PdbsReady
      - lastTransitionTime: "2022-10-18T18:36:45Z"
        status: "True"
        type: Ready
      - lastTransitionTime: "2022-10-18T18:36:45Z"
        reason: All services created
        status: "True"
        type: ServicesReady
      - lastTransitionTime: "2022-10-18T18:32:55Z"
        reason: No VirtualServices defined
        status: "True"
        type: istioVirtualServicesReady
      deploymentStatus:
        # the key here is the generated deployment name, truncated in the original dump
        openai-openai-0-openai:
          availableReplicas: 1
          replicas: 1
      replicas: 1
      state: Available