Data Science Toolkit Integration

Beginning with Fusion 5.0, data scientists and machine learning engineers can deploy Python machine learning models that they have trained themselves to Fusion using the Data Science Toolkit Integration (DSTI). DSTI provides real-time prediction and integrates with query and index pipelines.

Important

The DSTI Machine Learning service component is deprecated in Fusion 5.0.x. Upgrade to Fusion 5.1.2 or later for long-term support and benefits, including:

  • Improved stability and availability of models

  • Industry-standard, cloud-native deployment processes using Seldon Core

Benefits:

  • Extension points for data scientists to plug in customized Python modeling code

  • Client libraries to ease the development and testing of Python plugins

  • API-driven, dynamic loading and updating of plugins at runtime

Example use cases:

  • Using spaCy to extract named entities and indexing the results into a Solr collection

  • Using a Keras model to perform query intent classification at query time

  • Using pre-trained word embeddings to generate synonyms for a query
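The synonym use case can be illustrated with a minimal sketch. It assumes a tiny, made-up embedding table and ranks candidate terms by cosine similarity to the query term. The vectors, the function names (`cosine_similarity`, `suggest_synonyms`), and the similarity threshold are all hypothetical and chosen only for illustration. They are not part of the DSTI API, and a real plugin would load pre-trained embeddings instead.

```python
import math

# Toy 3-dimensional embeddings. These values are invented for illustration;
# a real deployment would load pre-trained vectors instead.
EMBEDDINGS = {
    "laptop":   [0.90, 0.10, 0.00],
    "notebook": [0.85, 0.15, 0.05],
    "computer": [0.80, 0.20, 0.10],
    "banana":   [0.00, 0.10, 0.95],
    "apple":    [0.10, 0.20, 0.90],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def suggest_synonyms(term, embeddings, top_k=2, min_similarity=0.9):
    """Return up to top_k terms whose vectors are closest to the term's vector.

    Terms below min_similarity are dropped, and unknown terms yield no
    suggestions.
    """
    if term not in embeddings:
        return []
    query_vec = embeddings[term]
    scored = [
        (other, cosine_similarity(query_vec, vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, score in scored[:top_k] if score >= min_similarity]


if __name__ == "__main__":
    print(suggest_synonyms("laptop", EMBEDDINGS))
```

At query time, the suggested terms could be appended to the original query as expansions. The threshold keeps unrelated terms out when the vocabulary has no close neighbor.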

DSTI components

  • Jupyter Notebook service: A fully integrated Jupyter notebook in Fusion that lets data scientists explore data, test SQL aggregations, run Fusion SQL statements, and import or export data to and from other storage systems using Spark in their choice of language, Scala or Python. This DSTI component is still supported in Fusion 5.0.x.

  • Machine Learning service: Supports model serving in index pipelines and query pipelines.