Data Science Toolkit Integration

Beginning with Fusion 5.0, data scientists and machine learning engineers can deploy Python machine learning models that they have trained themselves to Fusion using the Data Science Toolkit Integration (DSTI). Deployed models serve predictions in real time and can be called from query and index pipelines.

Key features:

  • Extension points for data scientists to plug in customized Python modeling code

  • Client libraries to ease the development and testing of Python plugins

  • API-driven, dynamic loading and updating of plugins at runtime
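
To make the idea of runtime loading and updating concrete, here is a minimal sketch of a plugin registry. It is not the DSTI API: `PluginRegistry`, `load`, `predict`, and both tokenizer functions are hypothetical names invented for illustration. It only shows the general pattern of replacing a model under a fixed name while the service keeps running.

```python
from typing import Callable, Dict, List

# Hypothetical plugin type: a function from input text to a list of strings.
# This is an illustrative assumption, not the DSTI plugin interface.
Plugin = Callable[[str], List[str]]


class PluginRegistry:
    """In-memory registry that allows plugins to be swapped at runtime."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def load(self, name: str, plugin: Plugin) -> None:
        # Registers a new plugin, or replaces the one already stored under name.
        self._plugins[name] = plugin

    def predict(self, name: str, text: str) -> List[str]:
        # The lookup happens on every call, so an update takes effect
        # on the next request without restarting anything.
        return self._plugins[name](text)


def whitespace_tokens(text: str) -> List[str]:
    return text.split()


def lowercase_tokens(text: str) -> List[str]:
    return text.lower().split()


registry = PluginRegistry()
registry.load("tokenizer", whitespace_tokens)
first = registry.predict("tokenizer", "Hello Fusion")

# Update the plugin in place; callers keep using the same name.
registry.load("tokenizer", lowercase_tokens)
second = registry.predict("tokenizer", "Hello Fusion")
```

In a real deployment the `load` step would presumably be triggered by an API call rather than by in-process code, but that detail is an assumption here.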

Example use cases:

  • Using spaCy to extract named entities and indexing the results into a Solr collection

  • Using a Keras model to perform query intent classification at query time

  • Using pre-trained word embeddings to generate synonyms for a query
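
As a rough illustration of the last use case, the sketch below ranks candidate synonyms by cosine similarity between word vectors. The three-dimensional vectors in `EMBEDDINGS` are toy values invented for this example; real pre-trained embeddings have hundreds of dimensions and would be loaded from a model file. The function names are likewise hypothetical and not part of Fusion.

```python
import math
from typing import Dict, List, Tuple

# Toy vectors invented for illustration only; they stand in for
# pre-trained embeddings that would normally be loaded from disk.
EMBEDDINGS: Dict[str, List[float]] = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook": [0.8, 0.2, 0.1],
    "computer": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_terms(term: str, k: int = 2) -> List[Tuple[str, float]]:
    """Return the k vocabulary terms most similar to term, best first."""
    query_vec = EMBEDDINGS[term]
    scored = [
        (other, cosine(query_vec, vec))
        for other, vec in EMBEDDINGS.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Candidate synonyms for the query term "laptop".
synonyms = [word for word, _ in nearest_terms("laptop")]
```

With these toy vectors, `synonyms` contains "notebook" and "computer", while the unrelated "banana" is ranked last. In a query pipeline, terms like these could be used to expand the user's query.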

DSTI components

  • Jupyter Notebook service: A fully integrated Jupyter notebook in Fusion that lets data scientists explore data, test SQL aggregations, run Fusion SQL statements, and import or export data to and from other storage mechanisms using Spark in their choice of Scala or Python. Jupyter notebooks are still supported in Fusion 5.0.x.

  • Machine Learning service: Supports model serving in index pipelines and query pipelines.

  • Develop and Deploy a Machine Learning Model