Fusion

Version 5.1

      Data Science Toolkit Integration


      Beginning with Fusion 5.0, data scientists and machine learning engineers can use the Data Science Toolkit Integration (DSTI) to deploy Python machine learning models that they have trained themselves to Fusion. Deployed models provide real-time predictions and integrate directly with query and index pipelines.
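      To give a rough picture of what a deployable model might look like, the sketch below defines a hypothetical plugin module with an init function that loads artifacts once and a predict function that maps a request dictionary to a response dictionary. The function names, the bundle_path argument, and the "text" and "label" field names are assumptions for illustration, not the documented DSTI contract, and a keyword lookup stands in for a real trained model.

```python
"""Hypothetical DSTI-style model plugin (function names and contract are assumed)."""
from typing import Dict, Optional

# Module-level state populated once by init() and reused by every predict() call.
_model: Optional[Dict[str, str]] = None


def init(bundle_path: str) -> None:
    """Load model artifacts; a real plugin would read files under bundle_path."""
    global _model
    # Stand-in "model": a keyword -> label mapping instead of a trained classifier.
    _model = {"refund": "billing", "invoice": "billing", "password": "account"}


def predict(model_input: Dict[str, str]) -> Dict[str, str]:
    """Return a label for model_input["text"] (assumed input field name)."""
    if _model is None:
        raise RuntimeError("init() must be called before predict()")
    text = model_input.get("text", "").lower()
    for keyword, label in _model.items():
        if keyword in text:
            return {"label": label}
    return {"label": "other"}
```

      Separating a one-time init step from a per-request predict step keeps expensive loading out of the real-time path; for example, after init("bundle"), predict({"text": "Where is my invoice?"}) returns {"label": "billing"}.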

      Benefits:

      • Extension points for data scientists to plug in customized Python modeling code

      • Client libraries to ease the development and testing of Python plugins

      • API-driven, dynamic loading and updating of plugins at runtime
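      Runtime loading and updating can be pictured as a registry that compiles new plugin code and swaps it in without restarting the service. The minimal sketch below is an in-process stand-in, not Fusion's API-driven mechanism: the PluginRegistry class, its load and predict methods, and loading plugins from a source string are all hypothetical. Executing arbitrary source like this is only acceptable for trusted code; a production service would isolate plugins.

```python
"""Hypothetical in-process plugin registry illustrating runtime load and update."""
import types
from typing import Any, Dict


class PluginRegistry:
    """Keeps the latest successfully loaded version of each plugin, keyed by id."""

    def __init__(self) -> None:
        self._plugins: Dict[str, types.ModuleType] = {}
        self._versions: Dict[str, int] = {}

    def load(self, plugin_id: str, source: str) -> int:
        """Compile plugin source into a fresh module, swap it in, return its version."""
        module = types.ModuleType(f"plugin_{plugin_id}")
        exec(compile(source, f"<plugin {plugin_id}>", "exec"), module.__dict__)
        if not callable(getattr(module, "predict", None)):
            raise ValueError(f"plugin {plugin_id!r} does not define predict()")
        # Replace the entry only after the new module loaded successfully,
        # so a broken update leaves the previous version serving requests.
        self._plugins[plugin_id] = module
        self._versions[plugin_id] = self._versions.get(plugin_id, 0) + 1
        return self._versions[plugin_id]

    def predict(self, plugin_id: str, model_input: Dict[str, Any]) -> Any:
        """Route a request to the current version of the named plugin."""
        return self._plugins[plugin_id].predict(model_input)
```

      Calling load a second time with the same plugin_id replaces the running version, while a failed load raises an error and keeps the old one in place.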

      Example use cases:

      • Using spaCy to extract named entities and indexing the results into a Solr collection

      • Using a Keras model to perform query intent classification at query time

      • Using pre-trained word embeddings to generate synonyms for a query
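      The word-embedding use case relies on nearest-neighbour search: terms whose vectors have a high cosine similarity to the query term are candidate synonyms. The sketch below shows that idea with tiny made-up 3-dimensional vectors standing in for real pre-trained embeddings; the vocabulary, the synonyms function, and the k and min_sim thresholds are illustrative assumptions.

```python
"""Sketch: suggest synonyms as nearest neighbours in a word-embedding space."""
import math
from typing import Dict, List, Sequence

# Toy vectors standing in for real pre-trained embeddings (illustrative only).
EMBEDDINGS: Dict[str, List[float]] = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook": [0.85, 0.15, 0.05],
    "computer": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.95],
}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def synonyms(term: str, k: int = 2, min_sim: float = 0.8) -> List[str]:
    """Return up to k vocabulary words most similar to term, keeping only sim >= min_sim."""
    if term not in EMBEDDINGS:
        return []
    query = EMBEDDINGS[term]
    scored = [(cosine(query, vec), word) for word, vec in EMBEDDINGS.items() if word != term]
    scored.sort(reverse=True)  # highest similarity first
    return [word for sim, word in scored[:k] if sim >= min_sim]
```

      With these toy vectors, synonyms("laptop") returns ["notebook", "computer"], while "banana" has no neighbour above the threshold and yields an empty list. The min_sim cutoff keeps weakly related terms from being added to the query.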

      DSTI components

      • Jupyter Notebook service: A fully integrated Jupyter notebook in Fusion that lets data scientists explore data, test SQL aggregations, run Fusion SQL statements, and import data from or export data to other storage systems using Spark, in their choice of Scala or Python. Jupyter notebooks are still supported in Fusion 5.0.x.

      • Machine Learning service: Supports model serving in index pipelines.

      • Develop and Deploy a Machine Learning Model
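      Serving a model in an index pipeline amounts to calling the model on each incoming document and storing its output in a new field before indexing. The plain-Python sketch below shows that per-document enrichment only; the model_stage function, the body_t and predicted_label_s field names, and the model's dictionary-in, dictionary-out signature are hypothetical and do not represent Fusion's stage configuration.

```python
"""Hypothetical index-pipeline stage that enriches documents with model output."""
from typing import Any, Callable, Dict, Iterable, Iterator

Document = Dict[str, Any]
Model = Callable[[Dict[str, Any]], Dict[str, Any]]


def model_stage(
    docs: Iterable[Document],
    model: Model,
    input_field: str = "body_t",
    output_field: str = "predicted_label_s",
) -> Iterator[Document]:
    """Call model on each document's input_field and store its label in output_field."""
    for doc in docs:
        text = doc.get(input_field)
        if isinstance(text, str) and text:
            enriched = dict(doc)  # copy so the caller's document is not mutated
            enriched[output_field] = model({"text": text})["label"]
            yield enriched
        else:
            yield doc  # documents without text pass through unchanged


def toy_model(model_input: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a deployed model: labels texts that mention 'refund'."""
    return {"label": "billing" if "refund" in model_input["text"].lower() else "other"}
```

      Writing the stage as a generator mirrors how documents stream through a pipeline one at a time, and skipping documents that lack the input field keeps the stage from failing on incomplete records.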