Beginning with Fusion 5.0, data scientists and machine learning engineers can deploy end-user-trained Python machine learning models to Fusion using the Data Science Toolkit Integration (DSTI). Deployed models serve real-time predictions and integrate with query and index pipelines.
Important
The DSTI Machine Learning service is deprecated in Fusion 5.0.x. Please upgrade to Fusion 5.1.2+ for long-term support and benefits, including:
-
Improved stability and availability of models
-
Industry-standard, cloud-native deployment processes using Seldon Core
Benefits:
-
Extension points for data scientists to plug in customized Python modeling code
-
Client libraries to ease the development and testing of Python plugins
-
API-driven, dynamic runtime loading and updating of plugins
Example use cases:
-
Using spaCy to extract named entities and index the results into a Solr collection
-
Using a Keras model to perform query intent classification at query time
-
Using pre-trained word embeddings to generate synonyms for a query
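As a rough illustration of the last use case, the sketch below expands a query with nearest neighbours from a word-embedding table, using cosine similarity. It is not DSTI plugin code and does not use any Fusion API: the `EMBEDDINGS` table holds tiny made-up vectors standing in for real pre-trained embeddings, and the function names `synonyms` and `expand_query`, along with the `top_k` and `min_similarity` parameters, are hypothetical choices made for this example.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# A real deployment would load pre-trained embeddings instead.
EMBEDDINGS = {
    "laptop":   [0.90, 0.10, 0.00],
    "notebook": [0.85, 0.15, 0.05],
    "computer": [0.80, 0.20, 0.10],
    "banana":   [0.00, 0.10, 0.95],
    "apple":    [0.10, 0.20, 0.90],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def synonyms(term, embeddings=EMBEDDINGS, top_k=2, min_similarity=0.9):
    """Return up to top_k vocabulary words most similar to term.

    Words scoring below min_similarity are dropped; unknown terms
    yield an empty list.
    """
    if term not in embeddings:
        return []
    query_vec = embeddings[term]
    scored = [
        (other, cosine(query_vec, vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, score in scored[:top_k] if score >= min_similarity]


def expand_query(query, **kwargs):
    """Append synonyms after each whitespace-separated query token."""
    expanded = []
    for token in query.lower().split():
        expanded.append(token)
        expanded.extend(synonyms(token, **kwargs))
    return expanded


if __name__ == "__main__":
    print(expand_query("cheap laptop"))
```

With these toy vectors, "laptop" picks up "notebook" and "computer", while "cheap" is left unchanged because it is not in the vocabulary. The similarity threshold is what keeps unrelated words out of the expansion when fewer than `top_k` close neighbours exist.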
DSTI components
-
Jupyter Notebook service: A fully integrated Jupyter notebook in Fusion that lets data scientists explore data, test SQL aggregations, run Fusion SQL statements, and import or export data to and from other storage mechanisms using Spark in their choice of language: Scala or Python. Jupyter notebooks are still supported in Fusion 5.0.x.
-
Machine Learning service: Supports model serving in index pipelines and query pipelines.