Jupyter Support in Fusion
Jupyter is an open source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
Fusion 5 provides a Jupyter service that can be run from the Fusion Helm chart. See How to enable Jupyter and How to use Jupyter with Fusion SQL.
This feature is not available in Managed Fusion environments.
Why Jupyter
Jupyter is actively developed and offers kernels for a number of programming languages. With Jupyter, you can run Spark in Scala or Python and run Fusion SQL from Python at the same time. This versatility gives you options for debugging and trying out various Fusion features. By integrating BeakerX, we get access to a wide variety of kernels and visualization features. And by configuring Jupyter to sit behind the Fusion gateway (proxy) service, we avoid exposing the Jupyter IP externally.
What this service is for
- Run and debug Spark code in Scala or Python (a replacement for spark-shell)
- Run SQL queries via Fusion SQL
- Debug Scala and SQL transforms in PBL jobs
- Everything else for which Jupyter is designed
What this service is not for
This service is not for running Spark jobs in a Kubernetes cluster.
The Jupyter pod does not have permission to create or delete pods in a Kubernetes cluster, so you cannot run jobs on the cluster from a Jupyter notebook. However, Spark local mode can be used, with a higher driver memory if needed.
We recommend sampling data for debugging in Jupyter and then running the actual job using the Fusion jobs UI.
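As an illustration of the local-mode approach described above, the following PySpark sketch builds a session with a larger driver heap. The helper names, the `4g` memory value, and the app name are assumptions chosen for the example, not Fusion defaults. Driver memory generally has to be set before the first Spark session is created in the kernel, so this would run at the top of a notebook.

```python
# Sketch only: helper names, memory size, and app name are illustrative assumptions.

def local_spark_conf(driver_memory="4g", cores="*"):
    """Return Spark settings for local mode with a larger driver heap."""
    return {
        "spark.master": f"local[{cores}]",    # run Spark inside the Jupyter pod
        "spark.driver.memory": driver_memory,  # raise this if the sample is large
    }


def build_local_session(app_name="jupyter-debug", **conf_kwargs):
    """Create (or reuse) a local-mode SparkSession using the settings above."""
    # Imported lazily so local_spark_conf() works even without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in local_spark_conf(**conf_kwargs).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Typical notebook usage (requires PySpark in the kernel):
#   spark = build_local_session(driver_memory="8g")
#   sample_df = full_df.sample(fraction=0.01, seed=42)  # debug on a 1% sample
```

Once the logic works on the sample, the full job would be run from the Fusion jobs UI rather than from the notebook.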
Configuration tips
- The Jupyter pod IP is a ClusterIP and is not exposed externally. We recommend accessing it via the Fusion proxy rather than exposing the pod IP via LoadBalancer.
- Jupyter can also be accessed via port-forwarding, in which case it is available at localhost:8888/jupyter.
- Idle notebooks and kernels are shut down after 60 minutes of inactivity. Note that any additional libraries installed via pip must be re-installed when the pod is recycled.
- The image ships with Python 3.6+.
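Because libraries installed with pip are lost when the pod is recycled, one option is to start each notebook with a cell that re-installs anything missing. The sketch below is one possible way to do that using only the standard library. The helper names are hypothetical, and it assumes each import name matches its pip package name, which is not true for every package.

```python
import importlib.util
import subprocess
import sys


def missing_modules(names):
    """Return the top-level module names that cannot currently be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ensure_installed(names):
    """Install missing packages with pip, using the kernel's own interpreter."""
    missing = missing_modules(names)
    if missing:
        # sys.executable targets the same Python environment the kernel runs in.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *missing])
    return missing


# Typical first cell of a notebook (package names here are examples only):
#   ensure_installed(["requests", "matplotlib"])
```

The function returns the list of packages it had to install, which makes it easy to see whether the pod was recycled since the last session.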
JupyterHub
Customers who run JupyterHub can use lucidworks/fusion-jupyter to launch notebooks with the Fusion Jupyter image.
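For a JupyterHub deployment that spawns notebook pods with KubeSpawner, pointing the spawner at the image might look like the sketch below. This assumes KubeSpawner is the spawner in use. The `configure_fusion_image` helper and its `image_tag` parameter are hypothetical, and the tag to use is not stated here, so it must be supplied explicitly.

```python
def configure_fusion_image(c, image_tag):
    """Point a JupyterHub config object at the Fusion Jupyter image.

    `c` is the config object JupyterHub provides inside jupyterhub_config.py.
    `image_tag` is a hypothetical parameter: supply the tag that matches
    your Fusion release.
    """
    # Assumption: notebooks are spawned as Kubernetes pods via KubeSpawner.
    c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"
    c.KubeSpawner.image = f"lucidworks/fusion-jupyter:{image_tag}"
    return c


# In jupyterhub_config.py, where `c` is already defined by JupyterHub:
#   configure_fusion_image(c, image_tag="<your-fusion-version>")
```

Deployments that use a different spawner, or that configure JupyterHub through a Helm chart, would set the equivalent image option there instead.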