Jupyter Support in Fusion
Jupyter is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.
This feature is not available in Managed Fusion environments.
Jupyter is evolving, and it has a number of kernels that support different programming languages. With Jupyter, you can run Spark in Scala or Python and run Fusion SQL from Python in the same notebook. This versatility gives you options to debug and try out various Fusion features. Integrating BeakerX gives access to a wide variety of kernels and visualization features. Jupyter is also configured to sit behind the Fusion gateway (proxy) service, which avoids exposing the Jupyter IP externally.
Run and debug Spark code in Scala or Python (a replacement for spark-shell)
Run SQL queries via Fusion SQL
Debug Scala and SQL transforms in PBL jobs
Everything else for which Jupyter is designed
This service is not for running Spark jobs in a Kubernetes cluster.
The Jupyter pod does not have permission to create or delete pods in a Kubernetes cluster, so you cannot run jobs in a Kubernetes cluster from a Jupyter notebook. However, Spark local mode can be used, with a higher driver memory if needed.
We recommend sampling data for debugging in Jupyter and then running the actual job using the Fusion jobs UI.
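The local-mode and sampling advice above can be sketched in a PySpark notebook cell roughly as follows. This is a minimal sketch, not Fusion's own configuration: the helper names, the `4g` default, the app name, and the 1% sample fraction are illustrative assumptions. It also assumes PySpark is importable in the kernel and that no Spark session is already running, since the driver memory setting only takes effect when the driver JVM starts.

```python
def local_spark_conf(driver_memory="4g", cores="*"):
    """Return Spark settings for local mode with a larger driver heap.

    Both defaults are illustrative assumptions; size them to the pod.
    """
    return {
        "spark.master": "local[{}]".format(cores),
        "spark.driver.memory": driver_memory,
    }


def build_local_session(app_name="jupyter-debug", **conf_kwargs):
    """Create (or reuse) a local-mode SparkSession with the settings above."""
    # Imported lazily so the helpers can be defined even without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in local_spark_conf(**conf_kwargs).items():
        builder = builder.config(key, value)
    # If a session already exists in this kernel, getOrCreate() returns it
    # and the driver memory setting is not re-applied.
    return builder.getOrCreate()


def sample_for_debugging(df, fraction=0.01, seed=42):
    """Return a small random sample of a DataFrame for interactive debugging."""
    return df.sample(withReplacement=False, fraction=fraction, seed=seed)
```

In a notebook, `spark = build_local_session(driver_memory="8g")` followed by `sample_for_debugging(df)` keeps the debugging loop inside the Jupyter pod; the full job would still be run from the Fusion jobs UI.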
The Jupyter pod IP is ClusterIP and is not exposed externally. We recommend accessing it via the Fusion proxy and not exposing the pod IP via
Jupyter can also be accessed via port-forwarding and is available via
Idle notebooks and kernels are shut down after 60 minutes of inactivity. Note that any additional libraries installed via pip will need to be re-installed when the pod is recycled.
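Because pip-installed libraries are lost when the pod is recycled, one option is to keep a small bootstrap cell at the top of a notebook that re-installs whatever is missing. The sketch below is only an illustration: the `REQUIRED` mapping and the helper names are hypothetical, and it assumes the pod can reach a package index.

```python
import importlib.util
import subprocess
import sys

# Hypothetical example mapping: top-level import name -> pip package name.
REQUIRED = {"requests": "requests"}


def missing_packages(required):
    """Return the pip names of packages whose module cannot be found."""
    return sorted(
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    )


def ensure_installed(required, dry_run=False):
    """Install missing packages into the kernel's own Python environment."""
    missing = missing_packages(required)
    if missing and not dry_run:
        # Use the running interpreter so packages land in this kernel's env.
        subprocess.check_call([sys.executable, "-m", "pip", "install"] + missing)
    return missing
```

Calling `ensure_installed(REQUIRED)` in the first cell makes a notebook recover on its own after a pod restart; `dry_run=True` only reports what would be installed.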
This image is baked with Python 3.6+.