Jupyter Support in Fusion

Jupyter is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.

Starting with Fusion 5.0.2, we provide a Jupyter service that can be run from the Fusion Helm chart. See How to enable Jupyter.

Why Jupyter

Jupyter is actively developed and offers kernels for many programming languages. With Jupyter, you can run Spark in Scala or Python and run Fusion SQL from Python in the same environment. This versatility gives you options for debugging and trying out various Fusion features. Integrating BeakerX gives access to a wide variety of kernels and visualization features. Jupyter is also configured to sit behind the Fusion gateway (proxy) service, which avoids exposing the Jupyter IP externally.

What this service is for

  • Run and debug Spark code in Scala or Python (a replacement for spark-shell)

  • Run SQL queries via Fusion SQL

  • Debug Scala and SQL transforms in Parallel Bulk Loader (PBL) jobs

  • Everything else for which Jupyter is designed

What this service is not for

This service is not for running Spark jobs in a Kubernetes cluster.

The Jupyter pod does not have permission to create or delete pods in the Kubernetes cluster, so you cannot launch cluster jobs from a Jupyter notebook. However, you can use Spark local mode, with increased driver memory if needed.
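
As an illustration of local mode with more driver memory, here is a minimal PySpark sketch. The helper names (local_spark_conf, build_local_session), the app name, and the 4g default are our own assumptions, not part of Fusion. It also assumes PySpark is importable in the notebook kernel and that no Spark session is already running, because the driver memory setting only takes effect before the driver JVM starts.

```python
def local_spark_conf(driver_memory="4g", cores="*"):
    """Return Spark settings for local mode (hypothetical helper).

    cores="*" asks Spark to use all cores available to the pod.
    """
    return {
        "spark.master": f"local[{cores}]",
        "spark.driver.memory": driver_memory,
    }


def build_local_session(app_name="jupyter-debug", driver_memory="4g"):
    """Create (or reuse) a local-mode SparkSession (hypothetical helper)."""
    # Imported lazily so local_spark_conf() works without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in local_spark_conf(driver_memory).items():
        builder = builder.config(key, value)
    # If a session already exists, getOrCreate() returns it and the
    # driver memory setting above is not applied retroactively.
    return builder.getOrCreate()
```

In a notebook cell, spark = build_local_session(driver_memory="8g") would then give a session that runs entirely inside the Jupyter pod.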

We recommend sampling data for debugging in Jupyter and then running the actual job from the Fusion jobs UI.
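
One way to take such a sample in a notebook is sketched below. The function name sample_for_debug and its defaults (1% of rows, capped at 1000) are assumptions chosen for illustration. It relies only on the standard Spark DataFrame methods sample and limit.

```python
def sample_for_debug(df, fraction=0.01, seed=42, max_rows=1000):
    """Return a small sample of a Spark DataFrame (hypothetical helper).

    fraction is the approximate share of rows to keep, seed makes the
    sample repeatable, and max_rows puts a hard cap on the result size.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    # sample() is approximate, so limit() guards against large results.
    return df.sample(fraction=fraction, seed=seed).limit(max_rows)
```

For example, small_df = sample_for_debug(df, fraction=0.05) keeps roughly 5% of the rows for interactive debugging; the full data set is then processed by the job started from the Fusion jobs UI.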

How to enable Jupyter

Jupyter can be enabled with the Fusion 5.0.2+ Helm chart. It is not enabled by default.

  1. Add the following to your custom values.yaml file:

    fusion-jupyter:
      enabled: true
  2. Run the following:

    helm upgrade <release-name> <helm-repo>/fusion --values values.yaml --version 5.0.2

    Be sure to specify your version of Fusion.

  3. Verify that Jupyter is available at <fusion-ip>:6764/jupyter.
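
The availability check can also be scripted. The following is a minimal sketch using only the Python standard library. The helper names and the http scheme are assumptions; adjust the scheme if your Fusion proxy is served over TLS. A 401 or 403 response is treated as reachable here, on the assumption that the proxy may require a login before serving Jupyter.

```python
import urllib.error
import urllib.request


def jupyter_url(host, port=6764, scheme="http"):
    """Build the Jupyter URL behind the Fusion proxy (hypothetical helper)."""
    return f"{scheme}://{host}:{port}/jupyter"


def is_reachable(url, timeout=5.0):
    """Return True if something answers at the URL (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except urllib.error.HTTPError as err:
        # An authentication challenge still means the endpoint is up.
        return err.code in (401, 403)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False
```

For example, is_reachable(jupyter_url("<fusion-ip>")) checks the proxy address with your Fusion IP substituted in, and is_reachable(jupyter_url("localhost", port=8888)) checks a port-forwarded instance.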

Configuration tips

  • The Jupyter service uses ClusterIP and is not exposed externally. We recommend accessing it through the Fusion proxy rather than exposing it with a LoadBalancer.

  • Jupyter can also be accessed via port forwarding, in which case it is available at localhost:8888/jupyter.

  • Idle notebooks and kernels are shut down after 60 minutes of inactivity.

  • The image includes Python 3.6+.

JupyterHub

Customers who run JupyterHub can use lucidworks/fusion-jupyter to launch notebooks with the Fusion Jupyter image.