      Spark Administration in Kubernetes

      In Fusion 5.0, Spark operates in native Kubernetes mode rather than the standalone mode used in Fusion 4.x. The sections below describe Spark operations in Fusion 5.0.

      Node Selectors

      You can control which nodes Spark executors are scheduled on by setting a Spark configuration property for a job:

      spark.kubernetes.node.selector.<LABEL>=<LABEL_VALUE>

      For instance, if a node is labeled with fusion_node_type=spark_only, then you can schedule Spark executor pods to run on that node using:

      spark.kubernetes.node.selector.fusion_node_type=spark_only
      Note: Spark 2.4.x does not support tolerations for Spark pods; consequently, Spark pods cannot be scheduled on any nodes with taints.
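
      The node selector property above can be composed from a label key and value. The following is a minimal sketch, not part of Fusion: the function names node_selector_conf, label_node, and nodes_with_label are hypothetical helpers, and the kubectl wrappers assume you have access to the cluster and supply your own node name.

```shell
# Print the Spark property that restricts executor pods to nodes
# carrying the label KEY=VALUE.
node_selector_conf() {
  printf 'spark.kubernetes.node.selector.%s=%s\n' "$1" "$2"
}

# Label a node so the selector can match it (needs kubectl access).
# Usage: label_node <node-name> <key> <value>
label_node() {
  kubectl label nodes "$1" "$2=$3"
}

# List the nodes that currently carry the label.
# Usage: nodes_with_label <key> <value>
nodes_with_label() {
  kubectl get nodes -l "$1=$2"
}

# Compose the property for the example label used in this section.
node_selector_conf fusion_node_type spark_only
```

      Running nodes_with_label before submitting a job is a quick way to confirm that at least one node matches; if none does, executor pods with that selector will have nowhere to be scheduled.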

      Cluster mode

      Fusion 5.0 ships with Spark 2.4.3 and runs it in "cluster" mode on top of Kubernetes. In cluster mode, each Spark driver runs in its own pod, so resources can be managed per job. Each executor also runs in its own pod.
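
      Because drivers and executors run in separate pods, they can be listed separately with kubectl. The sketch below assumes that Spark labels its pods with spark-role=driver or spark-role=executor; verify the labels in your own cluster with kubectl get pods --show-labels. The function names are hypothetical helpers.

```shell
# Build a label selector for Spark pods by role.
# Assumption: Spark pods carry the label spark-role=driver or spark-role=executor.
spark_pod_selector() {
  case "$1" in
    driver|executor) printf 'spark-role=%s\n' "$1" ;;
    *) echo "role must be 'driver' or 'executor'" >&2; return 1 ;;
  esac
}

# List the pods for one role (needs kubectl access to the cluster).
# Usage: list_spark_pods driver
list_spark_pods() {
  selector=$(spark_pod_selector "$1") || return 1
  kubectl get pods -l "$selector"
}

# Show the selectors that would be passed to kubectl.
spark_pod_selector driver
spark_pod_selector executor
```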

      Spark config defaults

      The table below shows the default configurations for Spark. These settings are configured in the job-launcher config map, accessible using kubectl get configmaps <release-name>-job-launcher. Some of these settings are also configurable via Helm.
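
      To see the settings as they are actually deployed, the config map can be printed as YAML. This is a minimal sketch: the helper names are hypothetical, "f5" is a made-up release name used only for illustration, and kubectl reads from whatever namespace your current context points at.

```shell
# Derive the job-launcher config map name from a Helm release name.
job_launcher_configmap() {
  printf '%s-job-launcher\n' "$1"
}

# Print the full config map as YAML (needs kubectl access to the cluster).
# Usage: show_job_launcher_config <release-name>
show_job_launcher_config() {
  kubectl get configmaps "$(job_launcher_configmap "$1")" -o yaml
}

# "f5" is a hypothetical release name.
job_launcher_configmap f5
```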

      Spark Resource Configurations

      Spark Configuration                      Default value  Helm Variable
      spark.driver.memory                      3g             (none)
      spark.executor.instances                 2              executorInstances
      spark.executor.memory                    3g             (none)
      spark.executor.cores                     6              (none)
      spark.kubernetes.executor.request.cores  3              (none)
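
      As a rough worked example, the resource defaults above can be combined into a per-job footprint. The arithmetic below is a sketch under stated assumptions: it counts only the configured memory values and ignores any memory overhead Spark adds to each pod, it treats spark.kubernetes.executor.request.cores as the CPU request of each executor pod, and it treats spark.executor.cores as the number of tasks an executor can run concurrently. The variable names are hypothetical.

```shell
# Default values copied from the Spark Resource Configurations table.
EXECUTOR_INSTANCES=2      # spark.executor.instances
EXECUTOR_MEMORY_GB=3      # spark.executor.memory (3g)
DRIVER_MEMORY_GB=3        # spark.driver.memory (3g)
EXECUTOR_CORES=6          # spark.executor.cores
EXECUTOR_REQUEST_CORES=3  # spark.kubernetes.executor.request.cores

# Configured memory for one job: all executors plus the driver (overhead excluded).
TOTAL_MEMORY_GB=$(( EXECUTOR_INSTANCES * EXECUTOR_MEMORY_GB + DRIVER_MEMORY_GB ))

# CPU requested from Kubernetes by the executor pods of one job.
TOTAL_EXECUTOR_CPU_REQUEST=$(( EXECUTOR_INSTANCES * EXECUTOR_REQUEST_CORES ))

# Concurrent task slots across all executors of one job.
TASK_SLOTS=$(( EXECUTOR_INSTANCES * EXECUTOR_CORES ))

echo "configured memory (GB): ${TOTAL_MEMORY_GB}"
echo "executor CPU request (cores): ${TOTAL_EXECUTOR_CPU_REQUEST}"
echo "task slots: ${TASK_SLOTS}"
```

      With the defaults this gives 9 GB of configured memory, 6 requested executor cores, and 12 task slots per job, which is a useful lower bound when sizing nodes for several concurrent jobs.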

      Spark Kubernetes Configurations

      Spark Configuration                                      Default value                                    Helm Variable
      spark.kubernetes.container.image.pullPolicy              Always                                           image.imagePullPolicy
      spark.kubernetes.container.image.pullSecrets             (none)                                           image.imagePullSecrets
      spark.kubernetes.authenticate.driver.serviceAccountName  <name>-job-launcher-spark                        (none)
      spark.kubernetes.driver.container.image                  fusion-dev-docker.ci-artifactory.lucidworks.com  image.repository
      spark.kubernetes.executor.container.image                fusion-dev-docker.ci-artifactory.lucidworks.com  image.repository