In Fusion 5.x, Spark operates in native Kubernetes mode, rather than the standalone mode used in Fusion 4.x. This topic describes Spark operations in Fusion 5.x.

Node selectors

You can control which nodes Spark executors are scheduled on using a Spark configuration property for a job:
spark.kubernetes.node.selector.<LABEL>=<LABEL_VALUE>
Use the node's label key as LABEL and its label value as LABEL_VALUE. For example, if a node is labeled with fusion_node_type=spark_only, schedule Spark executor pods to run on that node using:
spark.kubernetes.node.selector.fusion_node_type=spark_only
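The selector only matches nodes that already carry the label. As a sketch (the node name is a placeholder for one of your cluster's nodes), you could apply the label with kubectl:
kubectl label nodes <node-name> fusion_node_type=spark_only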
Fusion 5.5 ships with Spark 2.4.x, which does not support tolerations for Spark pods. As a result, in Fusion 5.5 Spark pods cannot be scheduled on any nodes that have taints.
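To check which nodes carry taints, and are therefore off-limits to Spark pods in Fusion 5.5, you can inspect each node with kubectl; for example (node name is a placeholder):
kubectl describe node <node-name> | grep -i taints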

Cluster mode

Fusion 5 ships with Spark and operates in “cluster mode” on top of Kubernetes. In cluster mode, each Spark driver runs in a separate pod, and resources can be managed per job. Each executor also runs in its own pod.
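Because each driver and executor runs in its own pod, you can inspect a job's resources directly with kubectl. Upstream Spark labels these pods with spark-role, so commands like the following (a sketch; output depends on which jobs are currently running) list them:
kubectl get pods -l spark-role=driver
kubectl get pods -l spark-role=executor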

Spark config defaults

The tables below show the default configurations for Spark. These settings are stored in the job-launcher config map, which you can find with kubectl get configmaps <release-name>-job-launcher. Some of these settings are also configurable via Helm.
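To inspect the current values, you can dump the config map as YAML (the release name is a placeholder for your Helm release):
kubectl get configmap <release-name>-job-launcher -o yaml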
Spark Resource Configurations

Spark Configuration                     | Default value | Helm Variable
spark.driver.memory                     | 3g            |
spark.executor.instances                | 2             | executorInstances
spark.executor.memory                   | 3g            |
spark.executor.cores                    | 6             |
spark.kubernetes.executor.request.cores | 3             |
spark.sql.caseSensitive                 | true          |
Spark Kubernetes Configurations

Spark Configuration                                      | Default value                                   | Helm Variable
spark.kubernetes.container.image.pullPolicy              | Always                                          | image.imagePullPolicy
spark.kubernetes.container.image.pullSecrets             |                                                 | image.imagePullSecrets
spark.kubernetes.authenticate.driver.serviceAccountName  | <name>-job-launcher-spark                       |
spark.kubernetes.driver.container.image                  | fusion-dev-docker.ci-artifactory.lucidworks.com | image.repository
spark.kubernetes.executor.container.image                | fusion-dev-docker.ci-artifactory.lucidworks.com | image.repository
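Where a Helm variable is listed, the default can be overridden in a custom values file. A minimal sketch, assuming these variables live under the job-launcher subchart (verify the exact nesting against your chart version before relying on it):

job-launcher:
  executorInstances: 3
  image:
    imagePullPolicy: IfNotPresent

Apply the override with helm upgrade <release-name> <chart> --values <values-file>.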

Spark operations how-tos

These topics provide how-tos for Spark operations:
  • Configure Spark Job Resource Allocation
  • Configure Spark Jobs to Access Cloud Storage
  • Get Logs for a Spark Job
  • Clean Up Spark Driver Pods
  • Install the Spark History Server
  • Configure the Spark History Server
  • Access the Spark History Server