Spark Administration in Kubernetes

In Fusion 5.0, Spark operates in native Kubernetes mode rather than the standalone mode used in Fusion 4.x. The sections below describe Spark operations in Fusion 5.0.

Cluster mode

Fusion 5.0 ships with Spark 2.4.3 and runs it in "cluster" mode on top of Kubernetes. In cluster mode, each Spark driver runs in its own pod, so resources can be managed per job. Each executor also runs in its own pod.
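
To see this layout for yourself, the driver and executor pods can be listed by the spark-role label that Spark on Kubernetes assigns to them (label values shown here follow upstream Spark conventions and may vary by release):

kubectl get pods -l spark-role=driver
kubectl get pods -l spark-role=executor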

Spark config defaults

The tables below show the default Spark configuration. These settings are stored in the job-launcher config map, which you can view with kubectl get configmaps <release-name>-job-launcher. Some settings can also be overridden via Helm.
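
For example, to dump the full config map and review the current Spark defaults, substituting your Helm release name:

kubectl get configmaps <release-name>-job-launcher -o yaml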

Table 1. Spark Resource Configurations

Spark Configuration                       | Default value | Helm Variable
spark.driver.memory                       | 3g            |
spark.executor.instances                  | 2             | executorInstances
spark.executor.memory                     | 3g            |
spark.executor.cores                      | 6             |
spark.kubernetes.executor.request.cores   | 3             |

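The Helm variables above can be overridden at install or upgrade time. As a sketch (the chart reference and the exact nesting of executorInstances under the job-launcher subchart depend on your installation):

helm upgrade <release-name> <chart> --reuse-values --set executorInstances=4
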
Table 2. Spark Kubernetes Configurations

Spark Configuration                                      | Default value                                    | Helm Variable
spark.kubernetes.container.image.pullPolicy              | Always                                           | image.imagePullPolicy
spark.kubernetes.container.image.pullSecrets             |                                                  | image.imagePullSecrets
spark.kubernetes.authenticate.driver.serviceAccountName  | <name>-job-launcher-spark                        |
spark.kubernetes.driver.container.image                  | fusion-dev-docker.ci-artifactory.lucidworks.com  | image.repository
spark.kubernetes.executor.container.image                | fusion-dev-docker.ci-artifactory.lucidworks.com  | image.repository
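
To confirm the service account that Spark drivers authenticate with (named <name>-job-launcher-spark by default, per the table above), you can query it directly, assuming <name> corresponds to your Helm release name:

kubectl get serviceaccount <release-name>-job-launcher-spark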