Spark Operations

These topics provide how-tos for Spark operations:

Node Selectors

You can control which nodes Spark executors are scheduled on by setting the following Spark configuration property for a job:

        spark.kubernetes.node.selector.<LABEL>=<LABEL_VALUE>

For instance, if a node is labeled with fusion_node_type=spark_only, then you would schedule Spark executor pods to run on that node using:

        spark.kubernetes.node.selector.fusion_node_type=spark_only

Note: Spark 2.4.x does not support tolerations for Spark pods; consequently, Spark pods cannot be scheduled on nodes with taints.
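As a concrete sketch of the labeling side, you could label a dedicated node and verify it with kubectl (the node name myk8s-node-1 is hypothetical):

```shell
# Label a node so Spark executor pods can be steered onto it (node name is hypothetical)
kubectl label nodes myk8s-node-1 fusion_node_type=spark_only

# Verify the label was applied
kubectl get nodes --show-labels | grep spark_only

# List only the nodes carrying the label
kubectl get nodes -l fusion_node_type=spark_only
```

With the label in place, any job configured with spark.kubernetes.node.selector.fusion_node_type=spark_only will schedule its executors only on matching nodes.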

Cluster mode

Fusion 5.0 ships with Spark 2.4.3 and operates in "cluster" mode on top of Kubernetes. In cluster mode, each Spark driver runs in a separate pod, so resources can be managed per job. Each executor also runs in its own pod.
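Because each driver and executor is its own pod, you can observe a job's footprint directly. Spark on Kubernetes labels its pods with spark-role, so a sketch for inspecting a running job looks like:

```shell
# Driver pods: one per running Spark job
kubectl get pods -l spark-role=driver

# Executor pods: spark.executor.instances pods per job by default
kubectl get pods -l spark-role=executor

# Inspect the resource requests of a specific executor pod (pod name is hypothetical)
kubectl describe pod <executor-pod-name> | grep -A2 Requests
```

This per-pod layout is what lets memory and CPU be tuned per job rather than per shared cluster.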

Spark config defaults

The tables below show the default configurations for Spark. These settings are stored in the job-launcher config map, accessible using kubectl get configmaps <release-name>-job-launcher. Some of these settings are also configurable via Helm.
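To see the current values in your cluster, you can dump the config map directly (the release name is a placeholder you substitute for your own):

```shell
# List config maps in the namespace to find the job-launcher entry
kubectl get configmaps

# Dump the full job-launcher configuration as YAML
kubectl get configmaps <release-name>-job-launcher -o yaml
```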

Spark Resource Configurations

| Spark Configuration                     | Default value | Helm Variable     |
| --------------------------------------- | ------------- | ----------------- |
| spark.driver.memory                     | 3g            |                   |
| spark.executor.instances                | 2             | executorInstances |
| spark.executor.memory                   | 3g            |                   |
| spark.executor.cores                    | 6             |                   |
| spark.kubernetes.executor.request.cores | 3             |                   |

Spark Kubernetes Configurations

| Spark Configuration                                     | Default value                                   | Helm Variable          |
| ------------------------------------------------------- | ----------------------------------------------- | ---------------------- |
| spark.kubernetes.container.image.pullPolicy             | Always                                          | image.imagePullPolicy  |
| spark.kubernetes.container.image.pullSecrets            |                                                 | image.imagePullSecrets |
| spark.kubernetes.authenticate.driver.serviceAccountName | <name>-job-launcher-spark                       |                        |
| spark.kubernetes.driver.container.image                 | fusion-dev-docker.ci-artifactory.lucidworks.com | image.repository       |
| spark.kubernetes.executor.container.image               | fusion-dev-docker.ci-artifactory.lucidworks.com | image.repository       |
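Where a Helm variable exists in the tables above, the default can be overridden at install or upgrade time. A hedged sketch (the release name and chart reference are placeholders; substitute the chart you actually installed from):

```shell
# Override Helm-exposed Spark defaults on upgrade
# (release name and chart reference are placeholders)
helm upgrade <release-name> <chart-reference> \
  --set executorInstances=4 \
  --set image.imagePullPolicy=IfNotPresent
```

Settings without a Helm variable (such as spark.driver.memory) would instead be changed by editing the job-launcher config map or overriding them per job.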