Configuring credentials in the Kubernetes cluster

GCS and AWS (S3) credentials can be configured per cluster or per job. Azure Data Lake credentials currently require cluster-level setup via a manual file upload.

Configuring GCS credentials for Spark jobs

  1. Create a secret containing the credentials JSON file.

    See https://cloud.google.com/iam/docs/creating-managing-service-account-keys[How to create service account JSON files^].
    kubectl create secret generic solr-dev-gc-serviceaccount-key --from-file=/Users/kiranchitturi/creds/solr-dev-gc-serviceaccount-key.json
  2. Create an extra config map in Kubernetes that sets the required properties for GCP:

    1. Create a properties file with the GCP properties:

      cat gcp-launcher.properties
      spark.kubernetes.driverEnv.GOOGLE_APPLICATION_CREDENTIALS = /mnt/gcp-secrets/solr-dev-gc-serviceaccount-key.json
      spark.kubernetes.driver.secrets.solr-dev-gc-serviceaccount-key = /mnt/gcp-secrets
      spark.kubernetes.executor.secrets.solr-dev-gc-serviceaccount-key = /mnt/gcp-secrets
      spark.executorEnv.GOOGLE_APPLICATION_CREDENTIALS = /mnt/gcp-secrets/solr-dev-gc-serviceaccount-key.json
      spark.hadoop.google.cloud.auth.service.account.json.keyfile = /mnt/gcp-secrets/solr-dev-gc-serviceaccount-key.json
    2. Create a config map based on the properties file:

      kubectl create configmap gcp-launcher --from-file=gcp-launcher.properties
  3. Add the gcp-launcher config map to values.yaml under job-launcher.

    configSources: gcp-launcher
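
For reference, a minimal sketch of the resulting values.yaml fragment is shown below. The exact nesting is an assumption based on this step; adjust it to match your chart's layout.

    # Hypothetical values.yaml fragment: the gcp-launcher config map is
    # registered as an extra configuration source for the job-launcher.
    job-launcher:
      configSources: gcp-launcher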

Configuring S3 credentials for Spark jobs

AWS credentials cannot be set via a single file, so you must instead set two environment variables: one for the access key and one for the secret key.

  1. Create a secret pointing to the credentials:

    kubectl create secret generic aws-secret --from-literal=key='<access key>' --from-literal=secret='<secret key>'
  2. Create an extra config map in Kubernetes that sets the required properties for AWS:

    1. Create a properties file with the AWS properties:

      cat aws-launcher.properties
      spark.kubernetes.driver.secretKeyRef.AWS_ACCESS_KEY_ID=aws-secret:key
      spark.kubernetes.driver.secretKeyRef.AWS_SECRET_ACCESS_KEY=aws-secret:secret
      spark.kubernetes.executor.secretKeyRef.AWS_ACCESS_KEY_ID=aws-secret:key
      spark.kubernetes.executor.secretKeyRef.AWS_SECRET_ACCESS_KEY=aws-secret:secret
    2. Create a config map based on the properties file:

      kubectl create configmap aws-launcher --from-file=aws-launcher.properties
  3. Add the aws-launcher config map to values.yaml under job-launcher:

    configSources: aws-launcher
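
After completing these steps, you can confirm that the config map holds the expected properties with standard kubectl commands:

    # Show the keys and contents of the aws-launcher config map.
    kubectl describe configmap aws-launcher
    kubectl get configmap aws-launcher -o yaml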

Configuring Azure Data Lake credentials for Spark jobs

Configuring Azure credentials through environment variables or config maps is not currently possible, so you must manually upload the core-site.xml file into the job-launcher pod at /app/spark-dist/conf (a sketch of the upload command follows the file listing below).

Currently, only Azure Data Lake Storage Gen1 is supported.

Here’s what the core-site.xml file should look like:

<configuration>
  <!-- Authenticate to ADLS Gen1 with OAuth 2.0 client credentials. -->
  <property>
    <name>dfs.adls.oauth2.access.token.provider.type</name>
    <value>ClientCredential</value>
  </property>
  <property>
    <name>dfs.adls.oauth2.refresh.url</name>
    <value>Insert your OAuth 2.0 endpoint URL here</value>
  </property>
  <property>
    <name>dfs.adls.oauth2.client.id</name>
    <value>Insert your application ID here</value>
  </property>
  <property>
    <name>dfs.adls.oauth2.credential</name>
    <value>Insert the secret key value here</value>
  </property>
  <!-- Register the ADL filesystem implementations for adl:// URIs. -->
  <property>
    <name>fs.adl.impl</name>
    <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.adl.impl</name>
    <value>org.apache.hadoop.fs.adl.Adl</value>
  </property>
</configuration>
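
A minimal sketch of the manual upload using kubectl cp is shown below. The pod name is a placeholder, and the label selector is an assumption; adjust both to match your deployment.

    # Find the job-launcher pod (the app=job-launcher label is an assumption).
    kubectl get pods -l app=job-launcher

    # Copy core-site.xml into the pod at the path given above.
    kubectl cp core-site.xml <job-launcher-pod>:/app/spark-dist/conf/core-site.xml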

Configuring credentials per job

  1. Create a Kubernetes secret containing the GCP or AWS credentials.

  2. Add Spark configuration properties that expose the secret to the Spark driver and executors.

GCS

  1. Create a secret containing the credentials JSON file.

    See https://cloud.google.com/iam/docs/creating-managing-service-account-keys[How to create service account JSON files^].
    kubectl create secret generic solr-dev-gc-serviceaccount-key --from-file=/Users/kiranchitturi/creds/solr-dev-gc-serviceaccount-key.json
  2. Toggle the Advanced config in the job UI and add the following to the Spark configuration:

    spark.kubernetes.driver.secrets.solr-dev-gc-serviceaccount-key = /mnt/gcp-secrets
    spark.kubernetes.executor.secrets.solr-dev-gc-serviceaccount-key = /mnt/gcp-secrets
    spark.kubernetes.driverEnv.GOOGLE_APPLICATION_CREDENTIALS = /mnt/gcp-secrets/solr-dev-gc-serviceaccount-key.json
    spark.executorEnv.GOOGLE_APPLICATION_CREDENTIALS = /mnt/gcp-secrets/solr-dev-gc-serviceaccount-key.json
    spark.hadoop.google.cloud.auth.service.account.json.keyfile = /mnt/gcp-secrets/solr-dev-gc-serviceaccount-key.json
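
If the job still fails to authenticate, one way to check the wiring is to inspect a running Spark driver pod; the pod name below is a placeholder:

    # Confirm the secret was mounted where GOOGLE_APPLICATION_CREDENTIALS points.
    kubectl exec <spark-driver-pod> -- ls /mnt/gcp-secrets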

S3

AWS credentials cannot be set via a single file, so you must instead set two environment variables: one for the access key and one for the secret key.

  1. Create a secret pointing to the credentials:

    kubectl create secret generic aws-secret --from-literal=key='<access key>' --from-literal=secret='<secret key>'
  2. Toggle the Advanced config in the job UI and add the following to the Spark configuration:

    spark.kubernetes.driver.secretKeyRef.AWS_ACCESS_KEY_ID=aws-secret:key
    spark.kubernetes.driver.secretKeyRef.AWS_SECRET_ACCESS_KEY=aws-secret:secret
    spark.kubernetes.executor.secretKeyRef.AWS_ACCESS_KEY_ID=aws-secret:key
    spark.kubernetes.executor.secretKeyRef.AWS_SECRET_ACCESS_KEY=aws-secret:secret
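
As a quick sanity check, you can confirm that the variables were injected into a running driver pod (the pod name is a placeholder; note that this prints the credential values to your terminal):

    # Verify the AWS credentials were injected from the aws-secret secret.
    kubectl exec <spark-driver-pod> -- printenv AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY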