Product

Fusion 5.6

Built-in SQL Aggregation Jobs Using Cloud Storage Buckets

Built-in SQL aggregation jobs can be configured to read source files from cloud storage buckets.

This process can be used with the following file formats and cloud storage systems:

  • File formats such as Parquet (.parquet) and ORC (.orc)

  • Cloud storage systems such as Google Cloud Storage (GCS), Amazon S3 on Amazon Web Services (AWS), and Azure Blob Storage.
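As a rough illustration of the combinations listed above, the following Python sketch checks that a source path uses one of the URI schemes that appear in the examples on this page (gs, s3a, wasbs) and ends in a supported file extension. The helper name `describe_source` and the lookup tables are hypothetical and exist only for illustration; they are not part of Fusion.

```python
from typing import Tuple
from urllib.parse import urlparse

# URI schemes used in the examples on this page, mapped to the storage system.
SCHEMES = {
    "gs": "Google Cloud Storage",
    "s3a": "Amazon S3",
    "wasbs": "Azure Blob Storage",
}

# File formats named on this page.
FORMATS = ("parquet", "orc")


def describe_source(uri: str) -> Tuple[str, str]:
    """Return (storage system, data format) for a source path.

    Raises ValueError if the scheme or file extension is not one of the
    values listed above. Hypothetical helper, for illustration only.
    """
    scheme = urlparse(uri).scheme
    if scheme not in SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    # Take the text after the last dot as the file extension.
    extension = uri.rsplit(".", 1)[-1].lower()
    if extension not in FORMATS:
        raise ValueError(f"unsupported file format: {extension!r}")
    return SCHEMES[scheme], extension
```

For example, `describe_source("gs://my-bucket/signals/*.parquet")` would classify the path as Google Cloud Storage with the parquet format, where `my-bucket/signals` is a placeholder path.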

Configure Parameters

Google Cloud Storage (GCS)

  1. Create a Kubernetes secret with the necessary credentials. For more information about creating a secret containing the credentials JSON file, see Configuring credentials for Spark jobs.

  2. When the secret is successfully created, set the following parameters:

GENERAL PARAMETERS

| Parameter Name | Example Value | Notes |
| --- | --- | --- |
| SOURCE COLLECTION | gs://<path_to_data>/*.parquet | Value: URI path that contains the desired signal data files. The example value returns all Parquet files in the directory, where gs is used to access GCS data. |
| DATA FORMAT | parquet | Value: File type of the input files. Can also be orc. |

SPARK SETTINGS

| Parameter Name | Example Value | Notes |
| --- | --- | --- |
| spark.kubernetes.driver.secrets.{secret-name} | /mnt/gcp-secrets | Value: The {secret-name} obtained during configuration. Example: example-serviceaccount-key. |
| spark.kubernetes.executor.secrets.{secret-name} | /mnt/gcp-secrets | Value: The {secret-name} obtained during configuration. Example: example-serviceaccount-key. |
| spark.kubernetes.driverEnv.GOOGLE_APPLICATION_CREDENTIALS | /mnt/gcp-secrets/{secret-name}.json | Value: The name of the .json file used to create the {secret-name} obtained during configuration. Example: example-serviceaccount-key.json. |
| spark.executorEnv.GOOGLE_APPLICATION_CREDENTIALS | /mnt/gcp-secrets/{secret-name}.json | Value: The name of the .json file used to create the {secret-name} obtained during configuration. Example: example-serviceaccount-key.json. |
| spark.hadoop.google.cloud.auth.service.account.json.keyfile | /mnt/gcp-secrets/{secret-name}.json | Value: The name of the .json file used to create the {secret-name} obtained during configuration. Example: example-serviceaccount-key.json. |

Amazon Web Services (AWS)

  1. Create a Kubernetes secret with the necessary credentials. For more information about creating a secret containing the AWS credentials, see Configuring credentials for Spark jobs.

  2. When the secret is successfully created, set the following parameters:

GENERAL PARAMETERS

| Parameter Name | Example Value | Notes |
| --- | --- | --- |
| SOURCE COLLECTION | s3a://<path_to_data>/*.parquet | Value: URI path that contains the desired signal data files. The example value returns all Parquet files in the directory, where s3a is used to access AWS data. |
| DATA FORMAT | parquet | Value: File type of the input files. Can also be orc. |

SPARK SETTINGS

| Parameter Name | Example Value | Notes |
| --- | --- | --- |
| spark.kubernetes.driver.secretKeyRef.AWS_ACCESS_KEY_ID | {aws-secret-key} | Value: The aws-secret:key obtained during configuration. |
| spark.kubernetes.driver.secretKeyRef.AWS_SECRET_ACCESS_KEY | {aws-secret-secret} | Value: The aws-secret:secret obtained during configuration. |
| spark.kubernetes.executor.secretKeyRef.AWS_ACCESS_KEY_ID | {aws-secret-key} | Value: The aws-secret:key obtained during configuration. |
| spark.kubernetes.executor.secretKeyRef.AWS_SECRET_ACCESS_KEY | {aws-secret-secret} | Value: The aws-secret:secret obtained during configuration. |
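Spark's secretKeyRef properties take a value of the form name:key, where name is the Kubernetes secret and key is an entry inside it. The following Python sketch builds the four AWS settings on that basis. The defaults (a secret called aws-secret with entries key and secret) are taken from the notes on this page and are assumptions about how the secret was created; the function name `aws_spark_settings` is hypothetical.

```python
from typing import Dict


def aws_spark_settings(
    secret_name: str = "aws-secret",
    access_key_entry: str = "key",
    secret_key_entry: str = "secret",
) -> Dict[str, str]:
    """Build the AWS Spark settings that expose secret entries as env vars.

    Assumption: the Kubernetes secret holds the access key ID under
    `access_key_entry` and the secret access key under `secret_key_entry`.
    """
    settings: Dict[str, str] = {}
    # The driver and executor pods need the same two environment variables.
    for role in ("driver", "executor"):
        prefix = f"spark.kubernetes.{role}.secretKeyRef"
        settings[f"{prefix}.AWS_ACCESS_KEY_ID"] = f"{secret_name}:{access_key_entry}"
        settings[f"{prefix}.AWS_SECRET_ACCESS_KEY"] = f"{secret_name}:{secret_key_entry}"
    return settings


if __name__ == "__main__":
    # Print the settings as name/value pairs using the default secret layout.
    for name, value in aws_spark_settings().items():
        print(f"{name} = {value}")
```

Because the settings reference the secret by name, the credential values themselves never appear in the job configuration.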

Microsoft Azure

GENERAL PARAMETERS

| Parameter Name | Example Value | Notes |
| --- | --- | --- |
| SOURCE COLLECTION | wasbs://<path_to_data>/*.parquet | Value: URI path that contains the desired signal data files. The example value returns all Parquet files in the directory, where wasbs is used to access Azure data. |
| DATA FORMAT | parquet | Value: File type of the input files. Can also be orc. |

SPARK SETTINGS

| Parameter Name | Example Value | Notes |
| --- | --- | --- |
| spark.hadoop.fs.wasbs.impl | org.apache.hadoop.fs.azure.NativeAzureFileSystem | Makes the file system available inside the Spark job. |
| spark.hadoop.fs.azure.account.key.{storage-account-name}.blob.core.windows.net | {access-key-value} | Obtain the values for {storage-account-name} and {access-key-value} from the Azure UI. |
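The second Azure setting embeds the storage account name in the property name itself, which is easy to get wrong when typed by hand. The following Python sketch assembles both settings from the two values obtained from Azure. The function name `azure_spark_settings` and the sample values are hypothetical placeholders.

```python
from typing import Dict


def azure_spark_settings(storage_account_name: str, access_key_value: str) -> Dict[str, str]:
    """Build the Azure Spark settings for a storage account and access key."""
    return {
        # Register the file system implementation for the wasbs scheme.
        "spark.hadoop.fs.wasbs.impl": "org.apache.hadoop.fs.azure.NativeAzureFileSystem",
        # The storage account name is part of the property name.
        f"spark.hadoop.fs.azure.account.key.{storage_account_name}.blob.core.windows.net": access_key_value,
    }


if __name__ == "__main__":
    # "examplestorage" and "<access-key-value>" are placeholders, not real values.
    for name, value in azure_spark_settings("examplestorage", "<access-key-value>").items():
        print(f"{name} = {value}")
```

Unlike the GCS and AWS settings, this approach places the access key directly in the job configuration, so the value should be treated as sensitive.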