Fusion 5.12

    Enable cloud signal storage

    Storing signals in the cloud reduces the amount of data stored on a Solr cluster. Signals data files are periodically compacted into larger files to save storage space, improve performance, and make it easier to manage the files.

    This article teaches you how to set up cloud signal storage in Google Cloud Platform (GCP) or Amazon Web Services (AWS).

    For more information on cloud signal storage in Fusion, see Cloud signal storage.

    There are known issues with using cloud signal storage. Before you begin, review the known issues and consider whether enabling cloud signal storage will negatively impact your Fusion environment.

    Initial setup

    To enable cloud signal storage, start with a new deployment. You enable the feature in your custom values YAML file by adding the following settings to the fusion-indexing service:

    fusion-indexing:
      cloudSignals:
        enabled: true
        kafkaSvcUrl: {KAFKA_URL}
        topic: fusion.system.cloud-signals
    Use the value {KAFKA_URL}, as written in the preceding example. This value is set by the Fusion deployment scripts.
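
Because the {KAFKA_URL} token must stay exactly as written, a quick check before deploying can catch an accidental edit. The following is a minimal sketch, not part of Fusion: the function name kafka_url_untouched and the file name custom-values.yaml are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Fusion): succeeds only if the values
# file still contains the literal "kafkaSvcUrl: {KAFKA_URL}" setting.
kafka_url_untouched() {
  local values_file="$1"
  # -F treats the pattern as a fixed string, so the braces are matched literally.
  grep -Fq 'kafkaSvcUrl: {KAFKA_URL}' "$values_file"
}

# Usage (the file name is an assumption):
#   kafka_url_untouched custom-values.yaml || echo "kafkaSvcUrl was modified" >&2
```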

    After performing the initial setup, continue the setup specific to your cloud storage provider. Cloud signal storage is supported with Google Cloud Storage and Amazon S3.

    Can I use multiple storage methods?

    Fusion supports one storage location for signals data. You cannot choose to store signals in multiple places.

    Google Cloud Storage

    1. Add the following values to your custom values YAML file:

      cloud-signals:
        enabledStorage:
          - gcs
        gcs:
          outputDir: "OUTPUT_DIRECTORY"
          cloudSecret: SERVICE_ACCOUNT_SECRET
        compactor:
          intervalS: 7200
          collectionExecutors: 1
          partitionExecutors: 1
          rowLimitPerFile: "1000000"
          resources: {}
          gcs:
            rootPath: COMPACTOR_ROOTPATH_DIRECTORY
            outputDir: COMPACTOR_OUTPUT_DIRECTORY
    2. Replace the placeholder values with your environment values. See the following table for details:

      Placeholder value             Description
      OUTPUT_DIRECTORY              The location used for uncompacted signals data files. For example, gs://smartdata-datasets/streaming/compactor/in.
      SERVICE_ACCOUNT_SECRET        Your service account’s authentication secret. For more information on generating a service account secret, see Creating service account credentials.
      COMPACTOR_ROOTPATH_DIRECTORY  The same value that you used for OUTPUT_DIRECTORY.
      COMPACTOR_OUTPUT_DIRECTORY    The location used for compacted signals data files. For example, gs://smartdata-datasets/streaming/compactor/out.

    3. Configure how often you want the compactor to run and the file size for compacted signals. See the following table for configuration options:

      Option           Example value  Description
      intervalS        7200           The interval between compactor operations, in seconds.
      rowLimitPerFile  1000000        Determines how many rows of data are written for each compacted signals data file.

    4. Deploy Fusion using the custom values YAML file.
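
When choosing values for intervalS and rowLimitPerFile, a little arithmetic helps. The sketch below converts hours to the seconds that intervalS expects and estimates how many compacted files a batch of rows would produce. Both function names are hypothetical, and the file count assumes the compactor treats rowLimitPerFile as a hard per-file cap, which this article does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for picking compactor settings.

# Convert an interval in whole hours to seconds for intervalS.
hours_to_interval_s() {
  local hours="$1"
  echo $(( hours * 3600 ))
}

# Estimate the number of compacted files for a given row count,
# ASSUMING rowLimitPerFile is a strict upper bound (ceiling division).
estimated_file_count() {
  local rows="$1" row_limit="$2"
  echo $(( (rows + row_limit - 1) / row_limit ))
}

# Usage:
#   hours_to_interval_s 2                   # the example value 7200 is two hours
#   estimated_file_count 2500000 1000000
```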

    Amazon S3

    1. Add the following values to your custom values YAML file:

      cloud-signals:
        enabledStorage:
          - s3
        s3:
          outputDir: "OUTPUT_DIRECTORY"
          cloudSecret: SERVICE_ACCOUNT_SECRET
          region: SERVICE_REGION
        compactor:
          intervalS: 7200
          collectionExecutors: 1
          partitionExecutors: 1
          rowLimitPerFile: 1000000
          resources: {}
          s3:
            secret: SERVICE_ACCOUNT_SECRET
            keyIdFieldName: key
            secretKeyFieldName: secret
            region: SERVICE_REGION
            rootPath: COMPACTOR_ROOTPATH_DIRECTORY
            outputDir: COMPACTOR_OUTPUT_DIRECTORY
    2. Replace the placeholder values with your environment values. See the following table for details:

      Placeholder value             Description
      OUTPUT_DIRECTORY              The location used for uncompacted signals data files. For example, s3a://smartdata-datasets/streaming/compactor/in.
      SERVICE_ACCOUNT_SECRET        Your service account’s authentication secret. For more information on generating a service account secret, see AWS Secrets Manager.
      SERVICE_REGION                Your S3 service region. For more information, see AWS service endpoints.
      COMPACTOR_ROOTPATH_DIRECTORY  The value that you used for OUTPUT_DIRECTORY, with the prefix changed from s3a to s3. For example, s3://smartdata-datasets/streaming/compactor/in.
      COMPACTOR_OUTPUT_DIRECTORY    The location used for compacted signals data files. For example, s3://smartdata-datasets/streaming/compactor/out.

    3. Configure how often you want the compactor to run and the file size for compacted signals. See the following table for configuration options:

      Option           Example value  Description
      intervalS        7200           The interval between compactor operations, in seconds.
      rowLimitPerFile  1000000        Determines how many rows of data are written for each compacted signals data file.

    4. Deploy Fusion using the custom values YAML file.
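
The S3 settings use two URI prefixes: s3a for the consumer's outputDir and s3 for the compactor's rootPath. The sketch below derives the second from the first so the two values cannot drift apart. The function name compactor_root_path is hypothetical and not part of Fusion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive COMPACTOR_ROOTPATH_DIRECTORY from OUTPUT_DIRECTORY
# by swapping the s3a:// prefix for s3://, as the placeholder table describes.
compactor_root_path() {
  local output_dir="$1"
  case "$output_dir" in
    s3a://*)
      # Strip the s3a:// prefix and re-add it as s3://.
      printf 's3://%s\n' "${output_dir#s3a://}"
      ;;
    *)
      printf 'error: expected an s3a:// path, got %s\n' "$output_dir" >&2
      return 1
      ;;
  esac
}

# Usage:
#   compactor_root_path s3a://smartdata-datasets/streaming/compactor/in
```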

    Result

    If you successfully enabled cloud signal storage, your deployment has two new pods:

    Pod name Description
    • NAMESPACE-cloud-signals-gcs-POD_ID

    • NAMESPACE-cloud-signals-s3-POD_ID

    The cloud signals consumer pod. If you are using Google Cloud Storage, the pod name contains gcs. If you are using Amazon S3, the pod name contains s3 instead.

    • NAMESPACE-compactor-gcs-POD_ID

    • NAMESPACE-compactor-s3-POD_ID

    The cloud signals compactor pod. If you are using Google Cloud Storage, the pod name contains gcs. If you are using Amazon S3, the pod name contains s3 instead.
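
One way to check for the two pods is to filter the pod list by the naming patterns above. This is a minimal sketch that assumes you have kubectl access to the cluster. The function name filter_cloud_signal_pods is hypothetical, and NAMESPACE stands for your Fusion namespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads pod names on stdin and prints only those that
# match the consumer or compactor naming patterns for GCS or S3.
filter_cloud_signal_pods() {
  # "--" stops option parsing because the pattern begins with a hyphen.
  grep -E -- '-(cloud-signals|compactor)-(gcs|s3)-'
}

# Usage (requires cluster access):
#   kubectl get pods -n NAMESPACE -o name | filter_cloud_signal_pods
```

If the filter prints one consumer pod and one compactor pod, the deployment matches the result described above.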

    Application setup

    1. Create a new Fusion application and index some data.

    2. Update the click signals aggregation job.

      1. Navigate to Collections > Jobs and select the click signals aggregation job. By default, this job is named APP_NAME_click_signals_aggregation.

      2. Change the Source Collection value to the path of your cloud storage location. For example: gs://smartdata-datasets/streaming/compactor/in.

        You might see a message that states, "This collection does not exist." This is expected with cloud signal collections. Ensure your path is correct and disregard this message.

      3. Change the Data Format value to parquet.

      4. Click Save to save the changes to the job.

    3. Send signals to your application. For testing, you can send signals manually.

      1. To send signals using the API:

        curl -u USERNAME:PASSWORD -X POST \
            --url 'https://FUSION_HOST.com/api/apps/APP_NAME/signals/APP_NAME?async=false&commit=true' \
            --header 'Content-Type: application/json' \
            --header 'cache-control: no-cache' \
            --data '[
                {
                  "type": "SIGNAL_TYPE",
                  "params": {
                    "docId": "DOCUMENT_ID",
                    "count": "NUMBER_OF_SIGNALS",
                    "collection": "APP_NAME",
                    "query": "*:*",
                    "filterQueries": []
                  }
                }
                ]'

        Replace placeholder values, such as APP_NAME, with your environment values. The SIGNAL_TYPE value can be any signal type, such as click, response, cart, or a custom signal type.

      2. To send signals in the Fusion UI from the Query Workbench:

        1. Navigate to Querying > Query Workbench.

        2. Click Format Results.

        3. Enable the Send click signals option.

        4. Click Save.

        5. Click a result title in the Query Workbench to send a signal.

    4. Verify your signals were captured in your cloud storage.
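
To verify step 4 from a terminal, you can list the files under your uncompacted signals location with your provider's CLI. The sketch below only prints the listing command for a given path so you can review it before running it. It assumes gsutil or the AWS CLI is installed and authenticated, and the function name storage_list_command is hypothetical. The AWS CLI expects s3:// URIs, so an s3a:// path is rewritten.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CLI command that recursively lists the files
# under a cloud signal storage path. It does not run the command.
storage_list_command() {
  local path="$1"
  case "$path" in
    gs://*)
      printf 'gsutil ls -r %s\n' "$path"
      ;;
    s3a://*)
      # The AWS CLI does not use the Hadoop-style s3a scheme.
      printf 'aws s3 ls s3://%s --recursive\n' "${path#s3a://}"
      ;;
    s3://*)
      printf 'aws s3 ls %s --recursive\n' "$path"
      ;;
    *)
      printf 'error: unsupported storage path: %s\n' "$path" >&2
      return 1
      ;;
  esac
}

# Usage:
#   storage_list_command gs://smartdata-datasets/streaming/compactor/in
```

Newly written files under the path after you send test signals indicate that the consumer pod is writing to cloud storage.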

    Why is my signals collection empty?

    The signals collection contains signals located in a Solr cluster. If you’re using cloud signal storage, this collection will be empty.