Enable cloud signal storage
Storing signals in the cloud reduces the amount of data stored on a Solr cluster. Signals data files are periodically compacted into larger files to save storage space, improve performance, and make it easier to manage the files.
This article explains how to set up cloud signal storage in Google Cloud Platform (GCP) or Amazon Web Services (AWS).
For more information on cloud signal storage in Fusion, see Cloud signal storage.
There are known issues with using cloud signal storage. Before you begin, review the known issues and consider whether enabling cloud signal storage will negatively impact your Fusion environment.
Initial setup
To enable cloud signal storage, start with a new deployment. Enable cloud signal storage by altering the fusion-indexing service in your custom values YAML file:

fusion-indexing:
  cloudSignals:
    enabled: true
    kafkaSvcUrl: {KAFKA_URL}
    topic: fusion.system.cloud-signals
Use the value {KAFKA_URL}, as written in the preceding example. This value is set by the Fusion deployment scripts.
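Optionally, after deployment you can confirm that the cloud signals topic was created in Kafka. The following is a minimal sketch, assuming a Kubernetes-based deployment whose Kafka pods include the standard Kafka CLI tools; the namespace, pod name, script name, and broker port are placeholders for your environment, not values defined by Fusion.

    # List the Kafka topics from inside a Kafka pod and check for the
    # cloud signals topic. Adjust the bootstrap server address to match
    # the Kafka service used by your deployment.
    kubectl exec -n NAMESPACE KAFKA_POD -- \
      kafka-topics.sh --bootstrap-server localhost:9092 --list \
      | grep fusion.system.cloud-signals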
After performing the initial setup, continue the setup specific to your cloud storage provider. Cloud signal storage is supported with Google Cloud Storage and Amazon S3.
Can I use multiple storage methods?
Fusion supports one storage location for signals data. You cannot store signals in multiple places.
Google Cloud Storage
- Add the following values to your custom values YAML file:

    cloud-signals:
      enabledStorage:
        - gcs
      gcs:
        outputDir: "OUTPUT_DIRECTORY"
        cloudSecret: SERVICE_ACCOUNT_SECRET
      compactor:
        intervalS: 7200
        collectionExecutors: 1
        partitionExecutors: 1
        rowLimitPerFile: "1000000"
        resources: {}
        gcs:
          rootPath: COMPACTOR_ROOTPATH_DIRECTORY
          outputDir: COMPACTOR_OUTPUT_DIRECTORY
- Replace the placeholder values with your environment values. See the following table for details:

    Placeholder value | Description
    ---|---
    OUTPUT_DIRECTORY | The location used for uncompacted signals data files. For example, gs://smartdata-datasets/streaming/compactor/in.
    SERVICE_ACCOUNT_SECRET | Your service account’s authentication secret. For more information on generating a service account secret, see Creating service account credentials.
    COMPACTOR_ROOTPATH_DIRECTORY | The same value that you used for OUTPUT_DIRECTORY.
    COMPACTOR_OUTPUT_DIRECTORY | The location used for compacted signals data files. For example, gs://smartdata-datasets/streaming/compactor/out.
- Configure how often you want the compactor to run and the file size for compacted signals. See the following table for configuration options:

    Option | Example value | Description
    ---|---|---
    intervalS | 7200 | The interval between compactor operations, in seconds. For example, 7200 runs the compactor every two hours.
    rowLimitPerFile | 1000000 | The maximum number of rows of data written to each compacted signals data file.

- Deploy Fusion using the custom values YAML file. See the sketch after this procedure for example commands.
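The following is a minimal sketch of the secret creation and deployment commands, assuming a Helm-based Fusion deployment on Kubernetes. The namespace, release name, chart reference, and the key file name inside the secret are assumptions; substitute the values your deployment already uses.

    # Create the Kubernetes secret that holds the GCP service account key.
    # The secret name must match the cloudSecret value in the custom values
    # YAML file; the key file name inside the secret (sa.json) is an assumption.
    kubectl create secret generic SERVICE_ACCOUNT_SECRET \
      --namespace NAMESPACE \
      --from-file=sa.json=/path/to/service-account-key.json

    # Deploy (or upgrade) Fusion with the custom values YAML file.
    helm upgrade --install RELEASE_NAME lucidworks/fusion \
      --namespace NAMESPACE \
      --values custom-values.yaml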
Amazon S3
- Add the following values to your custom values YAML file:

    cloud-signals:
      enabledStorage:
        - s3
      s3:
        outputDir: "OUTPUT_DIRECTORY"
        cloudSecret: SERVICE_ACCOUNT_SECRET
        region: SERVICE_REGION
      compactor:
        intervalS: 7200
        collectionExecutors: 1
        partitionExecutors: 1
        rowLimitPerFile: 1000000
        resources: {}
        s3:
          secret: SERVICE_ACCOUNT_SECRET
          keyIdFieldName: key
          secretKeyFieldName: secret
          region: SERVICE_REGION
          rootPath: COMPACTOR_ROOTPATH_DIRECTORY
          outputDir: COMPACTOR_OUTPUT_DIRECTORY
- Replace the placeholder values with your environment values. See the following table for details:

    Placeholder value | Description
    ---|---
    OUTPUT_DIRECTORY | The location used for uncompacted signals data files. For example, s3a://smartdata-datasets/streaming/compactor/in.
    SERVICE_ACCOUNT_SECRET | Your service account’s authentication secret. For more information on generating a service account secret, see AWS Secrets Manager.
    SERVICE_REGION | Your S3 service region. For more information, see AWS service endpoints.
    COMPACTOR_ROOTPATH_DIRECTORY | The value that you used for OUTPUT_DIRECTORY, with the prefix changed from s3a to s3. For example, s3://smartdata-datasets/streaming/compactor/in.
    COMPACTOR_OUTPUT_DIRECTORY | The location used for compacted signals data files. For example, s3://smartdata-datasets/streaming/compactor/out.
- Configure how often you want the compactor to run and the file size for compacted signals. See the following table for configuration options:

    Option | Example value | Description
    ---|---|---
    intervalS | 7200 | The interval between compactor operations, in seconds. For example, 7200 runs the compactor every two hours.
    rowLimitPerFile | 1000000 | The maximum number of rows of data written to each compacted signals data file.

- Deploy Fusion using the custom values YAML file. See the sketch after this procedure for an example of creating the credentials secret.
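The following is a minimal sketch of creating the S3 credentials secret, assuming a Kubernetes-based deployment. The namespace is a placeholder; the field names key and secret correspond to the keyIdFieldName and secretKeyFieldName values in the configuration above.

    # Create the Kubernetes secret that holds the AWS credentials.
    # The secret name must match the cloudSecret and secret values in the
    # custom values YAML file, and the field names must match
    # keyIdFieldName (key) and secretKeyFieldName (secret).
    kubectl create secret generic SERVICE_ACCOUNT_SECRET \
      --namespace NAMESPACE \
      --from-literal=key=YOUR_AWS_ACCESS_KEY_ID \
      --from-literal=secret=YOUR_AWS_SECRET_ACCESS_KEY

    # Then deploy Fusion with the custom values YAML file, as in the
    # Google Cloud Storage example.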
Result
If you successfully enabled cloud signal storage, your deployment has two new pods:
Pod name | Description
---|---
 | The cloud signals consumer pod. If you are using Google Cloud Storage, the pod name contains
 | The cloud signals compactor pod. If you are using Google Cloud Storage, the pod name contains
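To confirm that both pods were created, list the pods in your Fusion namespace. This is a minimal sketch; the namespace is a placeholder for your environment.

    # List the pods in the Fusion namespace and look for the cloud signals
    # consumer and compactor pods.
    kubectl get pods --namespace NAMESPACE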
Application setup
- Create a new Fusion application and index some data.
- Update the click signals aggregation job:
  - Navigate to Collections > Jobs and select the click signals aggregation job. By default, this job is named APP_NAME_click_signals_aggregation.
  - Change the Source Collection value to the path of your cloud storage location. For example, gs://smartdata-datasets/streaming/compactor/in. You might see a message that states, "This collection does not exist." This is expected with cloud signal collections. Ensure your path is correct and disregard this message.
  - Change the Data Format value to parquet.
  - Click Save to save the changes to the job.
- Send signals to your application. For testing, you can send signals manually.
  - To send signals using the API:

        curl -u USERNAME:PASSWORD -X POST \
          --url 'https://FUSION_HOST.com/api/apps/APP_NAME/signals/APP_NAME?async=false&commit=true' \
          --header 'Content-Type: application/json' \
          --header 'cache-control: no-cache' \
          --data '[
            {
              "type": "SIGNAL_TYPE",
              "params": {
                "docId": "DOCUMENT_ID",
                "count": "NUMBER_OF_SIGNALS",
                "collection": "APP_NAME",
                "query": "*:*",
                "filterQueries": []
              }
            }
          ]'

    Replace placeholder values, such as APP_NAME, with your environment values. The SIGNAL_TYPE value can be any signal type, such as click, response, cart, or a custom signal type.
  - To send signals in the Fusion UI from the Query Workbench:
    - Navigate to Querying > Query Workbench.
    - Click Format Results.
    - Enable the Send click signals option.
    - Click Save.
    - Click a result title in the Query Workbench to send a signal.
- Verify your signals were captured in your cloud storage. See the sketch after this procedure for example commands.
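The following is a minimal sketch of checking for signals data files from the command line, using the example paths from this article. It assumes the Google Cloud SDK (gsutil) or the AWS CLI is installed and authenticated; substitute your own bucket paths.

    # Google Cloud Storage: list uncompacted and compacted signals data files.
    gsutil ls gs://smartdata-datasets/streaming/compactor/in
    gsutil ls gs://smartdata-datasets/streaming/compactor/out

    # Amazon S3: list uncompacted and compacted signals data files.
    aws s3 ls s3://smartdata-datasets/streaming/compactor/in/ --recursive
    aws s3 ls s3://smartdata-datasets/streaming/compactor/out/ --recursive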
Why is my signals collection empty?
The signals collection holds signals stored in a Solr cluster. If you’re using cloud signal storage, signals are stored in the cloud instead, and this collection will be empty.