Aggregations

Signals are most useful when they are aggregated into a set of summaries that can be used to enrich the search experience through recommendations and boosting.

Note
As of Fusion 3.1, the Signals Aggregator API is deprecated in favor of the Jobs API. This changes the API endpoint from /aggregator to /jobs. Aggregation jobs are a subtype of Spark jobs.

When signals are enabled for a "primary" collection, a <primarycollectionname>_signals collection and a <primarycollectionname>_signals_aggr collection are created automatically.
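The naming convention above can be sketched in a few lines of Python. This is only an illustration of the suffix rule; the helper name signal_collection_names and the example collection name "products" are hypothetical and are not part of Fusion.

```python
from typing import Dict


def signal_collection_names(primary: str) -> Dict[str, str]:
    """Derive the names of the collections created alongside a primary collection.

    Hypothetical helper: it only applies the "_signals" and "_signals_aggr"
    suffixes described in the text and does not talk to Fusion.
    """
    return {
        "signals": f"{primary}_signals",
        "aggregated_signals": f"{primary}_signals_aggr",
    }


# "products" is an assumed primary collection name used for illustration.
names = signal_collection_names("products")
print(names["signals"])             # products_signals
print(names["aggregated_signals"])  # products_signals_aggr
```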

You can find the _signals collection by navigating to Devops > Home > Collections and expanding your original collection to display its system collections.

Aggregation Pipelines

Aggregated events are indexed using a default pipeline named "aggr_rollup". This pipeline contains a single stage, a Solr Indexer stage, which indexes the aggregated events.

You can create your own custom index pipeline to process aggregated events differently if you choose.

Aggregation Functions

The section Aggregator Functions documents the available set of aggregation functions.

Custom aggregation functions can be defined via a JavaScript stage. The options described in Aggregator Scripting provide more detail on the objects available for scripts.

Aggregation properties

The aggregation process is specified by an aggregation type, which consists of the following properties:

Name               Description

id                 Aggregation ID
groupingFields     List of signal field names
signalTypes        List of signal types
aggregator         Symbolic name of the aggregator implementation
selectQuery        Query string, default *:*
sort               Ordering of aggregated signals
timeRange          String specifying time range, e.g., [* TO NOW]
outputPipeline     Pipeline ID for processing aggregated events
outputCollection   Output collection name
rollupPipeline     Rollup pipeline ID
rollupAggregator   Name of the aggregator implementation used for rollups
sourceRemove       Boolean, default is false
sourceCatchup      Boolean, default is true
outputRollup       Boolean, default is true
aggregates         List of aggregation functions
params             Arbitrary parameters to be used by specific aggregator implementations
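To make the property list concrete, here is a minimal Python sketch that models an aggregation definition and serializes it to JSON. The property names and the defaults for selectQuery, sourceRemove, sourceCatchup, and outputRollup come from the list above. Everything else is an assumption for illustration: the class name AggregationDefinition, the example ID, the field name doc_id_s, the signal type and aggregator value "click", and the collection name. The exact JSON shape that the Jobs API expects is not covered here and may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class AggregationDefinition:
    """Hypothetical container mirroring the documented aggregation properties.

    Attribute names are camelCase on purpose so they match the property names.
    """
    id: str
    groupingFields: List[str]
    signalTypes: List[str]
    aggregator: str
    selectQuery: str = "*:*"           # documented default
    sort: Optional[str] = None
    timeRange: Optional[str] = None
    outputPipeline: Optional[str] = None
    outputCollection: Optional[str] = None
    rollupPipeline: Optional[str] = None
    rollupAggregator: Optional[str] = None
    sourceRemove: bool = False         # documented default
    sourceCatchup: bool = True         # documented default
    outputRollup: bool = True          # documented default
    aggregates: List[Dict[str, Any]] = field(default_factory=list)
    params: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        # Omit properties that were never set so only explicit values
        # and documented defaults are emitted.
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(body, indent=2)


# All values below are illustrative assumptions, not values required by Fusion.
definition = AggregationDefinition(
    id="products_click_aggr",
    groupingFields=["doc_id_s"],
    signalTypes=["click"],
    aggregator="click",
    timeRange="[* TO NOW]",
    outputPipeline="aggr_rollup",
    outputCollection="products_signals_aggr",
)
print(definition.to_json())
```

Properties left unset, such as sort, are dropped from the output, while the documented defaults are always written out so the resulting definition is explicit about its behavior.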

Aggregation job configuration

For the sessionization process to work most efficiently, groupingFields should contain only user_id_s and, optionally, the sort parameter should be set to timestamp_tdt asc. Sorting by timestamp requires more work on the Solr side, however, so the sort parameter may be omitted, with the possible side effect that additional partial documents are created.
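The trade-off described above can be sketched as a small Python helper that builds only the two relevant properties. The function name sessionization_settings and its sort_by_timestamp flag are hypothetical; the property names and values (user_id_s, timestamp_tdt asc) are taken from the text.

```python
from typing import Any, Dict


def sessionization_settings(sort_by_timestamp: bool = True) -> Dict[str, Any]:
    """Build the groupingFields and sort properties for a sessionization job.

    Hypothetical helper. Passing sort_by_timestamp=False omits the sort,
    which reduces the work done on the Solr side but may result in
    additional partial documents.
    """
    settings: Dict[str, Any] = {"groupingFields": ["user_id_s"]}
    if sort_by_timestamp:
        settings["sort"] = "timestamp_tdt asc"
    return settings


print(sessionization_settings())
print(sessionization_settings(sort_by_timestamp=False))
```

The returned dictionary holds only these two properties; the remaining aggregation properties, such as id and aggregator, still need to be supplied separately.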