Aggregations
Signals are most useful when they are aggregated into a set of summaries that can be used to enrich the search experience through recommendations and boosting.
When signals are enabled for a "primary" collection, a <primarycollectionname>_signals collection and a <primarycollectionname>_signals_aggr collection are created automatically.
Note: You must configure aggregations from the collection that contains the signals data (usually <primarycollectionname>_signals), not from the primary collection (<primarycollectionname>).
You can find the _signals collection by navigating to Applications > Collections and expanding your original collection to display its system collections.
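The naming convention described above can be sketched as a small helper. This is only an illustration of the suffix pattern; the function name and the returned keys are hypothetical and are not part of any product API.

```python
def signals_collection_names(primary: str) -> dict:
    """Derive the system collection names for a primary collection.

    Assumes only the suffix convention stated in the text:
    <primary>_signals and <primary>_signals_aggr.
    """
    return {
        "signals": f"{primary}_signals",          # raw signals data
        "aggregated": f"{primary}_signals_aggr",  # aggregated signals
    }


names = signals_collection_names("products")
print(names["signals"])     # products_signals
print(names["aggregated"])  # products_signals_aggr
```

When configuring an aggregation, the collection to start from is the one under the "signals" key, not the primary collection.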
Aggregation Pipelines
Aggregated events are indexed using a default pipeline named "aggr_rollup". This pipeline contains a single stage, a Solr Indexer stage, which indexes the aggregated events.
To process aggregated events differently, you can create a custom index pipeline.
Aggregation Functions
The section Aggregator Functions documents the available aggregation functions.
You can define custom aggregation functions in a JavaScript stage. Aggregator Scripting describes the objects available to scripts in more detail.
Aggregation properties
The aggregation process is specified by an aggregation type, which consists of the following properties:
Name | Description |
---|---|
| | Aggregation ID |
| | List of signal field names |
| | List of signal types |
| | Symbolic name of the aggregator implementation |
| | Query string, default |
| | Ordering of aggregated signals |
| | String specifying time range, e.g., |
| | Pipeline ID for processing aggregated events |
| | Output collection name |
| | Rollup pipeline ID |
| | Name of the aggregator implementation used for rollups |
| | Boolean, default is false |
| | Boolean, default is true |
| | Boolean, default is true |
| | List of aggregation functions |
| | Arbitrary parameters to be used by specific aggregator implementations |
Aggregation job configuration
For sessionization, set groupingFields to just user_id_s and, optionally, set the "sort" parameter to timestamp_tdt asc so that the sessionization process works most efficiently. However, sorting by timestamp requires more work on the Solr side, so the sort can be omitted, with the possible side effect that additional partial documents are created.
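To make the trade-off concrete, here is a minimal sketch of why timestamp ordering matters. It is a toy single-pass sessionizer, not the product's implementation: the `sessionize` function, the `SESSION_TIMEOUT` value, and the use of plain numbers for timestamp_tdt are all assumptions made for illustration. Only the groupingFields and sort names and their recommended values come from the text.

```python
# Property names and values as recommended in the text.
job_config = {
    "groupingFields": ["user_id_s"],
    "sort": "timestamp_tdt asc",  # optional; may be omitted
}

SESSION_TIMEOUT = 30  # hypothetical inactivity gap, in seconds


def sessionize(events, grouping_field="user_id_s", timeout=SESSION_TIMEOUT):
    """Toy single-pass sessionizer (an assumption, not the real aggregator).

    Keeps one open session per user and emits it as soon as an event
    falls outside the session's time window.
    """
    open_sessions = {}  # user -> [start, end, event_count]
    emitted = []
    for event in events:
        user = event[grouping_field]
        ts = event["timestamp_tdt"]  # simplified to a number of seconds
        current = open_sessions.get(user)
        if current is not None and current[0] - timeout <= ts <= current[1] + timeout:
            # Event belongs to the open session: extend it.
            current[0] = min(current[0], ts)
            current[1] = max(current[1], ts)
            current[2] += 1
        else:
            # Event is too far away: emit the open session, start a new one.
            if current is not None:
                emitted.append((user, *current))
            open_sessions[user] = [ts, ts, 1]
    # Emit whatever is still open at the end of the stream.
    for user, current in open_sessions.items():
        emitted.append((user, *current))
    return emitted


unsorted_events = [
    {"user_id_s": "u1", "timestamp_tdt": t} for t in (0, 100, 10, 110)
]
sorted_events = sorted(unsorted_events, key=lambda e: e["timestamp_tdt"])

print(len(sessionize(sorted_events)))    # 2 complete sessions
print(len(sessionize(unsorted_events)))  # 4 partial session documents
```

With sorted input, each of the two real sessions is emitted once. With unsorted input, the same four events produce four single-event documents, which is the kind of partial output the text warns about when the sort is omitted.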