Creating Aggregation Jobs

Aggregations are created automatically whenever you enable signals or recommendations. This topic explains how to create or modify aggregations individually. You can do this using the Fusion UI or the Jobs API.

Note
As of Fusion 3.1, the Aggregator API is deprecated in favor of the Jobs API.

Creating an aggregation job using the Fusion UI

An aggregation is a type of job. Aggregation jobs can be created or modified at Search > Jobs in the Fusion UI.

  1. Navigate to Search > Jobs.

  2. Click Add.

  3. Select Aggregation.

    The New Job Configuration panel appears.

  4. Enter an arbitrary Spark job ID.

  5. Enter the name of the signals collection to be aggregated.

    Note
    Be sure to specify the signals collection (usually <primarycollectionname>_signals), not the primary (<primarycollectionname>) collection.

  6. Under Aggregation Settings, click include.

  7. Configure the aggregation parameters as needed.

    See Aggregation configuration parameters below for descriptions.

  8. Click Save.

    The new aggregation job appears in the jobs list. Now you can run it or schedule it.
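The naming convention from the note in step 5 can be captured in a small helper. This is only a sketch in Python: the function name is hypothetical, and the _signals suffix is the usual convention rather than a guarantee, so confirm the actual collection name in your deployment.

```python
def signals_collection_for(primary_collection: str) -> str:
    """Return the conventional signals collection name for a primary collection.

    Assumption: the deployment follows the usual <primarycollectionname>_signals
    convention. This helper is illustrative and not part of Fusion.
    """
    if not primary_collection:
        raise ValueError("primary_collection must be a non-empty string")
    return f"{primary_collection}_signals"


# "products" is a made-up primary collection name.
print(signals_collection_for("products"))
```

Entering the returned name in the job configuration, rather than the primary collection name, avoids aggregating over the wrong collection.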

Aggregation configuration parameters

groupingFields

An array of strings specifying the fields to group on.

signalTypes

The signal types to aggregate. If not set, signals of any type are selected.

selectQuery

The query that selects the signals to aggregate. If not set, *:* (or an equivalent match-all query) is used.

sort

The criteria to sort on within a group. If not set, results are sorted by ID in ascending order.

timeRange

The time range to select signals on.

outputPipeline

The pipeline used to process the output. If not set, the _system pipeline is used.

rollupPipeline

The pipeline used to process the roll-up results. By default, this is the same indexing pipeline that processes the aggregation results.

rollupAggregator

The aggregator to use when rolling up. If not set, the aggregator specified in the aggregator property is also used for roll-up.

outputCollection

The collection to write the aggregates to on output. This property is required if the selected output/rollup pipeline requires it (the default pipeline does). A special value of - disables the output.

aggregator

The aggregator implementation to use. This is either one of the symbolic names (simple, click, em) or the fully-qualified name of a class that extends EventAggregator. If not set, simple is used.
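The rule for interpreting an aggregator value can be sketched as follows. This Python helper is hypothetical and only mirrors the description above; treating any dotted string as a class name is a simplifying assumption, and the class name in the example is invented.

```python
SYMBOLIC_AGGREGATORS = ("simple", "click", "em")


def resolve_aggregator(value=None):
    """Classify an aggregator setting as a symbolic name or a class name.

    Returns a (kind, name) tuple, where kind is "symbolic" or "class".
    """
    if value is None or value == "":
        # Documented default when the property is not set.
        return ("symbolic", "simple")
    if value in SYMBOLIC_AGGREGATORS:
        return ("symbolic", value)
    if "." in value:
        # Assumption: a fully-qualified class name contains a package separator.
        return ("class", value)
    raise ValueError(f"Unrecognized aggregator: {value!r}")


print(resolve_aggregator())
print(resolve_aggregator("click"))
print(resolve_aggregator("com.example.MyAggregator"))  # hypothetical class
```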

sourceRemove

If true, the processed source signals will be removed after aggregation. Default is false.

sourceCatchup

If true, only the signals received since the last successful run of the job are aggregated. If a record of such a previous run exists, it overrides the start time of the range set in the timeRange property.
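The interaction between sourceCatchup and timeRange can be illustrated with a short Python sketch. The function and argument names are hypothetical, and the sketch models only the precedence rule described above, not how Fusion stores or looks up the record of the previous run.

```python
from datetime import datetime, timezone
from typing import Optional


def effective_start(
    time_range_start: Optional[datetime],
    source_catchup: bool,
    last_successful_run: Optional[datetime],
) -> Optional[datetime]:
    """Pick the start time used to select signals.

    When catch-up is enabled and a previous successful run is on record,
    that run's time replaces the start of the configured time range.
    """
    if source_catchup and last_successful_run is not None:
        return last_successful_run
    return time_range_start


configured = datetime(2017, 1, 1, tzinfo=timezone.utc)
previous_run = datetime(2017, 3, 15, tzinfo=timezone.utc)

print(effective_start(configured, True, previous_run))   # previous run wins
print(effective_start(configured, True, None))           # no record: configured start
print(effective_start(configured, False, previous_run))  # catch-up off: configured start
```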

outputRollup

Rolls up the current results with all previous results for this aggregation ID that are available in outputCollection.

aggregates

List of functions defining how to aggregate events with results. Aggregation functions have these properties:

  • type

    The function type defining how to aggregate events with results.

  • sourceFields

    The fields that the function will read from.

  • targetField

    The field that the function will write to.

  • mapper

    When true, the function is used in the map phase only.

  • parameters

    Other parameters specific to individual functions.
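The properties listed above can be modeled as a small record, which makes the shape of one entry in the aggregates list easier to see. This is a Python sketch under stated assumptions: the class is hypothetical, the function type "sum" and the field names are illustrative only, and the default of false for mapper is assumed rather than documented here.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class AggregateFunction:
    """One entry of the aggregates list, using the property names above."""

    type: str                 # how to aggregate events with results
    sourceFields: List[str]   # fields the function reads from
    targetField: str          # field the function writes to
    mapper: bool = False      # assumed default; True means map phase only
    parameters: Dict[str, Any] = field(default_factory=dict)


# Illustrative values only: "sum", "count_i", and "aggr_count_i" are assumptions.
click_total = AggregateFunction(
    type="sum",
    sourceFields=["count_i"],
    targetField="aggr_count_i",
)

print(asdict(click_total))
```

Converting the record with asdict yields a plain dictionary that can be placed in the aggregates list of a job configuration.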

statsFields

List of numeric fields in results for which to compute overall statistics.

parameters

Other aggregation parameters (such as start / aggregate / finish scripts, cache size, and so on).
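As a recap, the parameters above can be assembled and sanity-checked before they are entered in the UI or submitted through the Jobs API. The Python sketch below is hypothetical: it checks only that each key is one of the parameter names documented in this topic and fills in the two defaults stated above. It does not reproduce the full request body that the Jobs API expects, and the field and collection names are invented.

```python
import json

KNOWN_PARAMETERS = {
    "groupingFields", "signalTypes", "selectQuery", "sort", "timeRange",
    "outputPipeline", "rollupPipeline", "rollupAggregator", "outputCollection",
    "aggregator", "sourceRemove", "sourceCatchup", "outputRollup",
    "aggregates", "statsFields", "parameters",
}

# Defaults stated in the parameter descriptions above.
DOCUMENTED_DEFAULTS = {"aggregator": "simple", "sourceRemove": False}


def build_aggregation_settings(**settings):
    """Return aggregation settings, rejecting parameter names not listed above."""
    unknown = sorted(set(settings) - KNOWN_PARAMETERS)
    if unknown:
        raise ValueError(f"Unknown aggregation parameters: {unknown}")
    return {**DOCUMENTED_DEFAULTS, **settings}


# Field and collection names here are made up for illustration.
settings = build_aggregation_settings(
    groupingFields=["query_s", "doc_id_s"],
    signalTypes=["click"],
    outputCollection="products_signals_aggr",
)

print(json.dumps(settings, indent=2, sort_keys=True))
```

Rejecting unknown keys early catches typos such as groupingField before the job is saved.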