Fusion 5.12

    Aggregation Jobs

    Define an aggregation job.

    Use this job when you want to aggregate your data in some way.

    id - string, required

    The ID for this Spark job. Used in the API to reference this job. Allowed characters: a-z, A-Z, 0-9, dash (-), and underscore (_)

    <= 128 characters

    Match pattern: ^[A-Za-z0-9_\-]+$
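
    As an illustration of the constraints above, the following minimal sketch checks a candidate job ID against the documented pattern and length limit before it is submitted. The function name `is_valid_job_id` is hypothetical and not part of Fusion.

```python
import re

# Pattern and length limit copied from the "id" property description.
JOB_ID_PATTERN = re.compile(r"^[A-Za-z0-9_\-]+$")
MAX_JOB_ID_LENGTH = 128


def is_valid_job_id(job_id: str) -> bool:
    """Return True if job_id satisfies the documented pattern and length limit."""
    if len(job_id) > MAX_JOB_ID_LENGTH:
        return False
    # fullmatch rejects a trailing newline, which "$" alone would tolerate.
    return JOB_ID_PATTERN.fullmatch(job_id) is not None
```

    For example, `is_valid_job_id("click_signals-agg_01")` is accepted, while an ID containing a space or more than 128 characters is rejected.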

    inputCollection - string, required

    Collection containing signals to be aggregated.

    definition - Aggregation Settings

    Defines the type of aggregation to perform, either SQL or legacy. SQL aggregations let you use ANSI SQL 2003, including numerous built-in functions, to define your aggregation and rollup logic. The legacy aggregation option is based on pre-Fusion 4.0 features and will be removed in Fusion 4.1.

    timeRange - string

    The time range over which to select signals, e.g., `[* TO NOW]`. See the Solr documentation on working with dates for more options (https://solr.apache.org/guide/8_8/working-with-dates.html).

    >= 1 characters

    outputCollection - string

    The collection to which the aggregates are written. This property is required if the selected output / rollup pipeline requires it (the default pipeline does). The special value '-' disables the output.

    >= 1 characters

    sourceRemove - boolean

    If true, the processed source signals are removed after aggregation.

    Default: false

    sourceCatchup - boolean

    If checked, only aggregate new signals created since the last successful run of the job. If a record of such a previous run exists, it overrides the start of the time range set in the 'timeRange' property. If unchecked, all matching signals are aggregated and any previously aggregated docs are deleted to avoid double counting.

    Default: true
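
    The interaction between 'sourceCatchup' and 'timeRange' can be sketched as follows. This is an illustration of the documented behavior only, not Fusion's implementation: the function name, the `last_run` argument, and the assumption that the range is written as `[start TO end]` are all hypothetical.

```python
import re
from typing import Optional

# Assumes a Solr-style range of the form "[start TO end]".
RANGE_RE = re.compile(r"^\[(?P<start>.+?) TO (?P<end>.+)\]$")


def effective_time_range(time_range: str,
                         source_catchup: bool,
                         last_run: Optional[str]) -> str:
    """Return the time range a run would use under the catch-up rule.

    last_run is the timestamp of the last successful run, or None if no
    such run is recorded (hypothetical input for this sketch).
    """
    if not source_catchup or last_run is None:
        # Catch-up disabled, or no previous run: use timeRange unchanged.
        return time_range
    match = RANGE_RE.match(time_range)
    if match is None:
        raise ValueError(f"unsupported time range format: {time_range!r}")
    # A recorded previous run overrides only the start of the range.
    return f"[{last_run} TO {match.group('end')}]"
```

    With catch-up enabled and a previous run recorded at `2024-01-01T00:00:00Z`, the range `[* TO NOW]` becomes `[2024-01-01T00:00:00Z TO NOW]`; with catch-up disabled, the range is returned as written.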

    sql - string

    Use SQL to perform the aggregation. You do not need to include a time range filter in the WHERE clause, because it is applied automatically before the SQL statement is executed.

    >= 1 characters

    rollupSql - string

    Use SQL to perform a rollup of previously aggregated docs. If left blank, the aggregation framework will supply a default SQL query to roll up aggregated metrics.

    >= 1 characters

    groupingFields - array[string]

    The fields to group on.

    typeFieldName - string

    Name of the signal type field. Defaults to 'type'.

    signalTypes - array[string]

    The signal types. If not set, any signal type is selected.

    selectQuery - string

    The query to select the desired input documents.

    >= 1 characters

    Default: *:*

    sort - string

    The criteria to sort on within a group. If not set then sort order is by id, ascending.

    >= 1 characters

    outputPipeline - string

    The pipeline used to process the output. If not set, the '_system' pipeline is used.

    >= 1 characters

    Default: _system

    rollupPipeline - string

    Pipeline to use for processing the results of the roll-up. By default, this is the same indexing pipeline used for processing the aggregation results.

    >= 1 characters

    rollupAggregator - string

    The aggregator to use when rolling up. If not set, the aggregator used for the aggregation is also used for the roll-up.

    >= 1 characters

    aggregator - string

    Aggregator implementation to use. This is either one of the symbolic names (simple, click, em) or a fully-qualified class name of a class extending EventAggregator. If not set then 'simple' is used.

    >= 1 characters

    aggregates - array[object]

    List of functions defining how to aggregate events with results. Not supported for SQL aggregations.

    object attributes:

        type - string, required (display name: Type)
        sourceFields - array (display name: Source fields)
        targetField - string (display name: Target field)
        mapper - boolean (display name: Use in map phase)
        parameters - array (display name: Parameters)
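
    The shape of one 'aggregates' entry can be modeled as in the minimal sketch below, which uses the attribute names listed above and enforces that 'type' is present. The class name `AggregateFunction` and the example values (`"sum"`, `count_i`, `total_i`) are hypothetical; the source does not list the allowed values for 'type'.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AggregateFunction:
    """One entry of the 'aggregates' array (legacy aggregations only)."""
    type: str                                        # required
    sourceFields: Optional[List[str]] = None         # optional
    targetField: Optional[str] = None                # optional
    mapper: Optional[bool] = None                    # "Use in map phase"
    parameters: Optional[List[Dict[str, Any]]] = None

    def __post_init__(self) -> None:
        if not self.type:
            raise ValueError("'type' is required for an aggregate function")

    def to_dict(self) -> Dict[str, Any]:
        # Omit attributes that were not set so that only explicit values
        # appear in the serialized entry.
        return {k: v for k, v in asdict(self).items() if v is not None}
```

    Calling `AggregateFunction(type="sum", sourceFields=["count_i"], targetField="total_i").to_dict()` yields a dictionary containing only those three attributes, which can then be placed in the 'aggregates' array.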

    statsFields - array[string]

    List of numeric fields in results for which to compute overall statistics. Not supported for SQL aggregations.

    parameters - array[object]

    Other aggregation parameters (e.g. start / aggregate / finish scripts, cache size, etc).

    object attributes:

        key - string, required (display name: Parameter Name)
        value - string (display name: Parameter Value)
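
    Because 'parameters' is an array of key/value objects rather than a plain map, it can be convenient to build it from a dictionary. The helper below is a minimal sketch; its name and the example parameter names are hypothetical, and values are converted to strings because the 'value' attribute is documented as a string.

```python
from typing import Any, Dict, List, Mapping


def to_key_value_list(params: Mapping[str, Any]) -> List[Dict[str, str]]:
    """Convert a mapping into the documented [{"key": ..., "value": ...}] form."""
    result = []
    for key, value in params.items():
        if not key:
            # 'key' is marked as required in the object attributes.
            raise ValueError("parameter key must be a non-empty string")
        result.append({"key": str(key), "value": str(value)})
    return result
```

    For instance, `to_key_value_list({"cacheSize": 1000})` returns `[{"key": "cacheSize", "value": "1000"}]`. The 'readOptions' property uses the same key/value shape, so the same helper applies there.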

    rows - integer

    Number of rows to read from the source collection per request.

    Default: 10000

    readOptions - array[object]

    Additional configuration settings to fine-tune how input records are read for this aggregation.

    object attributes:

        key - string, required (display name: Parameter Name)
        value - string (display name: Parameter Value)

    aggregationTime - string

    Timestamp to use for the aggregation results. Defaults to NOW.

    referenceTime - string

    Timestamp to use for computing decays and to determine the value of NOW.

    skipCheckEnabled - boolean

    If the catch-up flag is enabled and this field is checked, the job framework will execute a fast Solr query to determine if this run can be skipped.

    Default: true

    type - string, required

    Default: aggregation

    Allowed values: aggregation
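
    The required properties in this reference ('id', 'inputCollection', and 'type') can be checked before a job definition is submitted. The following is a minimal sketch, assuming the job is represented as a Python dictionary; the function name and the example collection and job names are hypothetical, and the check covers only the required-property and allowed-value rules stated here.

```python
from typing import Any, Dict, List

REQUIRED_STRING_PROPERTIES = ("id", "inputCollection", "type")
ALLOWED_TYPE = "aggregation"


def validate_job(job: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    for name in REQUIRED_STRING_PROPERTIES:
        value = job.get(name)
        if not isinstance(value, str) or not value:
            errors.append(f"missing required string property: {name}")
    job_type = job.get("type")
    if isinstance(job_type, str) and job_type and job_type != ALLOWED_TYPE:
        errors.append(f"type must be '{ALLOWED_TYPE}', got '{job_type}'")
    return errors


# Hypothetical minimal job containing only the required properties.
minimal_job = {
    "id": "daily_click_agg",
    "inputCollection": "products_signals",
    "type": "aggregation",
}
```

    Here `validate_job(minimal_job)` returns an empty list, while a job that omits 'inputCollection' or sets 'type' to another value produces one message per problem.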