true
, which means that Fusion only computes aggregations for new signals that have arrived since the last time the job was run, and up to and including ref_time
, which is usually the run time of the current job. Fusion must “roll up” the newly aggregated rows into any existing aggregated rows in the _aggr
collection.
If you are using SQL to do aggregation but have not supplied a custom rollup SQL, Fusion generates a basic rollup SQL script automatically by consulting the schema of the aggregated documents. If your rollup logic is complex, you can provide a custom rollup SQL script.
Fusion’s basic rollup is a SUM of numeric types (long, integer, double, and float) and can support time_decay
for the weight field. If you are not using time_decay
in your weight calculations, then the weight is calculated using weight_d
. If you do include time_decay
with your weight calculations, then the weight is calculated as a combination of timestamp, halfLife
, ref_time
, and weight_d
.
The basic rollup SQL can be grouped by DOC_ID
, QUERY
, USER_ID
, and FILTERS
. The last GROUP BY field in the main SQL is used so it will ultimately group any newly aggregated rows with existing rows.
This is an example of a rollup query:
true
, Fusion uses the last time the job was run and ref_time
(which you typically set to the current time). This is equivalent to the WHERE clause WHERE time > last_run_time AND time <= ref_time
.true
, Fusion uses a filter with ref_time
(and no start time). This is equivalent to the WHERE clause WHERE time <= ref_time
.[* TO NOW]
- all past events[2000-11-01 TO 2020-12-01]
– specify by exact date[* TO 2020-12-01]
– from the first data point to the end of the day specified[2000 TO 2020]
- from the start of a year to the end of another yeartime_decay
function to compute and apply appropriate weights to aggregation groups during aggregation. Larger weights are assigned to more recent events. This reduces the impact of less-recent signals. Intuitively, older signals (and the user behavior they represent) should count less than newer signals.
If the default time_decay
function does not meet your needs, you can modify it. The time_decay
function is implemented as a UserDefinedAggregateFunction (UDAF).
Full function signature
This is the UDAF signature of the default time_decay
function:
decay_sum
aggregator function used the difference between the aggregationTime
(the time at which the aggregation job is run) and the event time to calculate exponentially decayed numerical values.time_decay
is a similar function. In time_decay
, ref_time
is used instead of aggregationTime
. You can set aggregationTime
to some other time than the run time of the aggregation job.aggregationTime
as ref_time
.
Abbreviated function signature and default values
Your function call can also use this abbreviated UDAF signature, that omits halfLife
, ref_time
, and weight_d
:
halfLife
= 30 days
, ref_time
= NOW
, and weight_d
= 0.1
.
Matching legacy aggregation
To match the results of legacy aggregation, either use the abbreviated function signature or supply these values for the mentioned parameters: halfLife
= 30 days
, ref_time
= NOW
, and weight_d
= 0.1
.
Parameters
Parameters for time_decay
are:
Parameter | Description |
---|---|
count | Number of occurrences of the event. Typically, the increment is 1, though there is no reason it could not be some other number. In most cases, you simply pass count_i , which is the event count field used by Fusion signals, as shown in the SQL aggregation examples. |
timestamp | The date-and-time for the event. This time is the beginning of the interval used to calculate the time-based decay factor. |
halfLife | Half life for the exponential decay that Fusion calculates. It is some interval of time, for example, 30 days or 10 minutes . The interval prefix is optional. Fusion treats 30 days as equivalent to interval 30 days . |
ref_time | Reference time used to compute the age of an event for the time-decay computation. It is usually the time when the aggregation job runs (NOW ). The reference time is not present in the data; Fusion determines the reference time at runtime. Fusion automatically attaches a ref_time column to every row before executing the SQL. |
weight_d | Initial weight for an event, prior to the decay calculation. This value is typically not present in the signal data. You can use SQL to compute weight_d ; see Use Different Weights Based on Signal Types for an example. |
Tuesday, July 11, 2017 1:00:00 AM (1499734800)
.
For a signal with the timestamp Tuesday, July 11, 2017 12:00:00 AM (1499731200)
, the age of the signal in relation to the reference time is 1 hour.