Create and Run a SQL Aggregation Job
experiment_signals
, you would select experiment_signals
, not experiment
.true
, which means that Managed Fusion only computes aggregations for new signals that have arrived since the last time the job was run, and up to and including ref_time
, which is usually the run time of the current job. Managed Fusion must “roll up” the newly aggregated rows into any existing aggregated rows in the COLLECTION_NAME_aggr
collection.
If you are using SQL to do aggregation but have not supplied a custom rollup SQL, Managed Fusion generates a basic rollup SQL script automatically by consulting the schema of the aggregated documents. If your rollup logic is complex, you can provide a custom rollup SQL script.
Managed Fusion’s basic rollup is a SUM of numeric types (long, integer, double, and float) and can support time_decay
for the weight field. If you are not using time_decay
in your weight calculations, then the weight is calculated using weight_d
. If you do include time_decay
with your weight calculations, then the weight is calculated as a combination of timestamp, halfLife
, ref_time
, and weight_d
.
The basic rollup SQL can be grouped by DOC_ID
, QUERY
, USER_ID
, and FILTERS
. The last GROUP BY field in the main SQL is used so it will ultimately group any newly aggregated rows with existing rows.
This is an example of a rollup query:
true
, Managed Fusion uses the last time the job was run and ref_time
(which you typically set to the current time). This is equivalent to the WHERE clause WHERE time > last_run_time AND time <= ref_time
.true
, Managed Fusion uses a filter with ref_time
(and no start time). This is equivalent to the WHERE clause WHERE time <= ref_time
.[* TO NOW]
- all past events[2000-11-01 TO 2020-12-01]
– specify by exact date[* TO 2020-12-01]
– from the first data point to the end of the day specified[2000 TO 2020]
- from the start of a year to the end of another yeartime_decay
function to compute and apply appropriate weights to aggregation groups during aggregation. Larger weights are assigned to more recent events. This reduces the impact of less-recent signals. Intuitively, older signals (and the user behavior they represent) should count less than newer signals.
If the default time_decay
function does not meet your needs, you can modify it. The time_decay
function is implemented as a User Defined Aggregate Function (UDAF).
Full function signature
This is the UDAF signature of the default time_decay
function:
halfLife
, ref_time
, and weight_d
:
halfLife
= 30 days
, ref_time
= NOW
, and weight_d
= 0.1
.
weight_d
specified should ensure that the “default” value matches the weight_d
parameter used by time_decay
.halfLife
= 30 days
, ref_time
= NOW
, and weight_d
= 0.1
.
Parameters
Parameters for time_decay
are:
Parameter | Description |
---|---|
count | Number of occurrences of the event. Typically, the increment is 1, though there is no reason it could not be some other number. In most cases, you pass count_i , which is the event count field used by Managed Fusion signals, as shown in the SQL aggregation examples. |
timestamp | The date-and-time for the event. This time is the beginning of the interval used to calculate the time-based decay factor. |
halfLife | Half life for the exponential decay that Managed Fusion calculates. It is some interval of time, for example, 30 days or 10 minutes . The interval prefix is optional. Managed Fusion treats 30 days as equivalent to interval 30 days . |
ref_time | Reference time used to compute the age of an event for the time-decay computation. It is usually the time when the aggregation job runs (NOW ). The reference time is not present in the data; Managed Fusion determines the reference time at runtime. Managed Fusion automatically attaches a ref_time column to every row before executing the SQL. |
weight_d | Initial weight for an event if not specified in the signal, prior to the decay calculation. This value is typically not present in the signal data. If some signals do contain weight_d values, this parameter should be set to match the “neutral” value there. You can use SQL to compute weight_d ; see Use Different Weights Based on Signal Types for an example. |
Tuesday, July 11, 2017 1:00:00 AM (1499734800)
.
For a signal with the timestamp Tuesday, July 11, 2017 12:00:00 AM (1499731200)
, the age of the signal in relation to the reference time is 1 hour.