Signals Aggregator API

The Signals Aggregator API is used to aggregate signal events, which allows faster querying for recommendations. To use recommendations, signals need to be recorded and then aggregated.

When signals are enabled for a collection, two system-level collections are created. The first is named <collectionName>_signals, where <collectionName> is the name of the sibling collection; signal events are indexed to this collection. The second is named <collectionName>_signals_aggr and is the default location for aggregated signal events. See Signals API for more information on how to index signal events.
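The naming convention described above can be sketched as a small helper. This is only an illustration; the function name signals_collections is hypothetical and not part of the API.

```python
from typing import Tuple


def signals_collections(collection: str) -> Tuple[str, str]:
    """Return the (raw signals, aggregated signals) collection names
    that accompany the given sibling collection."""
    raw = f"{collection}_signals"
    return raw, f"{raw}_aggr"


raw_name, aggr_name = signals_collections("demo")
print(raw_name)   # demo_signals
print(aggr_name)  # demo_signals_aggr
```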

The aggregation process creates tuples for the fields selected when creating the aggregator job. A default tuple is applied if none is specified.

The aggregation process can remove the raw signals if desired, or keep them for other aggregation jobs.

Create, Update, Delete or List an Aggregator Job

The path for this request is:

/api/apollo/aggregator/aggregations/<id>

where <id> is the ID of an aggregation job.

A GET request returns the properties of the aggregator job with the given ID. If the path doesn’t include a job ID, the request lists all defined aggregation jobs.

POST requests with parameters define jobs, and PUT requests replace the properties of an existing job. Note that a PUT request replaces the entire job definition with the properties submitted in the request: any property omitted from the request is not preserved and may be reset (possibly to 'null').

DELETE will remove the aggregator job.
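As a rough sketch of how the four methods map onto this path, the following Python builds (but does not send) one request of each kind. The host and port are taken from the examples later in this document, the helper name aggregation_request is hypothetical, and authentication is left out for brevity.

```python
import json
from typing import Optional
from urllib.request import Request

# Assumption: host and port as used in the examples in this document.
BASE_URL = "http://localhost:8764/api/apollo/aggregator/aggregations"


def aggregation_request(method: str, job_id: Optional[str] = None,
                        definition: Optional[dict] = None) -> Request:
    """Build, without sending, a request for the aggregations endpoint."""
    url = BASE_URL if job_id is None else f"{BASE_URL}/{job_id}"
    body = None if definition is None else json.dumps(definition).encode("utf-8")
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(url, data=body, headers=headers, method=method)


definition = {"id": "1", "signalTypes": ["click"]}

list_all = aggregation_request("GET")                  # all defined jobs
get_one = aggregation_request("GET", "1")              # properties of job 1
create = aggregation_request("POST", definition=definition)
replace = aggregation_request("PUT", "1", {"signalTypes": ["click"]})
delete_one = aggregation_request("DELETE", "1")

for req in (list_all, get_one, create, replace, delete_one):
    print(req.get_method(), req.full_url)
```

Each request could then be sent with urllib.request.urlopen once credentials have been added.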

Aggregator Job Definition Properties

Parameter Description

time

A timestamp for signals to aggregate, used when starting the aggregation job.

rows

Defines the size of event batches to retrieve.

sync

If set to true, the aggregation job runs in the foreground and blocks any other aggregation jobs from running until it completes. The default is false, in which case you must poll the job to check its status.

The response to either a POST or a GET request includes the properties of the job and the current status. When a job is started, the output includes the output_collection, the dates that will be used, and the collection being used for the data.

Start or Check an Aggregator Job

The path for this request is:

/api/apollo/aggregator/jobs/<collectionName>_signals/<id>

where <id> is the ID of an aggregation job and <collectionName>_signals is the signals collection that contains the events to be aggregated.

A POST request will start the specified job, while a GET request will check the job status.
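A start-then-poll loop might look like the sketch below. It assumes that the "state" field in the response stays at "running" while the job is in progress, as in the sample responses later in this document. The names job_url, wait_for_job, and fetch_status are hypothetical; fetch_status stands in for whatever function performs the GET request and returns the parsed JSON.

```python
import time
from typing import Callable


def job_url(base: str, collection: str, job_id: str) -> str:
    """Build the job path for the signals collection of `collection`."""
    return f"{base}/api/apollo/aggregator/jobs/{collection}_signals/{job_id}"


def wait_for_job(fetch_status: Callable[[], dict],
                 interval: float = 5.0, max_polls: int = 60) -> dict:
    """Call fetch_status until the job no longer reports "running"."""
    for _ in range(max_polls):
        status = fetch_status()
        # Assumption: any state other than "running" means the job has stopped.
        if status.get("state") != "running":
            return status
        time.sleep(interval)
    raise TimeoutError("aggregation job still running after polling limit")


# Simulated sequence of status responses, used here instead of real GET calls.
fake_responses = iter([{"state": "running"},
                       {"state": "running"},
                       {"state": "finished"}])
final = wait_for_job(lambda: next(fake_responses), interval=0)
print(job_url("http://localhost:8764", "demo", "1"))
print(final["state"])  # finished
```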

Get Aggregator Job History

The path for this request is one of:

/api/apollo/history/aggregator/items

/api/apollo/history/aggregator/items/<id>

where <id> is the ID of an aggregation job.

If the ID is specified, a GET request will return the history for the defined aggregator job. If the ID is omitted, a GET request will return a list of all aggregation job IDs.

Input

None.

Output Content

The output will include information about the job including when it started and ended, how many signals were processed, and the details of the job properties.

Signals Aggregator Definition Properties

Parameter Description

id
Optional

A unique identifier for this aggregator job.

groupingFields
Optional

The fields that define unique tuples. The fields list is defined as a JSON array, with commas between each field name.

If a set of fields is not defined, then a default tuple 'doc_id_s','query_s','filters_s' will be used.

signalTypes
Optional

The types of signals to aggregate. The type list is defined as a JSON array, with commas between each type.

The types must be existing types used for events in your signals collection.

aggregator
Optional

The name of the aggregator implementation.

If it is not defined, this will default to click, which is an implementation optimized for aggregating signals based on user clicks. Aggregated records from this implementation will include a 'weight_d' field which can be used in boosting clicked documents.

If you are not aggregating user click events, you can choose simple. This implementation does not add a 'weight_d' field to each record.

A third option is special, which is described in more detail on the Aggregator Scripting page.

selectQuery
Optional

A query that selects the signal events to aggregate.

timeRange
Optional

A valid range query to select events to aggregate.

sort
Optional

Specifies ordering of raw signal events within an aggregation.

The default ordering is by event ID ("id asc"). It can be set to use other fields with standard Solr sort expressions, e.g. "timestamp_dt asc", or multiple criteria separated by commas, e.g. "type_s asc,timestamp_dt desc".

Note: sorting by "id asc" is always appended as the last sort criterion in order to break ties.

outputPipeline
Optional

The name of a pipeline to use for processing aggregated events.

outputCollection
Optional

The collection in which to store the aggregated events.

rollupPipeline
Optional

The pipeline to use for rollups.

rollupAggregator
Optional

The name of the aggregator implementation to use for rollups.

sourceRemove
Optional

If true, then signal events that have been aggregated will be removed from the index.

The default is false.

sourceCatchup
Optional

If true, the original time range of the aggregation will be modified to span only the period since the last successful aggregation.

The default is false.

outputRollup
Optional

If true (the default), an additional aggregation step runs after the source data aggregation to roll up the new aggregates with the old aggregates that already exist in the output collection for the same aggregation type.

aggregates
Optional

A list of aggregation functions. Since it’s possible to pass side-effects from one function to a later function in the list, the functions should be declared in the desired order of execution.

The available aggregator functions are described in more detail in the section Aggregator Functions.

params
Optional

The params property defines additional aggregation job parameters.

The most common use of this property is to define JavaScript scripts to customize the aggregator behavior. See the section Aggregator Scripting for more details.

Note that for large aggregation definitions, you can create a JSON file with the desired properties and upload it with cURL’s -d option, using the form -d @<filename>.
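One way to prepare such a file is sketched below. The property values are illustrative only and the file name aggregation-1.json is hypothetical; the property names are those listed in the table above.

```python
import json
import os
import tempfile

# Illustrative job definition using properties described above.
definition = {
    "id": "1",
    "groupingFields": ["doc_id_s", "query_s", "filters_s"],
    "signalTypes": ["click"],
    "aggregator": "click",
    "sort": "timestamp_dt asc",
    "sourceRemove": False,
    "aggregates": [
        {"type": "count", "sourceFields": ["id"], "targetField": "count_d"}
    ],
}

# Hypothetical file location; any writable path would do.
path = os.path.join(tempfile.gettempdir(), "aggregation-1.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(definition, f, indent=2)

# The file could then be posted with, for example:
#   curl -u user:pass -X POST -H 'Content-Type: application/json' \
#        -d @<path> http://localhost:8764/api/apollo/aggregator/aggregations
print(path)
```

Generating the file from a data structure rather than writing it by hand avoids quoting mistakes in long definitions, since json.dump always emits valid JSON.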

No output is returned when creating or updating an aggregator job.

When a job is listed, the properties returned are the same as the possible properties when defining a job.

Examples

Create an aggregator job for the click type of signals, with an aggregate function that provides counts by the id field:

REQUEST

curl -u user:pass -X POST -H 'Content-Type: application/json' -d '{"id":"1", "signalTypes":["click"], "aggregates":[{"type":"count", "sourceFields":["id"], "targetField": "count_d"}]}' http://localhost:8764/api/apollo/aggregator/aggregations

RESPONSE

None.

Update the properties for aggregator job '1', including all the original properties plus the ones we want to add or change:

REQUEST

curl -u user:pass -X PUT -H 'Content-Type: application/json' -d '{"signalTypes":["click"], "timeRange":"[NOW/-1 TO NOW]", "aggregates":[{"type":"count", "sourceFields":["id"], "targetField": "count_d"}]}' http://localhost:8764/api/apollo/aggregator/aggregations/1

RESPONSE

None.
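Because a PUT request replaces the whole definition, one possible approach is to fetch the current properties, merge in the changes, and submit the result. The sketch below shows only the merge step; the function name merged_definition is hypothetical, and dropping "id" from the body simply mirrors the PUT example above, where the ID appears only in the path.

```python
import copy


def merged_definition(existing: dict, changes: dict) -> dict:
    """Return a full definition for a PUT request: the existing
    properties overlaid with the changed ones (top-level keys only)."""
    merged = copy.deepcopy(existing)
    merged.update(copy.deepcopy(changes))
    # Assumption: the ID is carried in the request path, as in the example above.
    merged.pop("id", None)
    return merged


existing = {
    "id": "1",
    "signalTypes": ["click"],
    "aggregates": [
        {"type": "count", "sourceFields": ["id"], "targetField": "count_d"}
    ],
}
put_body = merged_definition(existing, {"timeRange": "[NOW/-1 TO NOW]"})
print(sorted(put_body))  # ['aggregates', 'signalTypes', 'timeRange']
```

The merge is shallow: a changed top-level property such as aggregates replaces the old value entirely rather than being combined with it.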

List the properties for aggregator job '1':

REQUEST

curl -u user:pass http://localhost:8764/api/apollo/aggregator/aggregations/1

RESPONSE

{
  "id" : "1",
  "groupingFields" : [ ],
  "signalTypes" : [ "click" ],
  "timeRange" : "[NOW/-1 TO NOW]",
  "sourceRemove" : false,
  "sourceCatchup" : false,
  "outputRollup" : false,
  "aggregates" : [ {
    "type" : "count",
    "sourceFields" : [ "id" ],
    "targetField" : "count_d",
    "params" : { }
  } ],
  "params" : { }
}

Start job '1' on the 'demo_signals' collection:

REQUEST

curl -u user:pass -X POST http://localhost:8764/api/apollo/aggregator/jobs/demo_signals/1

RESPONSE

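The following output has been truncated to omit most of the aggregation job definition; it shows the other job properties that are returned on start. Note that this sample was captured for a job with the ID 'r1' on a 'bestbuy_signals' collection, so its identifiers differ from those in the request above.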

{
  "signals" : {
    "types" : [ "click" ],
    "stats" : { }
  },
  "state" : "running",
  "job_id" : "4d69ec73358b41d38caf1eb3b378809e",
  "aggregation_time_date" : "2014-09-11T16:39:58.347Z",
  "aggregation" : {
    "id" : "r1",
    "groupingFields" : [ "doc_id_s", "query_s", "filters_s" ],
    "signalTypes" : [ "click" ],
    "selectQuery" : "*:*",
...
  "output_collection" : "bestbuy_signals_aggr",
  "NOW" : 1410453598347,
  "NOW_date" : "2014-09-11T16:39:58.347Z",
  "collection" : "bestbuy_signals",
  "aggregation_time" : 1410453598347,
  "compound_id" : "bestbuy_signals:r1"
}

List the aggregator job history items:

REQUEST

curl -u user:pass http://localhost:8764/api/apollo/history/aggregator/items

RESPONSE

[ "demo_signals:1" ]

Get the history of job "demo_signals:1":

REQUEST

curl -u user:pass http://localhost:8764/api/apollo/history/aggregator/items/demo_signals:1

RESPONSE

{
  "events" : [ {
    "start" : "2014-04-16T20:45:16.582Z",
    "end" : "2014-04-16T20:45:16.781Z",
    "source" : "demo_signals:1",
    "type" : "run",
    "status" : "ok",
    "details" : {
      "signals" : {
        "click" : {
          "state" : "finished",
          "raw" : 2,
          "aggr_type_s" : "click",
          "aggr_class" : "com.lucidworks.apollo.service.aggregation.ClickSignalAggregator",
          "aggregated" : 2
        }
      },
      "state" : "finished",
      "job_id" : "467bc0db-a9c9-4b48-8080-439958818907",
      "aggregation_time_date" : "2014-04-16T20:45:16.556Z",
      "aggregation" : {
        "id" : "1",
        "fields" : [ "doc_id_s", "query_s", "filters_s" ],
        "types" : [ "click" ],
        "select" : "*:*",
        "range" : "[* TO NOW]",
        "remove" : false,
        "rolling" : false,
        "params" : { },
        "anyAggr" : false
      },
      "NOW" : 1397681116556,
      "commit" : "done",
      "NOW_date" : "2014-04-16T20:45:16.556Z",
      "collection" : "demo_signals",
      "aggregation_time" : 1397681116556,
      "compound_id" : "demo_signals:1"
    },
    "error" : null
  } ]
}
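A history response like the one above could be condensed into one row per run and signal type, for example to check how many raw signals were processed. This is a minimal sketch that assumes the response shape shown above; the function names parse_ts and summarize_history are hypothetical, and the sample data is a trimmed copy of that response.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse timestamps of the form 2014-04-16T20:45:16.582Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


def summarize_history(history: dict) -> list:
    """Return one summary row per event and signal type."""
    rows = []
    for event in history.get("events", []):
        seconds = (parse_ts(event["end"]) - parse_ts(event["start"])).total_seconds()
        signals = event.get("details", {}).get("signals", {})
        for signal_type, stats in signals.items():
            rows.append({
                "source": event["source"],
                "status": event["status"],
                "type": signal_type,
                "raw": stats.get("raw"),
                "aggregated": stats.get("aggregated"),
                "seconds": seconds,
            })
    return rows


# Trimmed copy of the sample response above.
sample = {
    "events": [{
        "start": "2014-04-16T20:45:16.582Z",
        "end": "2014-04-16T20:45:16.781Z",
        "source": "demo_signals:1",
        "type": "run",
        "status": "ok",
        "details": {
            "signals": {"click": {"state": "finished", "raw": 2, "aggregated": 2}},
            "state": "finished",
        },
        "error": None,
    }]
}

for row in summarize_history(sample):
    print(row)
```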