Signals

A signal is a recorded event related to one or more documents in a collection. Signals can record any kind of event that is interesting to your organization. Queries and clicks are the most common types of signals, as they are useful for tracking what users search for and what actions they take.

Signals are indexed in a secondary collection which is linked to the primary collection by the naming convention <primarycollectionname>_signals. So, if your main collection is named products, the associated signals collection is named products_signals. The signals collection is created automatically when signals are enabled for the primary collection.

Signals are indexed just like ordinary documents. The signals collection can be searched like any other collection, for example to retrieve a user’s search history or last viewed items.

Signals are most useful when they are aggregated into a set of summaries that can be used to enrich the search experience through recommendations and boosting. Like the signals collection, a <primarycollectionname>_signals_aggr collection is created automatically when signals are enabled for a primary collection. An aggregation job is also created automatically, and scheduled to run every two minutes.
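
Because both derived collections follow a fixed naming convention, client code can compute their names instead of hard-coding them. The following is a minimal Python sketch; the helper name derived_collections is hypothetical and not part of Fusion.

```python
def derived_collections(primary: str) -> dict:
    """Return the signals-related collection names for a primary collection."""
    if not primary:
        raise ValueError("primary collection name must not be empty")
    return {
        "signals": f"{primary}_signals",          # raw signal documents
        "aggregated": f"{primary}_signals_aggr",  # aggregated summaries
    }


print(derived_collections("products"))
# {'signals': 'products_signals', 'aggregated': 'products_signals_aggr'}
```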

Enabling and disabling signals

When you create a collection in the Fusion UI, signals are enabled and a signals collection is created by default.

Using the API, the /collections/{collection}/features/{feature} endpoint enables or disables signals for any collection:

Check whether signals are enabled for a collection:

curl -u user:pass http://localhost:8765/api/v1/collections/<collection-name>/features/signals

Enable signals for a collection:

curl -u user:pass -X PUT -H "Content-type: application/json" -d '{"enabled" : true}' http://localhost:8765/api/v1/collections/<collection-name>/features/signals

Disable signals for a collection:

curl -u user:pass -X PUT -H "Content-type: application/json" -d '{"enabled" : false}' http://localhost:8765/api/v1/collections/<collection-name>/features/signals

Signal document structure

A raw signal is stored as a Solr document. The properties of a raw signal, and the Solr fields derived from them, are as follows:

Field Description

id
Optional

The signal ID. If no ID is supplied, one will be automatically generated.

type
Required

The type of signal being sent. Aggregation jobs use this value to select events of a given type; a single job can combine several types if needed.

The type can be any string you choose. For consistency, always send events of the same kind with the same type value.

During indexing, the type value is moved to a field named type_s.

params
Optional

The params object lets you define the properties you care about and will use later for signal aggregation:

  • docId - A unique document ID

    This is stored in the Solr raw signal document as field doc_id_s.

  • query - A query string; for example, a user’s search

    This is copied to the Solr raw signal document as both fields query_s and query_t. Some cleanup occurs to convert the string to lowercase, decode URL encoding, and replace white space with single space characters. The original query is saved in field query_orig_s.

  • filterQueries - A list of strings, such as filters on the search query

    This is copied to the Solr raw signal document as both filters_s and filters_orig_ss.

  • collection - The primary collection name

  • weight - A float value representing the relative weight of this signal

    This is saved in the field weight_d.

  • count - A positive integer value representing the incremented count of signals

    This is saved in the field count_i.

timestamp

The timestamp of the signal event.

  • When using the Signals API, this property is optional; it defaults to the current server time.

  • When using the Signal Formatter index stage, one of the following fields must be present: timestamp, timestamp_tdt, timestamp_dt, or epoch.
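
The cleanup applied to the query parameter, described above, can be approximated in a few lines of Python. This is only a sketch of the documented behavior, not Fusion's implementation: the function name query_fields is hypothetical, and the order of the three steps is an assumption.

```python
from urllib.parse import unquote


def query_fields(raw_query: str) -> dict:
    """Approximate the documented query cleanup (step order is assumed)."""
    decoded = unquote(raw_query)         # decode %XX escapes; "+" is left as-is
    lowered = decoded.lower()            # convert to lowercase
    cleaned = " ".join(lowered.split())  # collapse whitespace runs; also trims the ends
    return {
        "query_orig_s": raw_query,       # original query, untouched
        "query_s": cleaned,
        "query_t": cleaned,
    }


fields = query_fields("Televisiones%20Panasonic  50 pulgadas")
print(fields["query_s"])  # televisiones panasonic 50 pulgadas
```

If your clients encode spaces as "+", urllib.parse.unquote_plus may be a closer match than unquote.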

Here is the JSON representation of one click signal, taken from an example dataset of synthetic clickstream data:

{ "params": {
      "docId": "2125233",
      "filterQueries": ["cat00000", "abcat0100000", "abcat0101000", "abcat0101001"],
      "query": "Televisiones Panasonic  50 pulgadas" },
  "type": "click",
  "timestamp": "2011-09-01T23:44:52.533000Z"
}
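
A signal like this can be assembled programmatically before it is sent. The sketch below is one way to do that in Python, following the rules in the table above: type is required, while id, params, and timestamp are optional. The helper name make_signal is hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def make_signal(signal_type: str, params: Optional[dict] = None,
                timestamp: Optional[datetime] = None,
                signal_id: Optional[str] = None) -> dict:
    """Build a raw signal dict; only "type" is required."""
    if not signal_type:
        raise ValueError("signal type is required")
    signal = {"type": signal_type}
    if signal_id is not None:
        signal["id"] = signal_id
    if params:
        signal["params"] = dict(params)
    if timestamp is not None:
        # Convert to UTC and format with microseconds, as in the example above.
        # Note: a naive datetime is interpreted as local time by astimezone().
        utc = timestamp.astimezone(timezone.utc)
        signal["timestamp"] = utc.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return signal


click = make_signal(
    "click",
    params={"docId": "2125233", "query": "Televisiones Panasonic  50 pulgadas"},
    timestamp=datetime(2011, 9, 1, 23, 44, 52, 533000, tzinfo=timezone.utc),
)
print(json.dumps(click, indent=2))
```

Leaving out the timestamp is a reasonable default when using the Signals API, since the server then fills in its current time.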

The default signals indexing pipeline

When indexing signals, a default indexing pipeline named _signals_ingest will be used unless you specify a different index pipeline.

The _signals_ingest pipeline has three stages.

If you prefer different options in the signals indexing pipeline, pass the name of your custom index pipeline as a query parameter when indexing signals.

A custom pipeline must include a Field Mapping stage and a Solr Indexer stage, which sends the documents to Solr (see Index Pipeline Stages for more details). Additionally, the Solr Indexer stage must have the enforce_schema property set to "true".
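
To illustrate how a custom pipeline name might be passed along with a batch of signals, here is a hedged Python sketch that builds (but does not send) the request. Two details are assumptions rather than facts from this page: the endpoint path /signals/<collection> and the query parameter name pipeline. Check both against the Signals API reference for your Fusion version. The function name build_signal_request and the pipeline name my_signals_pipeline are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

FUSION_API = "http://localhost:8765/api/v1"  # base URL used in the curl examples


def build_signal_request(collection: str, signals: list,
                         pipeline: Optional[str] = None) -> Request:
    """Build (but do not send) a POST request that indexes signals."""
    # ASSUMPTION: the Signals API path is /signals/<collection>.
    url = f"{FUSION_API}/signals/{quote(collection, safe='')}"
    if pipeline:
        # ASSUMPTION: the query parameter is named "pipeline".
        url += "?" + urlencode({"pipeline": pipeline})
    body = json.dumps(signals).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


req = build_signal_request("products", [{"type": "click"}],
                           pipeline="my_signals_pipeline")
print(req.get_method(), req.full_url)
```

Authentication is omitted here; a real call would add credentials and pass the request to urllib.request.urlopen.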

Removing signals

The aggregator includes an option to delete signals after they have been processed. If you have chosen not to remove signals during aggregation, you can instead run a delete query in Solr to remove documents from the signals collection.
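
As a sketch of the Solr approach, the Python below builds (but does not send) a delete-by-query request using Solr's JSON update syntax. Several details are assumptions: the Solr base URL (Solr's default port is used here, and your deployment may differ), the use of timestamp_tdt as the field holding the event time, and the 30-day cutoff. The function name build_delete_request is hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request

SOLR_URL = "http://localhost:8983/solr"  # ASSUMPTION: adjust for your deployment


def build_delete_request(signals_collection: str, solr_query: str) -> Request:
    """Build (but do not send) a Solr delete-by-query request."""
    url = f"{SOLR_URL}/{quote(signals_collection, safe='')}/update?commit=true"
    body = json.dumps({"delete": {"query": solr_query}}).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


# ASSUMPTION: signals store their event time in timestamp_tdt.
req = build_delete_request("products_signals", "timestamp_tdt:[* TO NOW-30DAYS]")
print(req.full_url)
print(req.data.decode("utf-8"))
```

Deletes are permanent once committed, so it is worth running the same query as a search first to confirm which documents it matches.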
