Signals

A signal is a recorded event related to one or more documents in a collection. Signals can record any kind of event that is useful to your organization. Queries and clicks are the most common types of signals, as they are useful for tracking what users search for and what actions they take.

Signals are indexed in a secondary collection which is linked to the primary collection by the naming convention <primarycollectionname>_signals. So, if your main collection is named products, the associated signals collection is named products_signals. The signals collection is created automatically when signals are enabled for the primary collection. Signals are enabled by default whenever a new collection is created.

Signals are indexed just like ordinary documents. The signals collection can be searched like any other collection, for example by using the Query Workbench with the signals collection selected.

App Insights provides visualizations and reports with which to analyze your signals. App Insights mainly uses raw signals, but also uses some aggregated signals.

Note
The signals schema changed in Fusion 4.0. See the descriptions of signals types and structure below.

Enabling and disabling signals

Enabling signals automatically creates the necessary _signals and _signals_aggr collections, plus several Parameterized SQL Aggregation jobs (if you have a Fusion AI license) for signal processing and aggregation:

  • Click Signals Aggregation job

  • Session Rollup job

  • User Items Preferences Aggregation job

  • User Query History Aggregation job

When signals are enabled, you can view these jobs at Collections > Jobs. When you disable signals, these jobs are deleted, but the _signals and _signals_aggr collections are not; your legacy signal data remains intact.
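The naming convention for the associated collections can be expressed as a small helper. This is only an illustrative sketch: the function name `signal_collection_names` is hypothetical, and it assumes the `_signals_aggr` collection takes the primary collection name as its prefix in the same way the `_signals` collection does.

```python
def signal_collection_names(primary: str) -> dict:
    """Return the collection names associated with a primary collection.

    Assumption: both suffixes are appended to the primary collection name.
    """
    return {
        "signals": f"{primary}_signals",
        "aggregated": f"{primary}_signals_aggr",
    }


names = signal_collection_names("products")
print(names["signals"])     # products_signals
print(names["aggregated"])  # products_signals_aggr
```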

Using the UI

When you create a collection using the Fusion UI, signals are enabled and a signals collection is created by default. You can also enable and disable signals for existing collections using the Collections Manager.

Enable signals for a collection
  1. In the Fusion workspace, navigate to Collections > Collections Manager.

  2. Hover over the primary collection for which you want to enable signals.

  3. Click Configure to open the drop-down menu.

  4. Click Enable Signals.

    The Enable Signals window appears, with a list of collections and jobs that are created when you enable signals.

  5. Click Enable Signals.

Disable signals for a collection
  1. In the Fusion workspace, navigate to Collections > Collections Manager.

  2. Hover over the primary collection for which you want to disable signals.

  3. Click Configure to open the drop-down menu.

  4. Click Disable Signals.

    The Disable Signals window appears, with a list of jobs that are deleted when you disable signals.

  5. Click Disable Signals.

    Your _signals and _signals_aggr collections remain intact so that you can access your legacy signals data.

Using the Collection Features API

Using the API, the /collections/{collection}/features/{feature} endpoint enables or disables signals for any collection:

Check whether signals are enabled for a collection
curl -u user:pass http://localhost:8764/api/collections/<collection-name>/features/signals
Enable signals for a collection
curl -u user:pass -X PUT -H "Content-type: application/json" -d '{"enabled" : true}' http://localhost:8764/api/collections/<collection-name>/features/signals
Disable signals for a collection
curl -u user:pass -X PUT -H "Content-type: application/json" -d '{"enabled" : false}' http://localhost:8764/api/collections/<collection-name>/features/signals
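
The same calls can be made from a script. The following is a minimal sketch using only the Python standard library; it builds the PUT request shown in the curl examples without sending it. The function name `signals_feature_request` is hypothetical, and the host, port, and credentials are the placeholder values from the curl commands.

```python
import base64
import json
import urllib.request

# Host and port are taken from the curl examples; adjust for your deployment.
FUSION_API = "http://localhost:8764/api"


def signals_feature_request(collection, enabled, user, password):
    """Build (but do not send) a PUT request that toggles the signals feature."""
    url = f"{FUSION_API}/collections/{collection}/features/signals"
    body = json.dumps({"enabled": enabled}).encode("utf-8")
    # HTTP basic auth, equivalent to curl's "-u user:pass".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


req = signals_feature_request("products", True, "user", "pass")
print(req.get_method(), req.full_url)
# To actually send the request: urllib.request.urlopen(req)
```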

Signals data flow

This diagram shows the flow of signals data from the search app through Fusion AI. The numbered steps are explained below.

Signals data flow

  1. The search app sends a query to a Fusion query pipeline.

    The query request should include user ID and session query parameters to identify the user and the session.

  2. Optionally, the Fusion query pipeline queries the _signals_aggr collection to get boosts for the main query based on aggregated click data.

  3. The search app also sends a request signal to the Fusion /signals endpoint.

    The primary intent of a request signal is to capture the raw user query and contextual information about the user’s current activity in the app, such as the user agent and the page where they generated the query.

  4. Once Solr returns the response to Fusion, the SearchLogger component indexes the complete request/response data into the _signals collection as a response signal using the _signals_ingest pipeline.

    Note
    This is a departure from pre-4.0 versions of Fusion where query impressions were logged in a separate _logs collection. Query activity is no longer indexed into the _logs collection. All response signals use the fusion_query_id (see below) as the unique document ID in Solr.
  5. When the user clicks a link in the search results, the search app sends a click event to the Fusion signals endpoint (which invokes the _signals_ingest pipeline behind the scenes).

    The click signal must include a field named fusion_query_id in the params object of the raw click signal. The fusion_query_id field is returned in the query response (from step 1) in a response header named x-fusion-query-id. This allows Fusion to associate a click signal with the response signal generated in step 4. The fusion_query_id is also used by Fusion to associate click signals with experiments.

  6. The _signals_ingest pipeline enriches signals before indexing into the _signals collection.

    This enrichment includes field mapping, geolocation resolution, and, through the Update Related Document index stage, setting the has_clicks flag to "true" on a request signal when the first click signal for that request is encountered.

  7. Fusion’s App Insights queries the _signals collection through a Fusion query pipeline to generate query analytics reports from raw signals.

    Note that the App Insights app uses Fusion security for authentication.

  8. Behind the scenes, the SQL aggregation framework aggregates click signals to compute a weight for each query + doc_id + filters group.

    The resulting metrics are saved to the _signals_aggr collection to generate boosts on queries to the main collection (step 2 above).

  9. Recommendations also use aggregated documents in the _signals_aggr collection to build a collaborative filtering-based recommender model.
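
On the search app side, the link between steps 1 and 5 is the x-fusion-query-id response header. The sketch below shows one way a client might copy that value into the params of a click signal. The function name `click_params_from_response` and its arguments are hypothetical, and the headers are assumed to be available as a plain mapping of header names to values.

```python
def click_params_from_response(response_headers, doc_id, user_id, session, query):
    """Copy the query ID from a query response's headers into click params."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    headers = {name.lower(): value for name, value in response_headers.items()}
    fusion_query_id = headers.get("x-fusion-query-id")
    if fusion_query_id is None:
        raise ValueError("query response has no x-fusion-query-id header")
    return {
        "fusion_query_id": fusion_query_id,
        "user_id": user_id,
        "session": session,
        "query": query,
        "doc_id": doc_id,
    }


params = click_params_from_response(
    {"X-Fusion-Query-Id": "ABkaEA11"},
    doc_id="9502308",
    user_id="admin",
    session="b3a15101-9e30-4e28-8a23-d1f663c2ee06",
    query="tiger woods",
)
print(params["fusion_query_id"])
```

Raising an error when the header is missing is a choice made for this sketch; a click signal without fusion_query_id cannot be associated with its response signal.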

Signals types and structure

There are three main types of signals: request signals, response signals, and click signals. Each type is described below.

The signal type parameter can also take arbitrary values for custom signal types. For example, you can create special signals for purchase events, cart addition/subtraction events, "favorite" or "like" events, customer service events, and so on. Custom signals can be analyzed in App Insights just like pre-defined signal types.

Request signals

A request signal is generated by a front-end search app and captures the raw user query and other contextual information about a user and their journey through the search app. A request signal should have the following fields:

[
  {
    "id":"288fe4f7-6680-403e-8d18-27647cdd9989",
    "timestamp":1518717749409,
    "type":"request",
    "params":{
      "user_id":"admin",
      "session":"ef4e00cd-91bb-45b4-be80-e81f9f9c5b27",
      "query":"USER QUERY HERE",
      "app_id":"SEARCH APP ID",
      "ip_address":"0:0:0:0:0:0:0:1",
      "host":"Lucids-MacBook-Pro-5.local",
      "filter":[
        "field1/value",
        ...
      ],
      "filter_field":[
        "field1"
      ]
    }
  }
]
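
A search app could assemble a signal of this shape with a small helper. This is a sketch only: the function name `build_request_signal` is hypothetical, the ID is assumed to be a random UUID and the timestamp epoch milliseconds (as the example values suggest), and deriving filter_field from the part of each filter before the "/" is an assumption based on the example.

```python
import json
import time
import uuid


def build_request_signal(user_id, session, query, app_id, ip_address, host, filters=()):
    """Assemble a request signal with the fields shown in the example."""
    # Assumption: each filter looks like "field/value"; keep unique field names in order.
    filter_fields = []
    for item in filters:
        field = item.split("/", 1)[0]
        if field not in filter_fields:
            filter_fields.append(field)
    return {
        "id": str(uuid.uuid4()),
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
        "type": "request",
        "params": {
            "user_id": user_id,
            "session": session,
            "query": query,
            "app_id": app_id,
            "ip_address": ip_address,
            "host": host,
            "filter": list(filters),
            "filter_field": filter_fields,
        },
    }


signal = build_request_signal(
    "admin", str(uuid.uuid4()), "ipad", "SEARCH APP ID",
    "0:0:0:0:0:0:0:1", "localhost", filters=["format/VHS", "type/Movie"],
)
# Signals are sent as a JSON array, as in the example.
print(json.dumps([signal], indent=2))
```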

Additional optional fields are used by App Insights. In the raw signal, optional fields should be inside the params object. Optional fields are as follows:

"page_title":"Fusion Search",
"path":"/search",
"browser_type":"Browser",
"browser_version":"64.0.3282.140",
"browser_name":"Chrome",
"referrer":"http://localhost:8080/",
"ctx_prev_uri":"/",
"ctx_prev_query":"",
"ctx_prev_path":"/",
"os_manufacturer":"Apple Inc.",
"os_name":"Mac OS X",
"os_id":"778",
"os_device":"Computer",
"os_group":"Mac OS X"

Response signals

Response signals are automatically generated by a query pipeline when the signals feature is enabled for a collection.

Note
Front-end search applications should not send response signals to Fusion directly, as those would conflict with the auto-generated signals.

A response signal has the following explicit fields, plus any additional query parameters sent by the search application for a query:

| Field Name | Description | Example |
|---|---|---|
| id | The x-fusion-query-id generated by the query pipeline; used for associating click signals with queries in experiments and aggregation jobs | TwWCn3Dz |
| type | Signal type | response |
| response_type | Used by Insights to determine whether this query had results or was empty | results or empty |
| session | User session ID; the search app should pass the session ID in the query params for a query | UUID |
| query | The actual query string sent to Solr from Fusion | ipad |
| query_orig_s | The incoming query from the search app before it is enriched by the query pipeline | ipad |
| query_id | A hash generated from the session, query, and filters fields; used as a rollup key in Insights to group activity by a specific query | SHA1 hash |
| filters_s | Filter queries sent to Solr; the Fusion SearchLogger component combines multiple fq parameters into a single value delimited by " $ " | {!tag=format}format:(vhs) $ {!tag=type}type:(movie) |
| filter | Reformatted filter queries for use by App Insights | field1/value |
| user_id | User ID; the search app should pass the user_id in the query params | admin |
| doc_ids_s | A comma-delimited list of document IDs returned for the page of results; this field is used by Fusion Spark jobs, such as the ground truth job, to perform click/skip analysis | 123,456,789 |
| pipeline_id | Fusion query pipeline that processed this query | _system |
| collection | Fusion collection | my_collection |
| qtime | Query time from Solr, in milliseconds | 10 |
| rows | Number of rows requested for this query | 10 |
| hits | Total number of documents matching the query | 10000 |
| totaltime | Total processing time of this query, in milliseconds; includes Solr qtime and Fusion query processing time | 15 |
| timestamp_tdt | Timestamp when the query request was received by Fusion | 2018-02-15T18:17:42.560Z |
| res_offset | Offset of results; this field is used by experiment metrics to calculate MRR | 0 |
| params.* | Any other query param sent from the search app to Fusion that was not already mapped to a declared field | params.defType_s=edismax |

Fusion’s experiment framework relies heavily on response signals and on the linking between response and click signals using the fusion_query_id.
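
To illustrate why a result offset matters for a metric such as MRR (mean reciprocal rank), the sketch below computes the textbook form of MRR from click positions. It is not a description of how Fusion's experiment metrics are implemented. It assumes that the absolute rank of a clicked result is res_offset plus a 1-based res_pos (the click signal example in this document uses res_offset 0 and res_pos 1), and that each pair represents the first click for one query. The function name is hypothetical.

```python
def mean_reciprocal_rank(first_clicks):
    """Textbook MRR over (res_offset, res_pos) pairs, one pair per query.

    Assumption: res_pos is 1-based within the page, so the absolute rank
    of the clicked result is res_offset + res_pos.
    """
    ranks = [offset + pos for offset, pos in first_clicks]
    if not ranks:
        return 0.0
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Three queries: clicks at absolute ranks 1, 2, and 12 (second page of 10).
print(mean_reciprocal_rank([(0, 1), (0, 2), (10, 2)]))
```

Queries that received no click are left out of this sketch; a fuller treatment would typically count them with a reciprocal rank of zero.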

Click signals

Click signals are sent from the search app to Fusion. All click signals should include a fusion_query_id field pulled from the query response header x-fusion-query-id. In addition, click signals should include the following fields:

[
  {
    "id":"SOME UUID HERE",
    "timestamp":1518725351750,
    "type":"click",
    "params":{
      "fusion_query_id":"ABkaEA11",
      "user_id":"admin",
      "session":"b3a15101-9e30-4e28-8a23-d1f663c2ee06",
      "query":"tiger woods",
      "ctype":"result",
      "res_offset":0,
      "filter":[
        "type/Game"
      ],
      "ip_address":"0:0:0:0:0:0:0:1",
      "host":"Lucids-MacBook-Pro-5.local",
      "doc_id":"9502308",
      "app_id":"SEARCH APP ID",
      "res_pos":1,
      "filter_field":[
        "type"
      ]
    }
  }
]
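
Before sending a click signal, a search app may want to check that none of these fields is missing. The following sketch compares a signal against the field names in the example; the function name `missing_click_fields` is hypothetical, and treating every field in the example as expected is an assumption of the sketch rather than a rule stated by Fusion.

```python
# Field names copied from the click signal example.
EXPECTED_TOP_LEVEL = ("id", "timestamp", "type")
EXPECTED_PARAMS = (
    "fusion_query_id", "user_id", "session", "query", "ctype", "res_offset",
    "filter", "ip_address", "host", "doc_id", "app_id", "res_pos", "filter_field",
)


def missing_click_fields(signal):
    """Return the names of expected fields that are absent from a click signal."""
    missing = [name for name in EXPECTED_TOP_LEVEL if name not in signal]
    params = signal.get("params", {})
    missing += [f"params.{name}" for name in EXPECTED_PARAMS if name not in params]
    return missing


incomplete = {
    "id": "SOME UUID HERE",
    "timestamp": 1518725351750,
    "type": "click",
    "params": {"user_id": "admin", "doc_id": "9502308"},
}
print(missing_click_fields(incomplete))
```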

Additional optional fields are used by App Insights. In the raw signal, optional fields should be inside the params object. Optional fields are as follows:

"browser_type":"Browser",
"browser_version":"64.0.3282.140",
"browser_name":"Chrome",
"referrer":"http://localhost:8080/",
"ctx_prev_uri":"/",
"ctx_prev_query":"",
"ctx_prev_path":"/",
"os_manufacturer":"Apple Inc.",
"os_name":"Mac OS X",
"os_id":"778",
"os_device":"Computer",
"os_group":"Mac OS X",
"url":"http://localhost:8080/#/product/9502308",
"label":"Tiger Woods PGA Tour 09 All-Play - Nintendo Wii"

The query_id field

For each incoming signal, Fusion calculates a value for the query_id field, which App Insights uses to create group-by-query reports like the one shown below:

Facet filters applied report

Note
The query_id field should not be confused with the fusion_query_id, which is a unique ID for each query processed by a Fusion query pipeline.

To calculate the value, Fusion creates a hash based on session, query, and filter fields, then saves it into the query_id field.
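
The idea can be sketched as follows. This is not Fusion's actual algorithm: this document does not specify how the fields are combined, so the separator, the sorting of filters, and the function name `sketch_query_id` are all assumptions. SHA-1 is used because the response signal field description gives "SHA1 hash" as the example value.

```python
import hashlib


def sketch_query_id(session, query, filters):
    """Hash session + query + filters into a stable grouping key (illustrative)."""
    # Assumption: sort the filters so that their order does not change the hash.
    parts = [session, query] + sorted(filters)
    joined = "|".join(parts)  # the separator is an arbitrary choice for this sketch
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


a = sketch_query_id("session-1", "ipad", ["format/VHS", "type/Movie"])
b = sketch_query_id("session-1", "ipad", ["type/Movie", "format/VHS"])
c = sketch_query_id("session-2", "ipad", ["format/VHS", "type/Movie"])
print(a == b)  # same session, query, and filters give the same key
print(a == c)  # a different session gives a different key
```

Because the key depends only on the session, query, and filters, repeating the same query with the same filters in one session produces the same query_id, which is what makes it usable as a group-by key.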

The filter field can either be passed in by the search app, or computed by the SignalFormatterStage (the first stage in the _signals_ingest pipeline) using the raw filter queries. For instance, on a response signal that is generated by a query pipeline, the following fq query params get translated into the multi-valued filter field:

  • Raw query parameters:

    fq={!tag=format}format:(VHS)&fq={!tag=type}type:(Movie)
  • filters_s field (created by the SearchLogger component):

    {!tag=format}format:(vhs) $ {!tag=type}type:(movie)
  • filter field:

    "filter":["format/VHS", "type/Movie"]

App Insights uses the filter field to generate various reports.
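
The translation in the example above can be approximated with a short parser. This sketch is not the SignalFormatterStage or SearchLogger implementation; it only reproduces the example. The function name, the regular expression, and the lowercasing of filters_s (which mirrors the example value but is not otherwise stated) are assumptions.

```python
import re
from urllib.parse import parse_qs

# Optional local params such as {!tag=format}, then field:(value) or field:value.
FQ_PATTERN = re.compile(r"^(?:\{![^}]*\})?(?P<field>[^:]+):\(?(?P<value>[^)]*)\)?$")


def filters_from_query_string(query_string):
    """Return (filters_s, filter) derived from the fq params of a query string."""
    fqs = parse_qs(query_string).get("fq", [])
    # Assumption: filters_s is the lowercased fq values joined by " $ ".
    filters_s = " $ ".join(fq.lower() for fq in fqs)
    filter_values = []
    for fq in fqs:
        match = FQ_PATTERN.match(fq)
        if match:
            filter_values.append(f"{match.group('field')}/{match.group('value')}")
    return filters_s, filter_values


filters_s, filter_field_values = filters_from_query_string(
    "fq={!tag=format}format:(VHS)&fq={!tag=type}type:(Movie)"
)
print(filters_s)
print(filter_field_values)
```

Filter queries that do not fit the simple field:(value) shape, such as range or boolean queries, are skipped by this sketch.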

The default signals index pipeline

When indexing signals, Fusion uses a default index pipeline named _signals_ingest unless you specify a different index pipeline.

The _signals_ingest index pipeline has five stages:

  1. Format Signals stage

  2. Field Mapping stage

  3. GeoIP Lookup stage

  4. Solr Indexer stage

  5. Update has_clicks flag stage

    The Update has_clicks flag stage is an instance of the Update Related Document stage that updates the has_clicks flag to "true" on an existing request signal after the first click signal is processed for the request.

    Update Related Documents stage configuration

    The update stage works as follows:

    1. The stage acts only when a click signal is encountered (type == click).

    2. The stage looks at the incoming click signal for a field named request_id_s, which is set by the Format Signals stage using a distributed cache of recently processed request signals.

      If the request_id_s field is set, the stage sends a real-time GET (RTG) query to Solr to find the request signal whose ID equals the value of the request_id_s field on the click signal. The RTG query also filters on has_clicks==false, which avoids duplicate atomic updates on the same document in Solr. Real-time GET is used to avoid timing issues between when a request signal is sent to Solr and when it is committed. This prevents missed updates when clicks occur soon after the search app sends the initial request signal.

    3. If the click signal does not have the request_id_s field set, then do a normal Solr lookup for the request signal using: +query_id:"${query_id}" +type:request +has_clicks:false. A click signal may not have a request_id_s if there is a cache miss in the distributed cache used by the Format Signals stage.

    4. If the stage performs a normal query, there may be multiple request signals that have the same query_id. This is because the query_id is based on session + query + filter, so if a user sends the same query + filter during the same session, there will be multiple request signals with the same query_id value. Thus, the stage sorts to get the latest request signal to update.

    5. If a related document is found (in this case a request signal), then the stage updates the has_clicks field to true and performs an atomic update in Solr.

    This stage performs its work in a background thread, so it does not impact the indexing performance of the click signal.
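
    The decision logic in the steps above can be summarized in a few lines. This is a sketch of the lookup strategy only, not the stage's implementation: the function name, the shape of the returned dictionary, and the sort field (timestamp_tdt) are assumptions, while the query string follows the one given in step 3.

```python
def related_request_lookup(signal):
    """Describe how to find the request signal related to an incoming signal."""
    if signal.get("type") != "click":
        return None  # the stage only acts on click signals

    request_id = signal.get("request_id_s")
    if request_id:
        # Cache hit: real-time GET by ID, skipping requests already flagged.
        return {"method": "real-time-get", "id": request_id, "filter": "has_clicks:false"}

    # Cache miss: fall back to a normal query on query_id and take the latest match.
    query = f'+query_id:"{signal["query_id"]}" +type:request +has_clicks:false'
    return {
        "method": "query",
        "q": query,
        "sort": "timestamp_tdt desc",  # assumed timestamp field for "latest"
        "rows": 1,
    }


print(related_request_lookup({"type": "click", "request_id_s": "288fe4f7"}))
print(related_request_lookup({"type": "click", "query_id": "abc123"}))
```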