Recommendations

In Fusion, recommendations are search results derived from signals or aggregated signals.

Recommendation methods

There are two ways to get recommendations from Fusion: the Recommendations API or recommender stages for query pipelines.

Generally, the Recommendations API is useful for spot-checking recommendations for particular users or items, or when you do not need hybrid recommenders. Otherwise, recommender query stages are usually preferable.

Below is a more detailed comparison of the two methods:

Customization

  • Recommendations API: Limited customization.

  • Recommender query stages: Highly configurable.

"Cold start"

  • Recommendations API: When no pertinent signals exist, no recommendations are returned.

  • Recommender query stages: When no pertinent signals exist, the query’s normal search results are returned.

Recommendation types

The Recommendations API provides the following kinds of recommendations:

  • A list of items recommended for the given query, type "itemsForQuery".

  • A list of queries associated with the given item, using aggregated clicks.

  • A list of items associated with the given item, using aggregated clicks.

These query stages provide different kinds of recommendations:

Using the Recommendations API

To use the Recommendations REST API to get recommendations for items in a collection, that collection must have associated signals and aggregated signals system collections. The quality of these recommendations depends on how well the information in those collections, which is derived from observed user behavior, predicts user behavior going forward.

See the Recommendations API for details.

Using recommender query stages

By using a stage in the query pipeline, you can boost the search results for any arbitrary query using:

  • Aggregated clicks for that query. These recommendations are of the form: other users who searched for that same query clicked on X.

  • A simple collaborative filtering pipeline, outlined below. These recommendations are of the form: other users with a history similar to yours clicked on, bought, or otherwise expressed interest in X.

Hybrid recommenders

Hybrid recommenders provide personalized recommendations when available, and general fallbacks otherwise. Using recommender stages in a query pipeline makes it easy to create hybrid recommenders by simply adding additional recommender stages.

For example, suppose you want a recommender system that offers user-personalized recommendations but defaults to "most popular" in the absence of specific user data (the first time a new user visits the site, for example), and you don’t have a field representing "popularity" indexed on your collection. You can create a query pipeline with two recommenders. The first is a user-personalized recommender using the Eventminer algorithm or the simpler CF algorithm. The second queries the results of an aggregation on "click" or "purchase" events and turns them into boosts (assuming that you are sending such events into a signals collection in Fusion). Alternatively, if you have that popularity data available elsewhere, you can bring it into Solr using a Fusion connector so that you can query it in the query pipeline. You can set the relative weights between the user-personalized and popularity-based boosts so that user-personalized recommendations always outweigh popularity when present, or you can blend them if desired.

The downside of this approach is that you are making multiple calls to Solr (3 or 4 calls per search, in this example). But there are several mitigating factors:

  • Simple Solr queries are surprisingly fast.

  • The query to get "most popular" products is likely to be a cache hit for most systems.

  • Fusion query pipelines are stateless, and Solr scales linearly in SolrCloud mode to thousands of nodes, so that adding more nodes is a simple matter if necessary.
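The weighting between personalized and popularity-based boosts can be illustrated with a small sketch. This is not Fusion code: the function names, the weights, and the sample boost maps are hypothetical, and the sketch assumes both boost sets have already been scaled to the same range (1 to 10 here).

```python
def blend_boosts(personalized, popular, personal_weight=100.0, popular_weight=1.0):
    """Merge two {doc_id: boost} maps into one weighted boost map.

    A large personal_weight relative to popular_weight lets personalized
    boosts dominate; comparable weights blend the two signals.
    """
    combined = {}
    for doc_id, boost in popular.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + popular_weight * boost
    for doc_id, boost in personalized.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + personal_weight * boost
    return combined


def ranked(boosts):
    """Return doc IDs ordered by descending boost (ties broken by ID)."""
    ordered = sorted(boosts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in ordered]


# Hypothetical boosts, both already scaled to the range 1..10.
popular = {"tv-1": 10.0, "phone-7": 6.0}
personalized = {"lens-3": 2.0}

returning_user = ranked(blend_boosts(personalized, popular))
new_user = ranked(blend_boosts({}, popular))  # cold start: popularity only
```

With these weights the smallest personalized boost (1 × 100) still exceeds the largest popularity boost (10 × 1), so personalized items always rank first when they exist. A new user with no history simply gets the popularity ordering.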

Click-based recommendations

In order to generate click-based recommendations, you’ll need to have click events in the system. Typically, these are sent in via the Signals service (or a signal-specific query pipeline). These events should include both the query that the user entered, and the item that they clicked on. The sample search UI bundled with Fusion sends in click events in this way. You’ll also need an aggregation defined that aggregates these signals by query, and a schedule so that those aggregations run regularly.
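To make the shape of the data concrete, here is a minimal sketch of aggregating click events by query. In Fusion this work is done by an aggregation job, not by client code. The event field names ("query", "doc_id") and the query normalization are assumptions for illustration only.

```python
from collections import Counter


def aggregate_clicks_by_query(click_events):
    """Count clicks per (query, doc_id) pair and group the counts by query."""
    counts = Counter()
    for event in click_events:
        # Normalize the query so that "iPad " and "ipad" aggregate together
        # (an assumption; real aggregations may normalize differently).
        query = event["query"].strip().lower()
        counts[(query, event["doc_id"])] += 1

    by_query = {}
    for (query, doc_id), count in counts.items():
        by_query.setdefault(query, []).append((doc_id, count))
    for docs in by_query.values():
        # Most-clicked documents first; ties broken by document ID.
        docs.sort(key=lambda pair: (-pair[1], pair[0]))
    return by_query


# Each event carries both the query the user entered and the item clicked.
events = [
    {"query": "ipad", "doc_id": "p1"},
    {"query": "iPad ", "doc_id": "p1"},
    {"query": "ipad", "doc_id": "p2"},
    {"query": "case", "doc_id": "p9"},
]
aggregated = aggregate_clicks_by_query(events)
```

The result maps each query to the items that users clicked for it, which is the information a click-based recommender needs at query time.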

Three-stage click-based recommender system:

  • sub-query stage

  • rollup stage

  • advanced-boosting stage

The results are handed off to a Solr query stage, which uses the boosts generated by the boosting stage. The pipeline therefore consists of four stages, of which the first three produce the recommendations.

User-personalized recommendations (simple collaborative filtering)

Generating user-personalized recommendations is similar to the three-stage click-based recommender system, with two additional stages that fetch a user’s history and then aggregate it.

This method is "simple" because of the algorithm itself, not because of its ease of implementation — it does part of the user-to-user matching at runtime. The setup will be simplified in future versions of Fusion, but for now, the basic collaborative filter can be done with the following sequence of stages:

  • sub-query stage

  • rollup stage

  • sub-query stage

  • rollup stage

  • advanced-boosting stage

  • solr query

Stage-by-stage synopsis

1 Sub-query stage

First, we need to query for the current user’s history, so that we can find other products that will match. So we’ll use a Solr SubQuery stage that pulls the user’s ID out of the request, then passes that as a parameter to Solr.

This example looks for user click events in the collection "products_signals", although you could instead query for aggregated records, or for a combination of raw and aggregated records if they are in the same collection. It assumes that "user_id" is passed in on the query as a parameter; to pass in the user ID under a different parameter name, change the name in both the parentParams and params arguments. It also assumes that the products or documents that the user has viewed are stored in a Solr field named "doc_id_s", and that the user ID is stored in a field named "user_id_s". Change these to whatever is appropriate for your schema.

{
    "type": "sub-query",
    "key": "subquery-results",
    "collection": "products_signals",
    "handler": "select",
    "method": "GET",
    "parentParams": [
        "user_id"
    ],
    "params": [
        {
            "key": "q",
            "value": "_query_:\"{!dismax qf='user_id_s' v=$user_id}\""
        },
        {
            "key": "fl",
            "value": "doc_id_s"
        }
    ],
    "skip": false,
    "label": "sub-query"
}
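As a rough illustration of what this stage sends to Solr, the sketch below builds the sub-request parameters by hand. It assumes that parentParams copies only the named parameters from the incoming request, and that Solr resolves $user_id from the request parameter of the same name. The function name and sample values are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def history_subquery_params(request_params):
    """Build the Solr parameters for the user-history sub-query."""
    params = {
        # Solr dereferences $user_id to the user_id request parameter.
        "q": "_query_:\"{!dismax qf='user_id_s' v=$user_id}\"",
        "fl": "doc_id_s",
    }
    # Mirror parentParams: copy only the listed parameters from the parent request.
    for name in ("user_id",):
        if name in request_params:
            params[name] = request_params[name]
    return params


# The parent request's own q is not forwarded; only user_id is.
params = history_subquery_params({"q": "laptop", "user_id": "u42"})
query_string = urlencode(params)
decoded = parse_qs(query_string)
```

The user's search terms ("laptop" here) play no part in this sub-query. It only retrieves the documents that user "u42" has previously clicked.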
2 Rollup stage

The results from that sub-query are now attached to the context (a temporary data structure that is passed between query pipeline stages) under the "subquery-results" key. Next, we need to turn those results into something we can use, so we’ll apply a rollup aggregation. We just want a list of document IDs (which, again, we’re assuming are stored in a field named "doc_id_s", but you can adjust this to suit your schema).

{
    "type": "rollup-rec-aggr",
    "key": "subquery-results",
    "resultKey": "rollup",
    "rollupField": "doc_id_s",
    "weightFunction": "sum",
    "maxRows": 100,
    "sort": true,
    "skip": false,
    "label": "rollup-rec-aggr"
}
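The effect of this rollup can be sketched in a few lines. This is an approximation of the stage's behavior, not its implementation. In particular, it assumes that each document contributes a weight of 1 when no weightField is configured. The function name and sample data are hypothetical.

```python
from collections import defaultdict


def rollup(docs, rollup_field, max_rows=100):
    """Sum a weight of 1 per occurrence of each rollup_field value.

    Returns (value, weight) pairs sorted by descending weight and capped
    at max_rows, mirroring weightFunction "sum", sort, and maxRows.
    """
    totals = defaultdict(float)
    for doc in docs:
        value = doc.get(rollup_field)
        if value is not None:
            totals[value] += 1.0  # assumed default weight without a weightField
    ordered = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return ordered[:max_rows]


# Hypothetical sub-query results: one signal document per click.
subquery_results = [{"doc_id_s": "p1"}, {"doc_id_s": "p2"}, {"doc_id_s": "p1"}]
first_rollup = rollup(subquery_results, "doc_id_s")
```

Here the user clicked "p1" twice and "p2" once, so the rollup lists "p1" first.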
3 Sub-query stage

Armed with the items that the user has previously interacted with, we’ll now query the "products_signals_aggr" collection to find other users with similar tastes. There is a lot of room for tuning here.

{
    "type": "sub-query",
    "key": "second-results",
    "collection": "products_signals_aggr",
    "handler": "select",
    "method": "GET",
    "rollupKeys": [
        "rollup"
    ],
    "params": [
        {
            "key": "q",
            "value": "_query_:\"{!dismax qf=doc_id_ss v=$rollup}\""
        },
        {
            "key": "fq",
            "value": "aggr_type_s:simple@user_id_s"
        },
        {
            "key": "bq",
            "value": "recip(doc_id_count_i,1,1000,1000)"
        },
        {
            "key": "rows",
            "value": "100"
        },
        {
            "key": "fl",
            "value": "doc_id_ss score"
        },
        {
            "key": "fq",
            "value": "doc_id_count_i:[2 TO *]"
        }
    ],
    "skip": false,
    "label": "sub-query"
}
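Two of the parameters above shape which users count as "similar". Solr's recip function computes a / (m * x + b), so recip(doc_id_count_i,1,1000,1000) shrinks as a user's document count grows, and the range filter drops users with fewer than two documents. The sketch below evaluates both by hand. It assumes the bq expression is evaluated as that function, and the helper names are hypothetical.

```python
def recip(x, m, a, b):
    """Solr's recip function: a / (m * x + b)."""
    return a / (m * x + b)


def user_boost(doc_id_count):
    """Value of recip(doc_id_count_i,1,1000,1000) for one aggregated user record."""
    return recip(doc_id_count, 1, 1000, 1000)


def passes_filter(doc_id_count):
    """Mirror the filter doc_id_count_i:[2 TO *]."""
    return doc_id_count >= 2


# Users with very long histories contribute less than users with short ones.
light_user = user_boost(2)      # 1000 / 1002
heavy_user = user_boost(1000)   # 1000 / 2000
```

A user with 1,000 distinct documents gets half the boost of a user with almost none, which keeps indiscriminate clickers from dominating the match.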
4 Rollup stage

We’ll exclude results that are in the first rollup (so that we don’t recommend things that the user has already seen).

{
    "type": "rollup-rec-aggr",
    "key": "second-results",
    "resultKey": "second-rollup",
    "rollupField": "doc_id_ss",
    "excludeResultsKey": "rollup",
    "weightField": "score",
    "weightFunction": "sum",
    "maxRows": 20,
    "sort": true,
    "skip": false,
    "label": "rollup-rec-aggr"
}
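This second rollup differs from the first in three ways: the rollup field is multi-valued, each document's weight comes from its score, and items from the first rollup are excluded. A minimal sketch of that behavior follows. It is an approximation under those assumptions, with a hypothetical function name and sample data.

```python
from collections import defaultdict


def rollup_excluding(docs, rollup_field, weight_field, exclude, max_rows=20):
    """Sum weight_field per rollup_field value, skipping excluded values."""
    excluded = set(exclude)
    totals = defaultdict(float)
    for doc in docs:
        values = doc.get(rollup_field, [])
        if not isinstance(values, list):
            values = [values]  # tolerate single-valued fields
        for value in values:
            if value not in excluded:
                totals[value] += doc.get(weight_field, 1.0)
    ordered = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return ordered[:max_rows]


# Items the current user has already seen (the keys of the first rollup).
already_seen = ["p1", "p2"]
# Hypothetical aggregated records for two similar users, with Solr scores.
similar_users = [
    {"doc_id_ss": ["p1", "p3", "p4"], "score": 2.0},
    {"doc_id_ss": ["p2", "p3"], "score": 1.5},
]
second_rollup = rollup_excluding(similar_users, "doc_id_ss", "score", already_seen)
```

Item "p3" appears in both similar users' histories, so it accumulates both scores and ranks above "p4". Items "p1" and "p2" are dropped because the user has already seen them.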
5 Advanced-boosting stage

Note that we’re rescaling boosts to be in a range between 1 and 10. If we were creating a hybrid recommender, we could add more stages and scale the results in order to combine the recommendations however we chose.
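One plausible reading of this rescaling is a linear min-max mapping of the rollup weights onto the configured range. The exact formula Fusion uses is not stated here, so treat the sketch below as an assumption. The function name and the handling of equal weights are hypothetical.

```python
def scale_boosts(weights, scale_min=1.0, scale_max=10.0):
    """Linearly rescale {doc_id: weight} into [scale_min, scale_max]."""
    if not weights:
        return {}
    lowest = min(weights.values())
    highest = max(weights.values())
    if highest == lowest:
        # All weights equal: give every item the top boost (an arbitrary choice).
        return {doc_id: scale_max for doc_id in weights}
    span = scale_max - scale_min
    return {
        doc_id: scale_min + (weight - lowest) / (highest - lowest) * span
        for doc_id, weight in weights.items()
    }


# Hypothetical rollup weights for three recommended items.
scaled = scale_boosts({"p3": 3.5, "p4": 2.0, "p5": 2.75})
```

Putting every recommender's boosts on a common scale is what makes it possible to weight several recommenders against each other in a hybrid pipeline. The stage configuration itself follows.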

{
    "type": "adv-boost",
    "boostingMethod": "query-param",
    "boostingParam": "boost",
    "key": "second-rollup",
    "boostFieldName": "doc_id_s",
    "scaleRange": {
        "scaleMin": 1,
        "scaleMax": 10
    },
    "skip": false,
    "label": "adv-boost"
}
6 Solr query

Finally, you’ll want a Solr query stage to make the actual request to Solr, using a standard Solr query which will automatically use the boosts generated by the boost stage:

{
    "type": "solr-query",
    "skip": false,
    "label": "solr-query"
}

Complete pipeline definition

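Here is the complete pipeline definition over a collection named "products". In addition to the six stages described above, it contains two query-logging stages that are not part of the synopsis: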

{
    "id": "user-personalized-rec-pipeline",
    "stages": [
{
    "type": "sub-query",
    "key": "subquery-results",
    "collection": "products_signals",
    "handler": "select",
    "method": "GET",
    "parentParams": [
        "user_id"
    ],
    "params": [
        {
            "key": "q",
            "value": "_query_:\"{!dismax qf='user_id_s' v=$user_id}\""
        },
        {
            "key": "fl",
            "value": "doc_id_s"
        }
    ],
    "skip": false,
    "label": "sub-query"
},
{
    "type": "rollup-rec-aggr",
    "key": "subquery-results",
    "resultKey": "rollup",
    "rollupField": "doc_id_s",
    "weightFunction": "sum",
    "maxRows": 100,
    "sort": true,
    "skip": false,
    "label": "rollup-rec-aggr"
},
{
    "type": "sub-query",
    "key": "second-results",
    "collection": "products_signals_aggr",
    "handler": "select",
    "method": "GET",
    "rollupKeys": [
        "rollup"
    ],
    "params": [
        {
            "key": "q",
            "value": "_query_:\"{!dismax qf=doc_id_ss v=$rollup}\""
        },
        {
            "key": "fq",
            "value": "aggr_type_s:simple@user_id_s"
        },
        {
            "key": "bq",
            "value": "recip(doc_id_count_i,1,1000,1000)"
        },
        {
            "key": "rows",
            "value": "100"
        },
        {
            "key": "fl",
            "value": "doc_id_ss score"
        },
        {
            "key": "fq",
            "value": "doc_id_count_i:[2 TO *]"
        }
    ],
    "skip": false,
    "label": "sub-query"
},
{
    "type": "query-logging",
    "detailed": true,
    "skip": false,
    "label": "query-logging"
},
{
    "type": "rollup-rec-aggr",
    "key": "second-results",
    "resultKey": "second-rollup",
    "rollupField": "doc_id_ss",
    "excludeResultsKey": "rollup",
    "weightField": "score",
    "weightFunction": "sum",
    "maxRows": 20,
    "sort": true,
    "skip": false,
    "label": "rollup-rec-aggr"
},
{
    "type": "query-logging",
    "detailed": true,
    "skip": false,
    "label": "query-logging"
},
{
    "type": "adv-boost",
    "boostingMethod": "query-param",
    "boostingParam": "boost",
    "key": "second-rollup",
    "boostFieldName": "doc_id_s",
    "scaleRange": {
        "scaleMin": 1,
        "scaleMax": 10
    },
    "skip": false,
    "label": "adv-boost"
},
{
    "type": "solr-query",
    "skip": false,
    "label": "solr-query"
}
    ]
}
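Once the pipeline is defined, a search request only needs to carry the user's ID alongside the query. The sketch below builds such a request URL. The host, port, and REST path are assumptions that depend on your Fusion version and deployment, and the function name and sample values are hypothetical.

```python
from urllib.parse import urlencode


def pipeline_query_url(base_url, pipeline_id, collection, query, user_id):
    """Build a query URL that passes user_id to the pipeline as a parameter."""
    params = urlencode({"q": query, "user_id": user_id})
    return (
        f"{base_url}/query-pipelines/{pipeline_id}"
        f"/collections/{collection}/select?{params}"
    )


url = pipeline_query_url(
    "http://localhost:8764/api/apollo",  # assumed API base; adjust for your install
    "user-personalized-rec-pipeline",
    "products",
    "laptop bag",
    "u42",
)
```

The user_id parameter is what the first sub-query stage reads through parentParams. If you rename it in the pipeline, rename it in the request as well.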