Collaborative Filtering

Collaborative filtering lets your app use knowledge about the behavior of many individuals to return the most useful documents to a user, in either a search or a non-search context. It makes serendipitous discovery possible: a user is presented with items that other users deem relevant, for example, socks when buying shoes.

Parts of the Process

Collaborative filtering consists of these parts:

  • Use signals to collect data – Collaborative filtering relies on user-engagement data (for example, clicks or purchases) that the app passes to Fusion as signals. Different kinds of collaborative filtering require different data.

  • Aggregate signal data – Aggregation jobs summarize the raw signals so that the data can be used to compute recommendations.

  • Compute which documents to recommend – Calculating recommended items using data from many users regarding many items can be time consuming. The calculations are done periodically by running jobs.

  • Recommend the documents at query time – Document recommendations happen in a query pipeline, as boosts or as responses (document IDs). At query time, the documents to recommend are known, so the operation is fast.

Use Signals to Report Data to Fusion

Use signals to report user activity to Fusion. Signals contain data of interest. Signals generated from user-item interactions contain a type field that identifies the type of interaction, for example, a click or a purchase. They also contain these params fields:

  • query – Either a search query or a context-query. This is used for items-for-query (Boost with Signals) recommendations, and might be used for items-for-user, items-for-item, and users-for-item recommendations.

  • userId – A user ID. This can be anything that is associated with the user. A persistent identifier is needed, but the persistence can be limited. For example, a cookie can provide the user ID. The longer the identifier persists, the better the recommendations. This is used for items-for-user, items-for-item, and users-for-item recommendations.

  • doc_Id – Document ID. This is used for all types of collaborative recommendations.

Heterogeneous signals are OK. Signals that lack query won’t contribute to items-for-query recommendations. Signals that lack userId won’t contribute to items-for-user, items-for-item, or users-for-item recommendations.
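
The rules above can be sketched in a few lines of Python. This is an illustration only: the dict layout and the helper names build_signal and contributes_to are assumptions made for this example, not Fusion's signal format or API. Only the field names (type, query, userId, doc_Id) come from the description above.

```python
from typing import List, Optional


def build_signal(signal_type: str, doc_id: str,
                 query: Optional[str] = None,
                 user_id: Optional[str] = None) -> dict:
    """Build a signal as a plain dict (hypothetical layout for illustration)."""
    params = {"doc_Id": doc_id}
    if query is not None:
        params["query"] = query
    if user_id is not None:
        params["userId"] = user_id
    return {"type": signal_type, "params": params}


def contributes_to(signal: dict) -> List[str]:
    """List the recommendation types a signal can contribute to."""
    params = signal.get("params", {})
    kinds: List[str] = []
    if "doc_Id" not in params:
        # A document ID is used by every type of collaborative recommendation.
        return kinds
    if "query" in params:
        kinds.append("items-for-query")
    if "userId" in params:
        kinds += ["items-for-user", "items-for-item", "users-for-item"]
    return kinds
```

For example, a click signal that carries a query but no user ID contributes only to items-for-query recommendations, while a purchase signal that carries a user ID but no query contributes to the other three types.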

Aggregate Signal Data

An aggregation job runs to aggregate signal data. These are the jobs:

  • Boost with Signals – <collection>_click_signals_aggregation. The job places aggregated data in the collection <collection>_signals_aggr.

  • Recommend Items for User – <collection>_item_aggregation. The job places aggregated data in the collection <collection>_signals_aggr.

  • Recommend Items for Item – <collection>_item_aggregation. The job places aggregated data in the collection <collection>_signals_aggr.
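
As a rough illustration of the naming pattern and of what aggregation means, the sketch below derives the job and collection names from a collection name and counts click signals per query and document. The counting step is a simplified stand-in chosen for this example, not the computation that Fusion's aggregation jobs perform, and the function names and the signal dict layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def aggregation_names(collection: str) -> Dict[str, str]:
    """Derive the job and collection names listed above for one collection."""
    return {
        "click_job": f"{collection}_click_signals_aggregation",
        "item_job": f"{collection}_item_aggregation",
        "output_collection": f"{collection}_signals_aggr",
    }


def aggregate_clicks(signals: Iterable[dict]) -> Counter:
    """Count click signals per (query, document) pair.

    Simplified stand-in for aggregation; signals that lack a query or a
    document ID are skipped.
    """
    counts: Counter = Counter()
    for signal in signals:
        params = signal.get("params", {})
        if signal.get("type") != "click":
            continue
        if "query" in params and "doc_Id" in params:
            key: Tuple[str, str] = (params["query"], params["doc_Id"])
            counts[key] += 1
    return counts
```

Many raw signals collapse into one row per query and document, which is what makes the later steps cheaper than reading every signal again.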

Compute the Documents to Recommend

For items-for-query recommendations (Boost with Signals), the <collection>_click_signals_aggregation job produces the information needed to recommend documents at query time.

For the other collaborative recommendations, Fusion runs the <collection>_item_recommendation job automatically upon successful completion of the <collection>_item_aggregation job. The <collection>_item_recommendation job places recommendations in these collections:

  • <collection>_items_for_user_recommendations

  • <collection>_items_for_item_recommendations

  • Items-for-user recommendations (produced by default) – If you don’t need items-for-user recommendations, delete the collection name <collection>_items_for_user_recommendations from the Items-for-users Recommendation text box.

  • Items-for-item recommendations (produced by default) – If you don’t need items-for-item recommendations (or item-item similarities), delete the collection name <collection>_items_for_item_recommendations from the Item-to-item Similarity Collection text box.

  • Users-for-item recommendations (not produced by default) – To produce users-for-item recommendations, add the collection name <collection>_users_for_item_recommendations in the Users-for-items Recommendation Collection text box.

The <collection>_item_recommendations job can produce items-for-user, items-for-item, and/or users-for-item recommendations in a single (periodic) run. You don’t need to set up multiple jobs.
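
To give a feel for what an items-for-item computation involves, here is a toy sketch that ranks items by how often the same users interacted with both. It is a plain co-occurrence count chosen for illustration and is not the algorithm that the Fusion job uses. The function name and the input shape are assumptions for this example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def items_for_item(interactions: Iterable[Tuple[str, str]],
                   top_n: int = 3) -> Dict[str, List[str]]:
    """Rank related items by co-occurrence across users.

    interactions: (user_id, doc_id) pairs taken from user-item signals.
    Returns, for each item, up to top_n other items ordered by how many
    users interacted with both (ties broken alphabetically).
    """
    docs_by_user: Dict[str, Set[str]] = defaultdict(set)
    for user_id, doc_id in interactions:
        docs_by_user[user_id].add(doc_id)

    co_counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for docs in docs_by_user.values():
        for a, b in combinations(sorted(docs), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1

    return {
        item: [doc for doc, _ in
               sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]
        for item, neighbors in co_counts.items()
    }
```

Even this toy version compares every pair of items per user, which grows quickly with the size of the data. That cost is why the calculation runs as a periodic job and not at query time.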

Recommend Documents at Query Time

An app calls Fusion at query time to obtain documents to recommend. The query pipeline must contain a recommender pipeline stage, as well as other pipeline stages as needed.

Query pipeline stages

These query pipeline stages provide collaborative filtering:

Built-in Query Pipelines

Fusion creates these query pipelines when you enable recommendations:

  • <collection>_items_for_user_recommendations – Returns items-for-user recommendations.

  • <collection>_items_for_item_recommendations – Returns items-for-item recommendations.
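
The sketch below shows how an app might pick one of these pipelines by recommendation type, and why query-time recommendation is fast: the results were computed earlier, so serving them amounts to a lookup. The in-memory dict stands in for a recommendations collection, and both function names are hypothetical. No Fusion API call is shown.

```python
from typing import Dict, List


def recommendation_pipeline(collection: str, kind: str) -> str:
    """Return the built-in query pipeline name for a recommendation type."""
    pipelines = {
        "items-for-user": f"{collection}_items_for_user_recommendations",
        "items-for-item": f"{collection}_items_for_item_recommendations",
    }
    if kind not in pipelines:
        raise ValueError(f"no built-in pipeline for {kind!r}")
    return pipelines[kind]


def recommend(precomputed: Dict[str, List[str]], key: str,
              rows: int = 5) -> List[str]:
    """Look up precomputed document IDs for a user ID or an item ID.

    The dict is a stand-in for a recommendations collection; an unknown
    key yields an empty list.
    """
    return precomputed.get(key, [])[:rows]
```

For instance, with precomputed = {"u1": ["socks", "laces", "insoles"]}, calling recommend(precomputed, "u1", rows=2) returns the first two document IDs without any further computation.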