Step by Step

Fusion can leverage user behavior to generate recommendations and boosts based on the behavior of similar users. This is called collaborative filtering. The default objects and schedules that are created when you enable Spark-based collaborative recommendations can produce these types of recommendations:

  • Items for User – Recommend items based on which items similar users have interacted with.

  • Items for Item – Recommend items based on the similarity of the other items to the specified item. Similarity is based on user-item interactions.

  • Users for Item – Recommend users for an item, for example to target advertising.

Here we give a step-by-step walkthrough of how to implement Items for User and Items for Item recommendations.

Step 1: Create a Collection to Hold the Data

Create a collection to contain the input data (signals about user-item interactions) and the data that Fusion generates to provide collaborative recommendations.

  1. Click Devops > Home > Collections > New.

  2. Name the collection and click Save Collection.

Step 2: Enable Recommendations

Create the objects and schedules that Fusion needs to process the data for items-for-user, items-for-item, and users-for-item collaborative recommendations.

Clicking Enable Recommendations in the Fusion UI enables Spark-based collaborative recommendations, that is, items-for-user and items-for-item recommendations (and users-for-item recommendations if you enable those). You don’t need to do this to set up items-for-query collaborative recommendations (Boost with Signals) or content-based filtering.

To enable collaborative recommendations:

  1. Log in to Fusion as the admin user.

  2. From the Collections drop-down list, choose the collection for which you want to enable collaborative recommendations.

  3. Click Search.

  4. Click Settings > Enable Recommendations.

Enabling recommendations creates these objects:

  • Collections:

    Object Description

    <collection>_signals

    Collection to hold user interactions. If the signals feature is enabled for this collection (which it is by default), then this collection will already exist.

    <collection>_signals_aggr

    Collection to hold aggregated user interactions. If the signals feature is enabled for this collection (which it is by default), then this will already exist.

    The default aggregation groups together all signals of type click by the joint key (user_id_s, doc_id_s) from the <collection>_signals collection, and performs a time-decayed count of these results to generate an estimate of the "implicit preference" that user_id_s has for doc_id_s.

    <collection>_items_for_item_recommendations

    Collection to hold generated item-item similarities (by default 10 per item). No user_id_s data is present. A Recommend Items for Item query-pipeline stage can use the similarities to return item recommendations. For example, a query in which doc_id_s = docA would return an ordered list of other doc_id_s values for documents that are similar to document docA, along with the similarities. For example: [("docB", 0.83), ("docC", 0.55), ("docD", 0.43), …​, ("docK", 0.22)].

    <collection>_items_for_user_recommendations

    Collection to hold recommended items for a user. By default the job creates 10 recommendations per user.

  • Jobs:

    Object Description

    <collection>_item_aggregation

    Aggregates user-item interactions and calculates a weight for each user-item pair based on the recency of the interactions, so that more recent interactions have more impact on recommendations.

    <collection>_item_recommendations

    Runs a Spark job to train an ALS-based (Alternating Least Squares) recommendation model, and then uses that model to generate recommendations. By default, this job generates both items-for-user recommendations and items-for-item recommendations. The job stores the results in the <collection>_items_for_user_recommendations and <collection>_items_for_item_recommendations collections.

  • Job schedules:

    Object Description

    Schedule for the job <collection>_item_aggregation

    Runs the aggregation job once a day.

    Schedule for the job <collection>_item_recommendations

    Runs the recommender job after successful completion of the aggregation job.

  • Query pipelines:

    Object Description

    <collection>_items_for_user_recommendations

    Query pipeline to generate recommendations of items for a user

    <collection>_items_for_item_recommendations

    Query pipeline to generate recommendations of items similar to an item

Step 3: Collect User-Item Interaction Data

Both items-for-user and items-for-item recommendations are based on user-item interactions, that is, on the interactions that users have with items. For example, users might look at items, click on them, or buy them. User-item interactions often reflect user-item preferences.

Your app must collect user-item interaction data and use signals to send the data to Fusion. Data collection is an ongoing process. The more data, the better, and current data provides the best recommendations.

If you have collected user-item interaction data previously, you can load that data into Fusion. This is one way to address the cold-start problem: without user-item interaction data, the algorithms lack the data they need to make recommendations.

Use signals to report user activity to Fusion. Signals contain data of interest. Signals generated from user-item interactions contain a type field that identifies the type of interaction, for example, a click or a purchase. They also contain these params fields:

  • query – Either a search query or a context-query. This is used for items-for-query (Boost with Signals) recommendations, and might be used for items-for-user, items-for-item, and users-for-item recommendations.

  • userId – A user ID. This can be anything that is associated with the user. A persistent identifier is needed, but the persistence can be limited. For example, a cookie can provide the user ID. The longer the identifier persists, the better the recommendations. This is used for items-for-user, items-for-item, and users-for-item recommendations.

  • docId – Document ID. This is used for all types of collaborative recommendations.

Heterogeneous signals are OK. Signals that lack query won’t contribute to items-for-query recommendations. Signals that lack userId won’t contribute to items-for-user, items-for-item, or users-for-item recommendations.
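As an illustration, the following minimal Python sketch builds signal payloads with a type field and the params fields described above. It only constructs the JSON; how you send it to Fusion (endpoint, authentication) depends on your deployment and is not shown. The helper name build_signal, the sample IDs, and the DOC_ID_FIELD constant are assumptions made for this example.

```python
import json

# Name of the document-ID field inside "params". This is an assumption:
# adjust it to the field name that your Fusion version expects.
DOC_ID_FIELD = "docId"


def build_signal(signal_type, user_id=None, doc_id=None, query=None):
    """Build one signal as a plain dict; fields left as None are omitted."""
    params = {}
    if query is not None:
        params["query"] = query
    if user_id is not None:
        params["userId"] = user_id
    if doc_id is not None:
        params[DOC_ID_FIELD] = doc_id
    return {"type": signal_type, "params": params}


# Heterogeneous signals: the second one has no query, so it can contribute
# to items-for-user recommendations but not to items-for-query ones.
signals = [
    build_signal("click", user_id="user-42", doc_id="docA", query="laptop"),
    build_signal("purchase", user_id="user-42", doc_id="docA"),
]

# JSON body that an app could send to Fusion as a batch of signals.
payload = json.dumps(signals)
```

Omitting absent fields, rather than sending empty strings, keeps signals that lack a query or a user ID clearly distinguishable from those that have one.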

Step 4: Configure Recommender Jobs

The jobs <collection>_item_aggregation and <collection>_item_recommendations have default configurations that produce both items-for-user and items-for-item recommendations as boosts.

Configure jobs in Search > Home > Jobs. Click the job you want to configure. Click Save to save the configuration.

Job <collection>_item_aggregation

You shouldn’t need to configure the <collection>_item_aggregation job.

Job <collection>_item_recommendations

Options for job configuration are:

  • Training Data Sampling Fraction – A value between 0 and 1. Specify a decimal fraction to use that fraction of the data when training. The job will run more quickly and use less memory. Consider reducing this value during testing. For production environments, it should be set to 1.

  • Produce items-for-user recommendations and item-item similarities (produced by default) – Fusion produces item-item similarities when it produces items-for-user recommendations. You get both or neither. To get both, enter <collection>_items_for_item_recommendations in the Item-to-item Similarity Collection text box. To get neither, delete the collection name <collection>_items_for_user_recommendations from the Items-for-users Recommendation text box.

  • Produce items-for-item recommendations (produced by default) – If you don’t need items-for-item recommendations (or item-item similarities), delete the collection name <collection>_items_for_item_recommendations from the Item-to-item Similarity Collection text box.

  • Produce users-for-item recommendations (not produced by default) – To produce users-for-item recommendations, add the collection name <collection>_users_for_item_recommendations in the Users-for-items Recommendation Collection text box.

  • Compute more or fewer recommendations – The default numbers for items-for-user, items-for-item, and users-for-item recommendations are all 10. Increasing the number doesn’t have a large cost.

  • Recommender Rank – You might need to tune this value by increasing it. A higher rank uses more memory.

  • Grid Search Width (Advanced tab) – Set to 1 to have Spark estimate the best parameters. After one run with the value set to 1, you can set it back to 0.

  • Implicit Preferences – If checked, Fusion uses user actions to guess user preferences. Uncheck if you have user preference data (for example, ratings).

Step 5: Run a Job to Aggregate Signal Data

Aggregate user-item interaction data to create an aggregation document for each user-item pair. The aggregated data is equivalent to a user-item interaction matrix, which is the input for the ALS Recommender Spark job.

Enabling collaborative recommendations creates the <collection>_item_aggregation job and schedules it to run once a day.

You can also run the <collection>_item_aggregation job manually. Navigate to Devops > Home > Scheduler. Click the <collection>_item_aggregation job, and then click Start. Click Cancel to exit scheduling for the job.
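To make the idea of a time-decayed, per-pair aggregation concrete, here is a minimal Python sketch. It is not Fusion's implementation: the exponential half-life decay, the HALF_LIFE_DAYS value, and the function name aggregate_clicks are assumptions chosen for illustration. The result maps each (user, item) pair to a weight, which is one way to represent a sparse user-item interaction matrix.

```python
from collections import defaultdict

# Hypothetical half-life for the decay; not a Fusion default.
HALF_LIFE_DAYS = 30.0


def aggregate_clicks(signals, now_days):
    """Aggregate click signals into one weight per (user, item) pair.

    signals: iterable of (user_id, doc_id, event_day) tuples, where
    event_day is a day number on the same scale as now_days.
    Each click contributes 0.5 ** (age / HALF_LIFE_DAYS), so recent
    clicks count more than old ones.
    """
    weights = defaultdict(float)
    for user_id, doc_id, event_day in signals:
        age = max(0.0, now_days - event_day)
        weights[(user_id, doc_id)] += 0.5 ** (age / HALF_LIFE_DAYS)
    return dict(weights)


clicks = [
    ("u1", "docA", 100.0),  # clicked today
    ("u1", "docA", 70.0),   # same pair, clicked 30 days ago
    ("u2", "docB", 40.0),   # clicked 60 days ago
]
matrix = aggregate_clicks(clicks, now_days=100.0)
```

In this example the two clicks by u1 on docA collapse into a single entry whose weight is larger than that of the older, single click by u2 on docB.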

Step 6: Run a Job to Create Collaborative Recommendations

Note: By default, this job is scheduled to run automatically upon successful completion of the <collection>_item_aggregation job.

To run the <collection>_item_recommendations job manually, navigate to Devops > Home > Scheduler. Click the <collection>_item_recommendations job, and then click Start. Click Cancel to exit scheduling for the job.

This job populates several collections:

  • <collection>_items_for_item_recommendations – N recommended items for each item

  • <collection>_items_for_user_recommendations – N recommended items for each user

  • recommender_models (not produced by default) – Weights for each user-item pair. This collection can contain multiple models for different data collections, which are tagged by model_id. By default, subsequent runs of a collaborative-filtering recommender job reuse a previously generated model, if possible, for better performance.
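This guide does not describe how the item-item similarities are computed. One common approach with factorization models such as ALS is to compare the latent factor vectors of items using cosine similarity and keep the top N per item. The following Python sketch shows that approach under this assumption; the function names and the tiny two-dimensional factor vectors are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_similar_items(item_factors, doc_id, n=10):
    """Return up to n (other_doc_id, similarity) pairs, most similar first."""
    target = item_factors[doc_id]
    scored = [
        (other, cosine(target, vec))
        for other, vec in item_factors.items()
        if other != doc_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Made-up latent factors; a trained model would supply these.
item_factors = {
    "docA": [1.0, 0.0],
    "docB": [0.9, 0.1],
    "docC": [0.0, 1.0],
}
similar = top_similar_items(item_factors, "docA", n=2)
```

The output has the same shape as the ordered list of document IDs and similarities that the <collection>_items_for_item_recommendations collection holds for each item.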

Step 7: Set Up Query Pipelines

By default, enabling collaborative recommendations sets up two query pipelines that an app can use to obtain recommended items. Alternatively, you can add corresponding query-pipeline stages to other query pipelines.

Use Built-in Query Pipelines for Collaborative Recommendations

When you enable collaborative recommendations, Fusion creates two query pipelines that you can use for retrieving the recommendations:

  • <collection>_items_for_user_recommendations – This pipeline retrieves items to recommend for the user specified by the userId query parameter, based on which items similar users interact with. As its first stage, it has a Recommend Items for User stage. By default, results from that stage are applied as boosts.

  • <collection>_items_for_item_recommendations – This pipeline retrieves items to recommend for the item specified by the docId query parameter, based on the similarity of the other items to the specified item. Similarity is based on user-item interactions, for example, users who clicked on item A also tended to click on item Z. As its first stage, it has a Recommend Items for Item stage. By default, results from this stage are applied as boosts.
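To picture what "applied as boosts" can mean, the following Python sketch turns a list of recommended document IDs and weights into a Solr-style boost query string. This is an illustration only, not a description of what the Fusion stages do internally. The id field name and the function name to_boost_query are assumptions.

```python
def to_boost_query(recommendations, id_field="id"):
    """Build a Solr-style boost query from (doc_id, weight) pairs.

    Each pair becomes a clause like id:"docB"^0.83, so matching documents
    are ranked higher without excluding documents that do not match.
    """
    clauses = []
    for doc_id, weight in recommendations:
        # Escape backslashes and double quotes so the ID stays a valid phrase.
        escaped = doc_id.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('{}:"{}"^{}'.format(id_field, escaped, weight))
    return " ".join(clauses)


recs = [("docB", 0.83), ("docC", 0.55)]
bq = to_boost_query(recs)
```

Because boosts only reorder results, the main query still decides which documents match. Recommended items that do not match the query are not added to the result set.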

Add Collaborative-Recommendation Stages to Other Query Pipelines

As an alternative to using the built-in query pipelines for collaborative recommendations, you can add stages for collaborative recommendations to other query pipelines:

  • Recommend Items for User – Add this stage to recommend items for users, based on what items similar users interact with. By default, results from that stage are applied as boosts.

  • Recommend Items for Item – Add this stage to recommend items based on the similarity of the other items to the specified item. Similarity is based on user-item interactions, for example, users who clicked on item A also tended to click on item Z. By default, results from this stage are applied as boosts.

Include Required Stages

Depending on the source of the query pipeline and the default stages, you might need to add or remove query-pipeline stages. These are examples:

Use Items-for-User and Items-for-Query Recommendations to Boost Items that Match a Query

Use the pipeline stages present in the built-in pipeline <collection>_items_for_user_recommendations – Recommend Items for User, Boost with Signals, Query Fields, Field Facet, and Solr Query.

Use Items-for-Item and Items-for-Query Recommendations to Boost Items that Match a Query

Use the pipeline stages present in the built-in pipeline <collection>_items_for_item_recommendations – Recommend Items for Item, Boost with Signals, Query Fields, Field Facet, and Solr Query.

Step 8: Look Up Precomputed Recommendations at Query Time

Get collaborative recommendations for a user at query time. Query the pipelines that you set up in Step 7.
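As a sketch of what such a query could look like, the following Python code builds a request URL for a recommendations pipeline and passes the userId query parameter. The URL path layout, the host and port, and the collection name products are assumptions for illustration. Check the query pipeline API reference for your Fusion version before relying on them. The code only builds the URL and does not send a request.

```python
from urllib.parse import quote, urlencode


def recommendation_url(base_url, collection, pipeline, params):
    """Build a query URL for a pipeline; the path layout is an assumption."""
    path = "/api/apollo/query-pipelines/{}/collections/{}/select".format(
        quote(pipeline, safe=""), quote(collection, safe="")
    )
    return base_url.rstrip("/") + path + "?" + urlencode(params)


# Hypothetical host, collection, and user ID.
url = recommendation_url(
    "http://localhost:8764",
    "products",
    "products_items_for_user_recommendations",
    {"q": "*:*", "userId": "user-42"},
)
```

For items-for-item recommendations, the same helper could be used with the <collection>_items_for_item_recommendations pipeline and a docId parameter in place of userId.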