Step by Step
- Step 1: Create a Collection to Hold the Data
- Step 2: Enable Recommendations
- Step 3: Collect User-Item Interaction Data
- Step 4: Configure Recommender Jobs
- Step 5: Run a Job to Aggregate Signal Data
- Step 6: Run a Job to Create Collaborative Recommendations
- Step 7: Set Up Query Pipelines
- Use Built-in Query Pipelines for Collaborative Recommendations
- Add Collaborative-Recommendation Stages to Other Query Pipelines
- Include Required Stages
- Step 8: Look Up Precomputed Recommendations at Query Time
Fusion can leverage user behavior to generate recommendations and boosts based on the behavior of similar users. This is called collaborative filtering. The default objects and schedules that are created when you enable Spark-based collaborative recommendations can produce these types of recommendations:
Items for User – Recommend items based on which items similar users have interacted with.
Items for Item – Recommend items based on the similarity of the other items to the specified item. Similarity is based on user-item interactions.
Users for Item – Recommend users for an item, for example to target advertising.
Here we give a step-by-step walkthrough of how to implement Items for User and Items for Item recommendations.
Step 1: Create a Collection to Hold the Data
Create a collection to contain the input data (signals about user-item interactions) and the data that Fusion generates to provide collaborative recommendations.
Click Devops > Home > Collections > New. Name the collection and click Save Collection.
Step 2: Enable Recommendations
Create the objects and schedules that Fusion needs to process the data for items-for-user, items-for-item, and users-for-item collaborative recommendations.
Clicking Enable Recommendations in the Fusion UI enables Spark-based collaborative recommendations, that is, items-for-user and items-for-item recommendations (and users-for-item recommendations if you enable those). You don’t need this step to set up items-for-query collaborative recommendations (Boost with Signals) or content-based filtering.
To enable collaborative recommendations:
Log in to Fusion as the admin user.
From the Collections drop-down list, choose the collection for which you want to enable collaborative recommendations.
Click Settings > Enable Recommendations.
Enabling recommendations creates these objects:
Collection to hold user interactions. If the signals feature is enabled for this collection (which it is by default), then this collection will already exist.
Collection to hold aggregated user interactions. If the signals feature is enabled for this collection (which it is by default), then this will already exist.
The default aggregation groups together all signals of type click by the joint key (user_id_s, doc_id_s) from the <collection>_signals collection, and performs a time-decayed count of these results to generate an estimate of the "implicit preference" that
Collection to hold generated item-item similarities (by default 10 per item). No user_id_s data is present. A Recommend Items for Item query pipeline stage can use the similarities to return item recommendations. For example, a query in which doc_id_s = docA would return an ordered list of other doc_id_s values for documents that are similar to document docA, along with the similarities. For example: [("docB", 0.83), ("docC", 0.55), ("docD", 0.43), …, ("docK", 0.22)].
Collection to hold recommended items for a user. By default the job creates 10 recommendations per user.
Aggregates user-item interactions to generate weights. This aggregates user-item interactions and calculates a weight based on the recency of the interaction (so that more recent interactions have more impact on recommendations).
Runs a Spark job to train an ALS-based (Alternating Least Squares) recommendation model, and then uses that model to generate recommendations. By default, this job generates both items-for-user recommendations and items-for-item recommendations. The job stores the results in the
Schedule for the job
Runs the aggregation job once a day.
Schedule for the job
Runs the recommender job after successful completion of the aggregation job.
Query pipeline to generate recommendations of items for a user
Query pipeline to generate recommendations of items similar to an item
Step 3: Collect User-Item Interaction Data
Both items-for-user and items-for-item recommendations are based on user-item interactions, that is, on some interactions that users have with items. For example, users might look at the items, click on the items, or buy the items. User-item interactions often reflect user-item preferences.
Your app must collect user-item interaction data and use signals to send the data to Fusion. Data collection is an ongoing process. The more data, the better, and current data provides the best recommendations.
If you have collected user-item interaction data previously, you can load that data into Fusion. This is one way to address the cold-start problem—without user-item interaction data, the algorithms lack the data they need to make recommendations.
Use signals to report user activity to Fusion. Signals contain data of interest. Signals generated from user-item interactions contain a type field that identifies the type of interaction, for example, a click or a purchase. They also contain these fields:
query – Either a search query or a context-query. This is used for items-for-query (Boost with Signals) recommendations, and might be used for items-for-user, items-for-item, and users-for-item recommendations.
userId – A user ID. This can be anything that is associated with the user. A persistent identifier is needed, but the persistence can be limited; for example, a cookie can provide the user ID. The longer the identifier persists, the better the recommendations. This is used for items-for-user, items-for-item, and users-for-item recommendations.
doc_Id – Document ID. This is used for all types of collaborative recommendations.
Heterogeneous signals are OK. Signals that lack query won’t contribute to items-for-query recommendations. Signals that lack userId won’t contribute to items-for-user, items-for-item, or users-for-item recommendations.
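The field rules above can be sketched in a few lines of Python. This is only an illustration: the dict layout, the `make_signal` and `contributes_to` helpers, and the rule that a signal without a document ID feeds nothing are assumptions made for the example, not Fusion's signal schema or API. The field names follow the list above.

```python
from datetime import datetime, timezone


def make_signal(signal_type, doc_id=None, user_id=None, query=None):
    """Build one signal as a plain dict (hypothetical layout, for illustration)."""
    fields = {"doc_Id": doc_id, "userId": user_id, "query": query}
    return {
        "type": signal_type,  # for example "click" or "purchase"
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Keep only the fields that were supplied; heterogeneous signals are OK.
        "params": {k: v for k, v in fields.items() if v is not None},
    }


def contributes_to(signal):
    """Return the recommendation types this signal can feed."""
    params = signal["params"]
    if "doc_Id" not in params:
        # Assumption: the document ID is needed for every recommendation type.
        return []
    kinds = []
    if "query" in params:
        kinds.append("items-for-query")
    if "userId" in params:
        kinds += ["items-for-user", "items-for-item", "users-for-item"]
    return kinds


if __name__ == "__main__":
    click = make_signal("click", doc_id="docA", user_id="user42")
    print(contributes_to(click))
```

A signal built with only `doc_id` and `query` would feed items-for-query recommendations but none of the user-based types.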
Step 4: Configure Recommender Jobs
<collection>_item_recommendations have default configurations that produce both items-for-user and items-for-item recommendations as boosts.
Configure jobs in Search > Home > Jobs. Click the job you want to configure. Click Save to save the configuration.
You shouldn’t need to configure the
Options for job configuration are:
Training Data Sampling Fraction – A value between 0 and 1. Specify a decimal fraction to use that fraction of the data when training. The job will run more quickly and use less memory. Consider reducing this value during testing. For production environments, it should be set to 1.
Produce items-for-user recommendations and item-item similarities (produced by default) – Fusion produces item-item similarities when it produces items-for-user recommendations. You get both or neither. To get both, enter <collection>_items_for_item_recommendations in the Item-to-item Similarity Collection text box. To get neither, delete the collection name <collection>_items_for_user_recommendations from the Items-for-users Recommendation text box.
Produce items-for-item recommendations (produced by default) – If you don’t need items-for-item recommendations (or item-item similarities), delete the collection name <collection>_items_for_item_recommendations from the Item-to-item Similarity Collection text box.
Produce users-for-item recommendations (not produced by default) – To produce users-for-item recommendations, add the collection name <collection>_users_for_item_recommendations in the Users-for-items Recommendation Collection text box.
Compute more or fewer recommendations – The default numbers for items-for-user, items-for-item, and users-for-item recommendations are all 10. Increasing the number doesn’t have a large cost.
Recommender Rank – You might need to tune this by increasing the value. A higher rank uses more memory.
Grid Search Width (Advanced tab) – Set to 1 to have Spark estimate the best parameters. After one run with the value set to 1, you can set it back to 0.
Implicit Preferences – If checked, Fusion uses user actions to guess user preferences. Uncheck if you have user preference data (for example, ratings).
Step 5: Run a Job to Aggregate Signal Data
Aggregate user-item interaction data to create an aggregation document for each user-item pair. The aggregated data is equivalent to a user-item interaction matrix, which is the input for the ALS Recommender Spark job.
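To make the idea of a recency-weighted aggregation concrete, here is a minimal sketch. It groups click signals by the (user_id_s, doc_id_s) pair and sums a weight that shrinks with the age of each click. The exponential half-life decay, the `half_life_days` parameter, the epoch-second timestamps, and the `aggregate_clicks` name are all assumptions for illustration; the decay function that Fusion actually applies is not specified here.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400.0


def aggregate_clicks(signals, now, half_life_days=30.0):
    """Sum a time-decayed count of clicks per (user, item) pair.

    signals: iterable of dicts with keys type, user_id_s, doc_id_s, timestamp
             (timestamp and now are epoch seconds; an assumption of this sketch).
    Returns {(user_id_s, doc_id_s): weight}, a sparse user-item matrix.
    """
    weights = defaultdict(float)
    for signal in signals:
        if signal["type"] != "click":
            continue  # the default aggregation described here uses clicks only
        age_days = (now - signal["timestamp"]) / SECONDS_PER_DAY
        # A click loses half of its weight every half_life_days (assumed decay).
        decay = 0.5 ** (age_days / half_life_days)
        weights[(signal["user_id_s"], signal["doc_id_s"])] += decay
    return dict(weights)


if __name__ == "__main__":
    now = 30 * SECONDS_PER_DAY
    clicks = [
        {"type": "click", "user_id_s": "u1", "doc_id_s": "docA", "timestamp": now},
        {"type": "click", "user_id_s": "u1", "doc_id_s": "docA", "timestamp": 0.0},
    ]
    print(aggregate_clicks(clicks, now))
```

Each key in the result corresponds to one aggregation document, and the weight plays the role of the implicit preference passed to the recommender job.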
Enabling collaborative recommendations creates the <collection>_item_aggregation job and schedules it to run once a day.
You can also run the <collection>_item_aggregation job manually. Navigate to Devops > Home > Scheduler. Click the <collection>_item_aggregation job, and then click Start. Click Cancel to exit scheduling for the job.
Step 6: Run a Job to Create Collaborative Recommendations
Note: By default, this job is scheduled to run automatically upon successful completion of the
To run the <collection>_item_recommendations job manually, navigate to Devops > Home > Scheduler. Click the <collection>_item_recommendations job, and then click Start. Click Cancel to exit scheduling for the job.
This job populates several collections:
<collection>_items_for_item_recommendations – N recommended items for each item
<collection>_items_for_user_recommendations – N recommended items for each user
recommender_models (not produced by default) – Weights for each user-item pair. This collection can contain multiple models for different data collections, which are tagged by model_id. By default, subsequent runs of a collaborative-filtering recommender job reuse a previously generated model, if possible, for better performance.
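To give a feel for what "N recommended items for each item" means, the sketch below ranks items by the cosine similarity of their user-interaction vectors. Note that this is a deliberately simpler technique than the ALS model the job trains; it is swapped in only because it fits in a few lines. The `similar_items` function name and the input layout (a dict keyed by user-item pairs) are assumptions for this example.

```python
import math
from collections import defaultdict


def similar_items(weights, doc_id, top_n=10):
    """Return up to top_n (item, similarity) pairs, most similar first.

    weights: {(user, item): weight}, for example aggregated interaction weights.
    Similarity is the cosine between per-item vectors indexed by user.
    """
    by_item = defaultdict(dict)
    for (user, item), weight in weights.items():
        by_item[item][user] = weight

    def norm(vec):
        return math.sqrt(sum(v * v for v in vec.values()))

    target = by_item.get(doc_id, {})
    scored = []
    for other, vec in by_item.items():
        if other == doc_id:
            continue
        # Only users who interacted with both items contribute to the dot product.
        dot = sum(w * vec[u] for u, w in target.items() if u in vec)
        if dot > 0:
            scored.append((other, dot / (norm(target) * norm(vec))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    weights = {
        ("u1", "docA"): 1.0, ("u1", "docB"): 1.0,
        ("u2", "docA"): 1.0, ("u2", "docB"): 1.0,
        ("u3", "docA"): 1.0, ("u3", "docC"): 1.0,
    }
    for item, score in similar_items(weights, "docA"):
        print(item, round(score, 2))
```

The result has the same shape as the ordered list of (item, similarity) pairs stored in the item-item similarity collection, though the scores from an ALS model would differ.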
Step 7: Set Up Query Pipelines
By default, enabling collaborative recommendations sets up two query pipelines that an app can use to obtain recommended items. Alternatively, you can add corresponding query pipeline stages to other query pipelines.
Use Built-in Query Pipelines for Collaborative Recommendations
When you enable collaborative recommendations, Fusion creates two query pipelines that you can use for retrieving the recommendations:
<collection>_items_for_user_recommendations – This pipeline retrieves items to recommend for the user specified by the userId query parameter, based on which items similar users interact with. As its first stage, it has a Recommend Items for User stage. By default, results from that stage are applied as boosts.
<collection>_items_for_item_recommendations – This pipeline retrieves items to recommend for the item specified by the docId query parameter, based on the similarity of the other items to the specified item. Similarity is based on user-item interactions, for example, users who clicked on item A also tended to click on item Z. As its first stage, it has a Recommend Items for Item stage. By default, results from this stage are applied as boosts.
Add Collaborative-Recommendation Stages to Other Query Pipelines
As an alternative to using the built-in query pipelines for collaborative recommendations, you can add stages for collaborative recommendations to other query pipelines:
Recommend Items for User – Add this stage to recommend items for users, based on what items similar users interact with. By default, results from that stage are applied as boosts.
Recommend Items for Item – Add this stage to recommend items based on the similarity of the other items to the specified item. Similarity is based on user-item interactions, for example, users who clicked on item A also tended to click on item Z. By default, results from this stage are applied as boosts.
Include Required Stages
Depending on the source of the query pipeline and the default stages, you might need to add or remove query pipeline stages. These are examples:
Use Items-for-User and Items-for-Query Recommendations to Boost Items that Match a Query
Use the pipeline stages present in the built-in pipeline <collection>_items_for_user_recommendations – Recommend Items for User, Boost with Signals, Query Fields, Field Facet, and Solr Query.
Use Items-for-Item and Items-for-Query Recommendations to Boost Items that Match a Query
Use the pipeline stages present in the built-in pipeline <collection>_items_for_item_recommendations – Recommend Items for Item, Boost with Signals, Query Fields, Field Facet, and Solr Query.
Step 8: Look Up Precomputed Recommendations at Query Time
Get collaborative recommendations for a user at query time. Query the pipelines that you set up in Step 7.
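As a rough sketch of what such a query could look like from an app, the helper below builds a request URL for one of the built-in pipelines. The host and port (`localhost:8764`), the `/api/apollo/query-pipelines/.../collections/.../select` path layout, the `products` collection name, and the `recommendation_url` helper are assumptions for illustration; check the Query Pipelines API reference for your Fusion version before relying on them. The `userId` and `docId` parameter names come from the pipeline descriptions in Step 7.

```python
from urllib.parse import quote, urlencode


def recommendation_url(base_url, collection, pipeline, **params):
    """Build a query URL for a recommendation pipeline (assumed path layout)."""
    path = "/api/apollo/query-pipelines/{}/collections/{}/select".format(
        quote(pipeline, safe=""), quote(collection, safe="")
    )
    # urlencode percent-encodes parameter values and keeps keyword order.
    return base_url.rstrip("/") + path + "?" + urlencode(params)


if __name__ == "__main__":
    collection = "products"  # hypothetical collection name
    # Items for a user: pass the user's ID in the userId parameter.
    print(recommendation_url(
        "http://localhost:8764", collection,
        collection + "_items_for_user_recommendations",
        q="shoes", userId="user42",
    ))
    # Items similar to an item: pass the item's ID in the docId parameter.
    print(recommendation_url(
        "http://localhost:8764", collection,
        collection + "_items_for_item_recommendations",
        q="shoes", docId="docA",
    ))
```

The app would then send an authenticated GET request to the resulting URL and read the boosted results from the response.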