Use this job when you want to compute item similarities based on their content, such as product descriptions.
Default job name: COLLECTION_NAME_content_recs
Input: Searchable content from the primary collection.
Output: Items-for-item recommendations (the COLLECTION_NAME_content_recs collection by default)
The job first vectorizes item content, using one of several available vectorization methods. It then selects similar items for each item based on the cosine similarity between their vectors (a “nearest neighbor” search). At a minimum, you must specify these:
  • An ID for this job
  • The name of the training collection, that is, the collection with your content
  • An output collection; create a separate collection for this
  • The name of the ID field for documents in the training collection, such as item_id_s
  • The names of one or more content fields in the training collection
Content-based recommendations dataflow
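
The two steps described above (vectorize the content, then pick nearest neighbors by cosine similarity) can be illustrated with a minimal sketch. This is not the job's actual implementation: the function names `tfidf_vectors`, `cosine`, and `similar_items` and the sample item IDs are hypothetical, the vectorization is a plain TF-IDF variant chosen for illustration, and the search is exact brute force rather than the approximate nearest neighbor search the job uses by default.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per item.

    `docs` maps an item ID to its content text. Tokenization is a naive
    lowercase whitespace split, which is an assumption of this sketch.
    """
    tokenized = {item_id: text.lower().split() for item_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many items does each term appear?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    vectors = {}
    for item_id, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[item_id] = {
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_items(docs, top_k=2):
    """Return, for each item, its top_k most similar other items."""
    vectors = tfidf_vectors(docs)
    recs = {}
    for item_id, vec in vectors.items():
        scored = [
            (other_id, cosine(vec, vectors[other_id]))
            for other_id in vectors
            if other_id != item_id
        ]
        # Highest similarity first; exact (brute-force) comparison of all pairs.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        recs[item_id] = scored[:top_k]
    return recs


if __name__ == "__main__":
    # Hypothetical product descriptions keyed by item ID.
    items = {
        "sku1": "red cotton t-shirt with short sleeves",
        "sku2": "blue cotton t-shirt with long sleeves",
        "sku3": "stainless steel kitchen knife",
    }
    for item_id, neighbors in similar_items(items, top_k=1).items():
        print(item_id, neighbors)
```

In this sketch the two shirt descriptions share several terms, so each is the other's nearest neighbor, while the knife shares no terms with either and scores zero against both. Comparing every pair grows quadratically with the number of items, which is the cost that an approximate nearest neighbor search is meant to reduce on larger datasets.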

Tuning tips

  • Configure Metadata fields for item-item evaluation to specify the fields used during evaluation to determine whether item pairs belong to the same category.
  • Perform approximate nearest neighbor search is enabled by default to significantly reduce the job’s running time, with a small decrease in accuracy. If your training dataset is very small, then you can disable this option.
  • If your content contains a lot of domain-specific jargon, enable Use Word2Vec for vectorization.
  • If your documents are too short or too long, enable Use TF-IDF for vectorization.

Query pipeline setup

Download the APPName_item_item_rec_pipelines_content.json file, then follow Migrate Fusion Objects to create the query pipeline that consumes this job’s output. See Fetch Items-for-Item Recommendations (Content-Based Method) for details.

Configuration properties