Product Selector

Fusion 5.9
    Fusion 5.9

    Indexing Data Flow

    With Fusion, your data is indexed in a set of collections in Fusion’s Solr core.

    A primary collection holds your searchable content, such as your product catalog, knowledge base, blog articles, product reviews, and so on.

    A set of secondary collections are associated with your primary collection to hold related data that Fusion can use to enhance the relevancy of your search results.

    This topic shows you how different types of data flow through Fusion to be indexed in your primary and secondary collections.

    1. Index your content

    No matter where your content is located, Fusion can index it. The Fusion collection where your searchable content is indexed is called the primary collection. Secondary collections are automatically created to hold related data, such as signals, machine learning job output, and so on.

    There are a few ways to index your searchable content to the primary collection:

    • Connectors

      Lucidworks has a wide variety of connectors for many types of data sources. Find your connector.

      Once the connector fetches your data, parsers read it before passing it to the index pipeline.

    • The Parallel Bulk Loader (PBL)

      The PBL can send your data to an index pipeline or directly to the primary collection, depending on whether the data requires transformation before indexing.

    • Index profiles

      Send your content to an index profile using the Fusion REST API.

    Indexing your content

    Index your content

    The index pipeline consists of one or more configurable index pipeline stages, each performing a different type of transformation on the incoming data. Each connector has a default index pipeline, but you can modify these or create new ones.

    The last stage in any index pipeline should be the Solr Indexer stage, which submits the documents to Solr for indexing.

    2. Index your signals

    Signals are event records that provide historical data about user behavior, such as clicks, likes, purchases, and so on. You don’t need to index signals about query responses; Fusion indexes response signals automatically by default. If you are using App Studio or App Insights, then you need to index request signals. Learn more about signal types and required fields.

    To index your signals, you send them to Fusion using the Signals API, which points to the hidden index pipeline designed especially for signals.

    Indexing your signals

    Index your signals

    Raw signals are indexed in a secondary collection called COLLECTION_NAME_signals. For example, if your primary collection is called Products, then the raw signals collection is Products_signals.

    3. Fusion jobs that index signals-related data

    When you enable signals, Fusion creates jobs and secondary collections for analyzing and aggregating your raw signals. Some of this data enables query rewriting and automatic boosting, while other data becomes useful when you enable recommendations.

    Fusion jobs that index signals-related data

    Enable signals jobs

    4. Fusion jobs that index recommendations

    When you enable recommendations, another set of jobs and secondary collections is created.

    Fusion jobs that index recommendations

    Enable recommendations jobs

    The default recommendation jobs read data from different collections depending on the type of recommendations being generated:

    For more recommendations, try configuring the Trending Recommender job.

    What’s next?

    See Query Data Flow to learn about querying your content and recommendations.