Getting Started with Fusion:
Part Three - Better Relevancy

Signals are events that can be aggregated and used for automatic boosting or recommendations, both of which improve the relevancy of search results.

For example, click events or purchase events can be collected as signals and used to display "Customers who viewed this also viewed…​" or "Best-selling holiday items". Similarly, the most popular search results for certain queries can be boosted so that they appear first when other users make similar queries.

In a production environment, signals are generated through the natural actions of end users. For the purposes of these tutorials, we’ll generate click signals using the Query Workbench.

Before you begin

To proceed with this part of the tutorial, you must first complete Part 1 and Part 2, which give you an indexed dataset that’s configured for faceted search.

1. Enable signals

First, we need some signals data. Since this is a prototype, we don’t have end users naturally generating signals. Instead, we’ll enable synthetic signals in the Query Workbench.

  1. Log in to Fusion, click Search, and make sure the ml-movies collection is selected.

  2. Navigate to Home > Query Workbench.

  3. At the bottom of the panel, click Format Results.

  4. Select Show signal generators and Send click signals.

    QWB enablesignals

  5. Click Save.

  6. Hover over one of the search results.

    Now when you hover over a search result, the Query Workbench displays controls that include a Simulate button next to a field that configures the number of signals to simulate:

    Simulate button

2. Generate signals

With synthetic signals enabled, we’ll create a simple set of signal data that we can use to generate meaningful recommendations.

For this tutorial, we’ll generate signals that we can use to boost our favorite sci-fi titles so that they appear first.

  1. Search for "star wars".

    The top results are not our favorite titles:

    Star Wars search results

    Next we’ll generate signals that we can use to boost certain titles. Signals are tied to the search query, so our boosted titles will appear first in the search results only when users search for "star wars".

  2. Hover over "Star Wars: Episode IV - A New Hope".

  3. Set the number of signals to 4000.

  4. Click Simulate.

  5. Hover over "Star Wars: Episode V - The Empire Strikes Back".

  6. Set the number of signals to 3000.

  7. Click Simulate.

  8. Hover over "Star Wars: Episode VI - Return of the Jedi".

  9. Set the number of signals to 2000.

  10. Click Simulate.

  11. Click Save and overwrite the ml-movies-default pipeline.
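To make the shape of this data concrete, here is a minimal Python sketch of what the three Simulate clicks record. It is an illustration only, not a Fusion API: the simulate_clicks helper and the use of titles as document IDs are assumptions made for the example, while query_orig_s, doc_id_s, and count_i are the raw signal field names that appear later in this tutorial.

```python
# Hypothetical stand-in for the Simulate button (not a Fusion API):
# each call records one signal event carrying a click count.
def simulate_clicks(signals, query, doc_id, count):
    signals.append({
        "query_orig_s": query,   # the query that produced the result
        "doc_id_s": doc_id,      # the clicked document (title used as a stand-in ID)
        "count_i": count,        # number of simulated clicks for this event
    })

signals = []
simulate_clicks(signals, "star wars", "Star Wars: Episode IV - A New Hope", 4000)
simulate_clicks(signals, "star wars", "Star Wars: Episode V - The Empire Strikes Back", 3000)
simulate_clicks(signals, "star wars", "Star Wars: Episode VI - Return of the Jedi", 2000)

# Sort by click count, descending, as the Count sort field does.
by_count = sorted(signals, key=lambda s: s["count_i"], reverse=True)
for s in by_count:
    print(s["count_i"], s["doc_id_s"])
```

Each event keeps its query string, which is why the boost applies only when users search for "star wars".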

3. Inspect the raw signals and the aggregated signal data

Our signal data is stored in the ml-movies_signals collection. Whenever you create a collection, two corresponding collections are also created automatically: one with the suffix _signals for raw signals and one with the suffix _signals_aggr for aggregated signals.

As of Fusion 3.0, aggregation jobs are enabled automatically. They run every two minutes by default to aggregate any new raw signals.
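Conceptually, an aggregation job rolls up raw signals that share a query and a document into a single record. The sketch below shows that idea in plain Python. It is not Fusion's aggregation code, the aggregate function is a hypothetical helper, and the document IDs doc-a and doc-b are made up for illustration.

```python
from collections import defaultdict

# Illustrative raw signals; doc-a and doc-b are hypothetical document IDs.
raw_signals = [
    {"query_orig_s": "star wars", "doc_id_s": "doc-a", "count_i": 4000},
    {"query_orig_s": "star wars", "doc_id_s": "doc-b", "count_i": 3000},
    {"query_orig_s": "star wars", "doc_id_s": "doc-a", "count_i": 5},
]

def aggregate(signals):
    """Sum click counts for each (query, document) pair."""
    totals = defaultdict(int)
    for s in signals:
        totals[(s["query_orig_s"], s["doc_id_s"])] += s["count_i"]
    return dict(totals)

aggregated = aggregate(raw_signals)
for (query, doc_id), count in sorted(aggregated.items(), key=lambda kv: -kv[1]):
    print(query, doc_id, count)
```

The two raw events for doc-a collapse into one aggregated record, which is the kind of rolled-up data we will look for in the _signals_aggr collection.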

  1. In the collections list in the upper left, select ml-movies_signals.

  2. In the Home panel, click Query Workbench.

    Our signal data appears. Just as we did with our primary collection, we can use the Query Workbench to explore the raw data in the _signals collection.

  3. From the Choose sort field menu, select Count.

    Now our signal data is sorted by signal count, in descending order.

  4. Next to any of the results, click show fields.

    • The count_i field shows the number of click signals we generated for this event.

    • The doc_id_s field is the same as the id field in our ml-movies collection, that is, the ID of the document that we clicked.

    • query_orig_s is the original query string that produced this search result.

    It takes up to two minutes for Fusion to run the next aggregation job. Let’s see whether the aggregated data has arrived in the ml-movies_signals_aggr collection.

  5. In the collections list in the upper left, select ml-movies_signals_aggr.

  6. In the Home panel, click Query Workbench.

    Your aggregated signal data should appear. If not, wait a minute and then reload your browser or click the search button in the Query Workbench.

  7. From the Choose sort field menu, select Decay count.

    In production, the decay count is the aggregated equivalent of the signal count with a time decay applied (so older signals have a lower count). In our case, all of our signals are very new, so no time decay is applied.

  8. Next to any of the results, click show fields.

    The fields are very similar to the raw signal data fields, with additional fields to describe the aggregation:

    • aggr_count_i counts the number of times this signal has been aggregated (in this case, just 1).

    • aggr_id_s is the name of the aggregation job.

    • aggr_job_id_s is the job ID.

    • aggr_type_s specifies the signal type and the fields on which it was aggregated.
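To see why the decay count matters in production, consider a rough sketch of time decay. The tutorial does not state the formula Fusion uses, so the exponential half-life below, the 30-day default, and the decayed_count function are all assumptions chosen only to illustrate the idea that older signals count for less.

```python
def decayed_count(count, age_days, half_life_days=30.0):
    """Assumed exponential decay: the count halves every half_life_days.

    This is an illustrative formula, not the one Fusion applies.
    """
    return count * 0.5 ** (age_days / half_life_days)

# Brand-new signals (age 0) keep their full weight, as in this tutorial.
fresh = decayed_count(4000, age_days=0)

# The same 4000 clicks carry less weight once they are a month or two old.
month_old = decayed_count(4000, age_days=30)
two_months_old = decayed_count(4000, age_days=60)

print(fresh, month_old, two_months_old)
```

With every signal only minutes old, the decayed value equals the raw count, which matches what we see when sorting by Decay count here.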

4. View the search results with default boosting

  1. Return to the ml-movies collection.

  2. Navigate to the Query Workbench.

  3. Search for "star wars".

    Now "Star Wars: Episode IV - A New Hope" is the first search result. It’s automatically boosted by the default configuration of the Boost with Signals query pipeline stage, which boosts on the id field.

    First search result

  4. Click Compare.

    Another preview panel opens. In this view, you can compare results from one query pipeline side by side with another query pipeline.

  5. In the left panel, select the "_system" pipeline.

    Select _system pipeline

    Because the _system pipeline has no Boost with Signals stage, our favorite titles are not boosted in the left panel:

    Favorites boosted

    In Part 2, we saw that enabling and disabling pipeline stages is one way of understanding how our data is being transformed by the query pipeline. Comparing two pipelines is another.

  6. Turn off the Boost with Signals stage.

    Now the search results appear the same in both preview panels.

  7. Turn on the Boost with Signals stage again to restore the boosted results.

  8. Close the comparison preview panel by clicking the close icon.
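The effect we just compared can be sketched as a re-ranking step: results whose IDs have aggregated signals for the current query get their scores raised. This is a simplified model under stated assumptions, not the Boost with Signals stage itself. The rerank function, the logarithmic boost formula, the base scores, and the document IDs (parody, ep4, ep5, ep6) are all hypothetical.

```python
import math

def rerank(results, boosts):
    """Re-order (doc_id, base_score) pairs using per-document signal counts.

    boosts maps doc_id -> aggregated click count for the current query.
    The log-scaled multiplier is an assumption made for this sketch.
    """
    def boosted_score(item):
        doc_id, score = item
        return score * (1.0 + math.log1p(boosts.get(doc_id, 0)))
    return sorted(results, key=boosted_score, reverse=True)

# Hypothetical base ranking, where an unwanted title scores highest.
results = [("parody", 9.0), ("ep4", 7.5), ("ep5", 7.4), ("ep6", 7.3)]

# Aggregated signals are keyed by query, so only "star wars" is boosted.
boosts_by_query = {"star wars": {"ep4": 4000, "ep5": 3000, "ep6": 2000}}

with_signals = rerank(results, boosts_by_query.get("star wars", {}))
without_signals = rerank(results, boosts_by_query.get("star trek", {}))

print([doc_id for doc_id, _ in with_signals])
print([doc_id for doc_id, _ in without_signals])
```

The second call behaves like the pipeline without a Boost with Signals stage: with no signals for the query, the base ranking is returned unchanged.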

What’s next

In Part 4, we’ll install Lucidworks View, learn how to connect a stand-alone search application to Fusion, and customize the interface.