Tutorial One:
Part Three - Better Search

Before you begin, be sure to complete Part One and Part Two.

  • In Part One, we created the collection "cinema_1", which contains 155,298 short abstracts from the English Wikipedia for entries related to movies and films.

  • In Part Two, we modified the default index pipeline to create richer documents.

In Part Three, we’ll modify the default query pipeline to exploit our enriched documents by manipulating the relative weights of their fields. The overall relevancy ranking for a document is derived from the per-field search scores. We’ll use Fusion to modify the formula used to combine these constituent per-field scores in order to up- or down-weight the contribution of a per-field score to the overall result.

Working With A Query Pipeline

A search form is the usual interface between your search application and Fusion. The inputs from the search form are submitted to a Fusion query pipeline as an HTTP request, and the response carries a payload containing a structured search result.

Like an index pipeline, a query pipeline is composed of processing stages. The stages in a query pipeline transform a set of inputs into a Solr query that runs against the Solr collection and returns the result.

  1. Navigate to Home > Query Pipelines.

  2. Click the default query pipeline for this collection, "cinema_1-default".

    This opens the Query Pipeline configuration panel, showing the initial configuration for "cinema_1-default":

    default query pipeline stages

A default query pipeline consists of three stages:

  • A Query Fields query stage defines common Solr query parameters.

  • A Facets query stage contains no specified facet fields by default.

  • A Solr Query stage sends the fully-configured query request to Solr.

We’ll explore all three of these stages.
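As a mental model only, a pipeline can be pictured as a list of functions that each take the request parameters and pass an updated copy to the next stage. The sketch below is not Fusion's implementation: the three stage functions, the dict-based request, and the values they add are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List

# A request is modelled here as a plain dict of query parameters (an assumption).
Request = Dict[str, str]
Stage = Callable[[Request], Request]


def query_fields_stage(request: Request) -> Request:
    # Hypothetical stand-in: adds search fields (qf) and return fields (fl).
    return {**request, "qf": "title_txt", "fl": "id,title_txt"}


def facets_stage(request: Request) -> Request:
    # No facet fields are configured by default, so the request is unchanged.
    return dict(request)


def solr_query_stage(request: Request) -> Request:
    # A real stage would send the request to Solr; this one only marks it.
    return {**request, "sent_to": "solr"}


def run_pipeline(stages: List[Stage], request: Request) -> Request:
    # Each stage receives the output of the previous one, in order.
    for stage in stages:
        request = stage(request)
    return request


result = run_pipeline(
    [query_fields_stage, facets_stage, solr_query_stage],
    {"q": "star wars"},
)
print(result)
```

The order of the list matters: the stage that talks to Solr comes last so that it sees everything the earlier stages added.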

Query Fields query stage

The Query Fields query stage is used to specify which fields are used for search and which fields are returned as part of a search results document:

search fields stage

In order to run free-text search queries over a field, that field must be indexed as a text field. The Fusion field naming convention uses the suffix "_txt" to indicate that a field should be indexed as a text field.

Collection "cinema_1" has two text fields available for free-text search:

  • shortAbstract_txt

  • title_txt

We’ll start by configuring the Query Fields query stage to specify that the shortAbstract_txt and title_txt fields should be used for search.

  1. In the Query Pipeline configuration panel, click Query Fields.

  2. Under Query Fields, click the green add (+) button and add "shortAbstract_txt".

    For now, leave the Field Boost unspecified, for this field and the next one.

  3. Click the green add (+) button again and add "title_txt".

  4. Under Return Fields, add the following fields:

    • DBpediaURL_s

    • WikipediaURL_s

    • title_txt

    • abstractShort_txt

    • id

    • _version_

config return fields

For the return fields we specify all input fields and the title field, plus two identifying fields:

  • The default ID created by the CSV processor

  • Solr’s internal "version" id
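To make the stage's settings more concrete, here is a minimal sketch of how a list of search fields and return fields could be turned into Solr request parameters. It assumes the stage maps onto Solr's `qf` parameter (space-separated fields, with an optional `^boost` suffix, as used by the DisMax-family query parsers) and `fl` parameter (comma-separated return fields). The helper name `build_field_params` is hypothetical, and only a subset of the return fields is shown.

```python
from typing import Dict, List, Optional, Tuple


def build_field_params(
    query_fields: List[Tuple[str, Optional[float]]],
    return_fields: List[str],
) -> Dict[str, str]:
    """Build Solr-style qf and fl values from field lists (illustrative only)."""
    qf_parts = []
    for name, boost in query_fields:
        # An unspecified boost leaves the bare field name; otherwise use name^boost.
        qf_parts.append(name if boost is None else f"{name}^{boost:g}")
    return {"qf": " ".join(qf_parts), "fl": ",".join(return_fields)}


# Field boosts are left unspecified, as in the steps above.
params = build_field_params(
    [("shortAbstract_txt", None), ("title_txt", None)],
    ["title_txt", "id"],
)
print(params)
```

Passing a number instead of `None` for a field, for example `("title_txt", 2)`, would yield `title_txt^2` in the `qf` value.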

Facet query stage

Faceting provides an open-ended way of slicing and dicing a set of search results based on category information available from certain fields in the documents in the results set. Any field which encodes information about item attributes, such as type, category, location, price, size, shape, date, and so on, can be used for faceting. Because this powerful feature is commonly used, it’s included as part of the default pipeline associated with every collection.

However, the dataset for this tutorial doesn’t have any fields which contain this kind of information, so instead of configuring the facet field stage, we’ll disable it for faster query processing.

  1. Click the Facets stage.

  2. Click the Skip This Stage checkbox.

  3. Click the Save button.

skip facet stage
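Although we skip faceting here, the idea is easy to sketch: for a chosen field, count how many documents in the result set carry each value. The toy documents and the `genre_s` field below are invented for illustration and are not part of "cinema_1". Solr computes facet counts itself, so this only shows the concept.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def facet_counts(docs: Iterable[Mapping[str, str]], field: str) -> Dict[str, int]:
    """Count the documents per value of `field`, ignoring documents that lack it."""
    return dict(Counter(doc[field] for doc in docs if field in doc))


# Hypothetical result set; genre_s is a made-up category field.
docs = [
    {"title": "A", "genre_s": "drama"},
    {"title": "B", "genre_s": "comedy"},
    {"title": "C", "genre_s": "drama"},
    {"title": "D"},
]
counts = facet_counts(docs, "genre_s")
print(counts)
```

A search UI would typically show each value with its count and let the user narrow the results to one of them.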

Solr query stage

This stage submits a query to Solr. No special configuration is needed for this stage:

solr query stage

Search Tuning

To see how this pipeline works, we return to the Search UI.

  1. In the Query Pipelines configuration panel, click the plus (+) icon in the upper right.

    The Home panel appears on the right.

  2. In the Home panel, click Search.

    The Search panel opens next to the Query Pipelines panel. Now we can work with both panels.

  3. Search for "Star Wars":

    search fields stage

    Our search works as expected. Later we’ll compare these results with the ones we get after search tuning.

    Next we’ll test our query pipeline with a longer, more open-ended free-text search.

  4. Search for "film starring Matt Damon".

    film starring matt damon no boost

    Our results include titles with "Matt" and "Damon".

  5. In the Query Pipelines panel, under Query Fields, give abstractShort_txt a Field Boost value of "2" and click Save.

    film starring matt damon boost abstract 2

    Our search results improve a bit. Let’s see what happens when we boost that field again.

  6. Give abstractShort_txt a Field Boost value of "3" and click Save.

    film starring matt damon boost abstract 3

    Even better.

  7. Now let’s try the "Star Wars" search again:

    retest star wars boost abstract 3

    Compare these results to the ones we got before search tuning.
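To see why a field boost can reorder results, consider a simplified model in which each field produces its own score, the boost multiplies that score, and the document takes the best boosted field score plus an optional tie-breaker share of the rest. This is a sketch of a DisMax-style combination under that assumption. The scores are made-up numbers, the field labels are placeholders, and the formula Solr actually applies involves more factors.

```python
from typing import Dict


def dismax_score(
    field_scores: Dict[str, float],
    boosts: Dict[str, float],
    tie: float = 0.0,
) -> float:
    """Best boosted per-field score, plus `tie` times the sum of the others."""
    weighted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not weighted:
        return 0.0
    best = max(weighted)
    return best + tie * (sum(weighted) - best)


# Made-up per-field scores for two hypothetical documents.
title_match = {"title": 3.0, "abstract": 1.0}     # query words appear in the title
abstract_match = {"title": 0.0, "abstract": 2.0}  # query words appear in the abstract

no_boost = {}
abstract_boost = {"abstract": 2.0}

print(dismax_score(title_match, no_boost), dismax_score(abstract_match, no_boost))
print(dismax_score(title_match, abstract_boost), dismax_score(abstract_match, abstract_boost))
```

In this toy model the title match wins without a boost, and the abstract match overtakes it once the abstract field is boosted, which is the kind of shift we saw for the longer free-text query.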

In order to embed search in an application, we need the query URL.

  1. In the Search panel, click the gear icon to open the configuration window.

  2. Select Display Query URL.

    configure search UI for search URL

  3. Click Save.

    search URL

    Now the search URL is displayed near the top of the panel. This is what you will embed in your app.

    However, the URL displayed here includes extra parameters that are required by the controls on the Search UI. For example:

    http://localhost:8764/api/apollo/query-pipelines/cinema_1-default/collections/cinema_1/select?fl=%2A%2Cscore&echoParams=all&wt=json&json.nl=arrarr&sort&start=0&q=star+wars&debug=true&rows=10

  4. Copy the URL and paste it into another browser tab. Remove all parameters except q=star+wars, like this:

    http://localhost:8764/api/apollo/query-pipelines/cinema_1-default/collections/cinema_1/select?q=star+wars

    search URL results wt = XML

    Our results are in XML format because this is the default writer type, controlled by the wt parameter.

  5. Add wt=json to the query URL and load the results again:

    http://localhost:8764/api/apollo/query-pipelines/cinema_1-default/collections/cinema_1/select?q=star+wars&wt=json

    search URL results wt = json

Your application can request search results in an appropriate format. See the complete list of values for the wt parameter.
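The URL clean-up from the steps above can also be done in code. The following sketch uses only Python's standard library to keep the `q` parameter from the Search UI's URL and set `wt=json`. The helper name `simplify_query_url` is our own, and the example only builds the URL without sending a request.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def simplify_query_url(url: str, keep=("q",), extra=None) -> str:
    """Keep only the named query parameters, then append any extras."""
    parts = urlsplit(url)
    # parse_qsl decodes the query string and drops valueless keys such as "sort".
    kept = [(key, value) for key, value in parse_qsl(parts.query) if key in keep]
    kept.extend((extra or {}).items())
    # urlencode re-encodes the values, turning spaces back into "+".
    return urlunsplit(parts._replace(query=urlencode(kept)))


# The URL displayed by the Search UI, copied from the example above.
search_ui_url = (
    "http://localhost:8764/api/apollo/query-pipelines/cinema_1-default"
    "/collections/cinema_1/select?fl=%2A%2Cscore&echoParams=all&wt=json"
    "&json.nl=arrarr&sort&start=0&q=star+wars&debug=true&rows=10"
)
app_url = simplify_query_url(search_ui_url, extra={"wt": "json"})
print(app_url)
```

An application would substitute the user's search terms for the `q` value before sending the request.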

Lessons Learned

There are simple, principled ways to modify field boost based on what we already know about fields and their contents.

We can use the search query itself to improve search results.

For basic searching, use fields and field boosting to provide the best results, and back off when users are unsure.

Richer data makes for richer search experiences.