https://EXAMPLE_COMPANY.b.lucidworks.cloud. Replace EXAMPLE_COMPANY with the name provided by your Lucidworks representative.
Quickstart
Create a Managed Fusion application
Movie Search
.
App to search for movies
.
Index Data
ml-latest-small.zip file.
Managed Fusion can parse .zip files, but in this tutorial, we will index just one file from the archive (movies.csv).
The movies.csv file contains a list of 9,125 movie titles, plus a header row. Here is a truncated listing:
Movie_Search is selected as the default collection for the Movie Search app, and is where Managed Fusion will place index data.
movies.csv file on your computer, select it, and click Open. The file name displays on the screen.
movies_csv-Movie_Search and the default file ID movies.csv. You do not have to change these values.
Movies CSV file.
movies.csv file, and then displays a preview of how they would be indexed based on current parameter and field settings.
genres became genres_t (the text_general field type) and genres_s (the string field type). String fields are useful for faceting and sorting, while text fields are for full-text search. At this point, Managed Fusion cannot determine whether you intend to use this field for faceting and sorting, for full-text search, or for both.
title became title_t and title_s because Managed Fusion cannot determine whether you intend to use this field for faceting and sorting, for full-text search, or for both.
movieId became movieId_t and movieId_s because Managed Fusion cannot determine whether you intend to use this field for faceting and sorting, for full-text search, or for both. This might seem odd, because the original field contains numbers. But, at this stage, Managed Fusion creates text_general and string fields. To use the contents of this field as an integer, you would map the field to an integer field.
_lw fields contain data that Managed Fusion creates for its own housekeeping. You can disregard these entries.
genres, movieId, and title.
genres.
genres_ss.
The field suffix _ss means that this field is a multi-valued string field.
genres_ss instead of genres.
genres_ss.
Before:
The movieId field is a unique document identifier. Select to copy it into the document’s id field.
title should be searchable as a text field, so select to move it to the title_txt field.
The field mappings display as:
The genres_ss field has been parsed as a single value field, but it is really a pipe-delimited array of values. To split this field into its constituent values, add a Regex Field Extraction stage to your index pipeline. This stage uses regular expressions to extract data from specific fields. It can append or overwrite existing fields with the extracted data, or use the data to populate new fields.
under Source Fields, and click Edit.
genres_ss and click Apply.
genres_ss.
input_string.
genres_ss field:
Before:
genres_ss field, click the right triangle values under it:
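The effect of splitting a pipe-delimited genres value can be sketched outside Managed Fusion in plain Python. This is only an illustration of the idea, not the Regex Field Extraction stage itself: the sample value, the pattern, and the function name `split_genres` are assumptions made for this sketch.

```python
import re

# Hypothetical raw value in the pipe-delimited format used by the genres column.
raw_genres = "Adventure|Animation|Children|Comedy|Fantasy"

# Assumed pattern: every run of characters that is not a pipe is one genre.
GENRE_PATTERN = re.compile(r"[^|]+")


def split_genres(value: str) -> list:
    """Return each pipe-delimited token in value as a separate string."""
    return GENRE_PATTERN.findall(value)


# The single string becomes a list, which is what a multi-valued field holds.
genres_ss = split_genres(raw_genres)
print(genres_ss)
```

A value with no pipe, such as a movie with a single genre, comes back as a one-element list, so every document ends up with the same multi-valued shape.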
The title_txt field also contains the year in which the movie was released. To make the field more useful for faceting, the year needs to be a separate field. The Regex Field Extraction stage will separate the data.
under Source Fields, and then click Edit.
title_txt and click Apply.
year_i.
The _i suffix indicates an integer point field (specifically, that the field is a dynamic field with a point integer, pint, field type). Managed Fusion creates this new field when the regular expression matches the contents of the source field.
title_txt value:
1. This lets the index pipeline stage transfer the year into the year_i field.
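To see how a capture group of 1 carries the year into a separate field, here is a minimal Python sketch. The regular expression below is an assumption (a four-digit year in parentheses), not the pattern from the tutorial, and `extract_year` is a hypothetical helper that only mimics what the stage does conceptually.

```python
import re
from typing import Optional

# Assumed pattern, not taken from the tutorial: group 1 is a four-digit year
# enclosed in parentheses.
YEAR_PATTERN = re.compile(r"\((\d{4})\)")


def extract_year(title_txt: str) -> Optional[int]:
    """Return capture group 1 as an integer, or None when there is no match."""
    match = YEAR_PATTERN.search(title_txt)
    if match is None:
        return None
    return int(match.group(1))


doc = {"title_txt": "Star Wars: Episode IV - A New Hope (1977)"}

# The target field is only created when the pattern matches the source field.
year = extract_year(doc["title_txt"])
if year is not None:
    doc["year_i"] = year

print(doc)
```

Group 0 would be the whole match including the parentheses, which is why group 1 is the one to copy into an integer field.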
year_i field:
Before:
The title_txt field still includes the year of the film’s release, which you have extracted into its own field, year_i. To refine the field for faceting, trim the year from the title_txt values so only the title text remains.
title_txt and click Apply.
title_txt.
overwrite.
title_txt value:
1.
title_txt field with only the title string:
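The trimming step can be sketched the same way in Python. Again, the pattern is an assumption chosen for illustration: group 1 captures everything before a trailing four-digit year in parentheses, and the result overwrites the original value. `trim_year` is a hypothetical name.

```python
import re

# Assumed pattern: group 1 is the title text that precedes a trailing "(YYYY)".
TITLE_PATTERN = re.compile(r"^(.*?)\s*\(\d{4}\)\s*$")


def trim_year(title_txt: str) -> str:
    """Return the title without its trailing year, or the input if no year is found."""
    match = TITLE_PATTERN.match(title_txt)
    return match.group(1) if match else title_txt


doc = {"title_txt": "Star Wars: Episode IV - A New Hope (1977)"}

# Overwrite the source field with capture group 1.
doc["title_txt"] = trim_year(doc["title_txt"])
print(doc["title_txt"])
```

Returning the input unchanged when nothing matches keeps titles without a year intact instead of emptying the field.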
Before:
movies.csv file using the configuration you just saved.
Your datasource job is finished when the Index Workbench displays Status: success in the upper left. If the status does not change, click to return to the launcher and relaunch your app to refresh the status.
_lw_data_source_s field.
For various reasons, you may wish to remove all documents associated with a datasource from a collection before using CrawlDB to add relevant documents back to the collection. This process is known as reindexing.
_lw_data_source_s field.
Query Data
genres_ss. A list of one or more genre labels.
title_txt. The name of the movie.
year_i. The movie’s year of release.
_ss (multivalued string fields) contain one or more string values. String fields require an exact match between the query string and the string value stored in that field.
_txt (text fields) contain text. Text fields allow for free text search of the field contents. For example, because the movie titles are stored in a text field, a search on the word “Star” will match movies titled “Star”, “A Star is Born”, all movies in the Star Wars and Star Trek franchises, as well as “Dark Star”, “Lone Star”, and “Star Kid”.
_i (point integer fields) contain integer values. Numeric fields allow range matches as well as exact matches, and point integer fields allow efficient comparisons between the field’s values and the search criteria.
(*:*), which returns all documents in the collection.
For information about other search entries for facets, see search entry options.
star, and press Enter or click Search.
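The difference between string matching and text matching described above can be illustrated with a small Python sketch. It is a deliberately simplified stand-in: real text fields go through a configurable analysis chain, and the lowercase-and-split tokenizer here is only an assumption for demonstration. The function names are hypothetical.

```python
import re


def string_match(query: str, stored: str) -> bool:
    """String field behavior: the query must equal the stored value exactly."""
    return query == stored


def text_match(query: str, stored: str) -> bool:
    """Simplified text field behavior: match when the query is one of the tokens."""
    tokens = re.findall(r"[a-z0-9]+", stored.lower())
    return query.lower() in tokens


titles = ["Star", "A Star is Born", "Dark Star", "Lone Star", "Star Kid"]

# Every title contains the token "star", so all of them match as text.
print([t for t in titles if text_match("star", t)])

# Only the title that is exactly "Star" matches as a string.
print([t for t in titles if string_match("Star", t)])
```

This is why the tutorial stores titles in a text field for searching and genres in a string field for faceting.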
To view more of the default output, you can perform other searches.
*:* to return all documents, and press Enter or click Search.
genres_ss and year_i. For example, a user could search for science fiction of the 1950s in just a few clicks.
genres_ss field.
Sci-Fi for genres_ss:
index (alphabetical ascending order) or count (number of documents). You can also add field facets by configuring the Field Facet stage.
If you configured the year_i field as you did above for the genres_ss field, you would get one facet per year, which is not very useful.
The year_i field will be more usable if you configure range faceting. Range faceting is a way of grouping values together so that the user can select a value range instead of one specific value. For example, range facets are commonly used with pricing (100) or ratings (4 stars or higher). In this example, you will group years by decade.
Range faceting requires sending additional query parameters to Managed Fusion’s Solr core. You can configure this with Solr’s range facet query parameters.
Use the Additional Query Parameters stage to configure range faceting for the year_i field:
facet.range: year_i
facet.range.start: 1900
facet.range.end: 2020
facet.range.gap: 10
facet.range.include: outer
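As a rough picture of what these parameters describe, the sketch below encodes them as a query string and lists the lower bound of each decade bucket. It is plain Python, not a Managed Fusion or Solr client; `decade_starts` is a hypothetical helper, and the boundary behavior controlled by facet.range.include is not modeled here.

```python
from urllib.parse import urlencode

# The parameter names and values come from the list above.
range_facet_params = {
    "facet.range": "year_i",
    "facet.range.start": 1900,
    "facet.range.end": 2020,
    "facet.range.gap": 10,
    "facet.range.include": "outer",
}


def decade_starts(start: int, end: int, gap: int) -> list:
    """Return the lower bound of each bucket between start and end."""
    return list(range(start, end, gap))


# How the parameters would look when appended to a query URL.
print(urlencode(range_facet_params))

# One bucket per decade instead of one facet value per year.
print(decade_starts(1900, 2020, 10))
```

With a gap of 10 between 1900 and 2020 there are twelve buckets, which is far easier to scan than a separate facet value for every release year.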
In this case, you do not need to modify the Update Policy field. The default value of append does not affect these results.
year_i field such as using the text field or dropdown list.
id field. This field may not be useful to your users.
Use the Query Fields stage to specify the fields that may be a higher priority for users.
title_txt.
year_i.
Improve Relevancy
title_txt. You can filter the list of possible values.
id.
star wars.
The top results are not your favorite titles:
star wars.
Movie_Search_signals).
COLLECTION_NAME_signals for raw signals. For example, Movie_Search_signals.
COLLECTION_NAME_signals_aggr for aggregated signals. For example, Movie_Search_signals_aggr.
_signals collection.
type:click and click Search.
The count_i field displays the number of click signals you generated for this event. For example, given the corresponding doc_id for Star Wars: Episode IV - A New Hope, the count_i equals 4000.
doc_id.
count_i.
Movie_Search_click_signals_aggregation.
doc_id.
aggr_count_i.
1210, click show fields.
aggr_count_i. Number of signals that have been aggregated. For example, 3000.
aggr_id_s. Name of the aggregation job.
aggr_job_id_s. Job ID.
aggr_type_s. Aggregation type.
Movie_Search.
star wars.
“Star Wars: Episode IV - A New Hope” is the first search result, followed by Episode V and then VI. These search results are automatically boosted by the default configuration of the Boost with Signals query pipeline stage, which boosts on the id field.
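The general idea of aggregating click signals and using them to reorder results can be sketched as a toy model in Python. This is not the formula Managed Fusion uses: the document IDs, scores, and the logarithmic boost are all hypothetical, chosen only to show how aggregated clicks on an id can lift a document above others with a higher base score.

```python
import math
from collections import Counter


def aggregate_clicks(signals: list) -> Counter:
    """Sum count_i per doc_id across all raw signals of type click."""
    totals = Counter()
    for signal in signals:
        if signal["type"] == "click":
            totals[signal["doc_id"]] += signal["count_i"]
    return totals


def boost_results(results: list, click_totals: Counter) -> list:
    """Reorder results by base score scaled with a hypothetical click boost."""
    def boosted_score(result: dict) -> float:
        clicks = click_totals.get(result["id"], 0)
        return result["score"] * (1.0 + math.log1p(clicks))
    return sorted(results, key=boosted_score, reverse=True)


# Hypothetical raw signals; the IDs are placeholders, not real document IDs.
raw_signals = [
    {"type": "click", "doc_id": "doc-a", "count_i": 2500},
    {"type": "click", "doc_id": "doc-a", "count_i": 1500},
    {"type": "click", "doc_id": "doc-b", "count_i": 10},
]

# Hypothetical search results where doc-c has the best unboosted score.
results = [
    {"id": "doc-c", "score": 2.0},
    {"id": "doc-b", "score": 1.5},
    {"id": "doc-a", "score": 1.0},
]

click_totals = aggregate_clicks(raw_signals)
print([r["id"] for r in boost_results(results, click_totals)])
```

Because the boost is keyed on the document id, a document with no recorded clicks keeps its original score and simply drops below the clicked ones.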
Movie_Search) with and without the Boost with Signals stage enabled.