Collection Management in the Fusion UI

Collections can be created or removed using the Fusion UI or the REST API. For details of using the REST API to manage collections, see the section Collections API.

Creating a Collection

To create a collection, use the Add Collection control.

Enter a Collection name. This name cannot be changed later. If you are fine with defaults (i.e., the collection will be created within the default Solr cluster, etc.), you can simply click Add.

Advanced Options

Toggle the Advanced button to "ON" to select additional options and change the defaults. The advanced options are divided into three sections.

Solr Cluster

By default, the collection will be associated with the Solr instance of the "default" Solr cluster. If Fusion has multiple Solr clusters, choose the cluster from the pull-down list of available clusters. The cluster must already exist; creating one is described in the section Solr cluster.

Solr Cluster Layout

The next section allows you to define the appropriate Replication Factor and Number of Shards. You only need to define these options if you are creating a new collection in the Solr cluster. If you are linking Fusion to an existing Solr collection, you can skip these settings.
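
As a rough illustration of what these two settings mean on the Solr side, the sketch below builds a Solr Collections API CREATE request and computes how many replicas (cores) the layout implies. It is only a sketch: the base URL, the collection name "products", and the helper names are assumptions made for the example, and the request Fusion itself sends to Solr may differ.

```python
from urllib.parse import urlencode


def solr_create_collection_url(solr_base, name, num_shards, replication_factor):
    """Build a Solr Collections API CREATE URL (a Solr call, not a Fusion one)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


def total_replicas(num_shards, replication_factor):
    """Each shard is copied replication_factor times across the cluster."""
    return num_shards * replication_factor


# Hypothetical values: a local Solr and a collection named "products".
url = solr_create_collection_url("http://localhost:8983/solr", "products", 2, 3)
print(url)
print(total_replicas(2, 3))
```

More shards split the index into smaller pieces, while a higher replication factor adds copies of each piece, so the number of cores to host grows as the product of the two.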

Solr Collection Import

The final section allows you to associate the new Fusion collection with an existing Solr collection. Choose Solr Collection Name to associate the collection with an existing Solr collection. Then, you can choose Solr Config Set to tell ZooKeeper to use the configurations from an existing collection in Solr when creating this collection.

Configuring Collections

The collection home screen provides access to all of the configuration options for your collection, including collection features, datasources, profiles, stopwords, synonyms and some status reports.

Use the tab bar displayed under the collection name to navigate between the collection configuration items.

Home

The home page provides some quick access boxes to update query or index pipelines, create or update datasources, and manage query and index profiles.

It also displays details about the collection, such as when it was last updated, how many datasources are configured, how many documents are in the index, and how much disk space the index consumes. This section also provides access to a 'Hard Commit' button, which allows you to issue a commit command to Solr.
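
For readers who want to see what a commit command looks like at the Solr level, here is a minimal sketch that builds the URL for a commit against Solr's update handler. The base URL, collection name, and function name are assumptions for the example; whether the 'Hard Commit' button issues exactly this request is not stated here.

```python
from urllib.parse import urlencode


def solr_hard_commit_url(solr_base, collection):
    """Build a URL that asks Solr's update handler to commit pending changes."""
    query = urlencode({"commit": "true"})
    return f"{solr_base}/{collection}/update?{query}"


# Hypothetical values: a local Solr and a collection named "products".
commit_url = solr_hard_commit_url("http://localhost:8983/solr", "products")
print(commit_url)
```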

A search box is available to quickly search documents in the collection. While you can launch the search from this screen, results will be shown in the Search module.

When you first create a collection, the searchLogs and signals features will be enabled by default. Note that if you disable the searchLogs feature, you will not see any data in the Reports section.

Datasources

Next to the Home tab is the Datasources tab. By default, there are no datasources configured right after installation.

Click Add datasource to add a new datasource to the system. The Connectors and Datasources Reference has more details on how to configure a datasource, as options will vary depending on the repository you would like to crawl.

Once you have configured a datasource, it will appear in a list on this screen. Click the name of a datasource to edit its properties. Click Start to start the datasource, and once started click Stop to stop the datasource before it completes. To the right, you will see information on the last completed job, including the date and time started and stopped, and the number of documents found as new, skipped, or failed.

Note:

When a datasource is stopped, Fusion will attempt to safely close connector threads, finish processing documents through the pipeline, and index those documents to Solr. Some connectors take longer to complete these processes than others, so a datasource may stay in a 'stopping' state for several minutes.

If you need a datasource to stop immediately, choose 'abort' instead of 'stop'.

There is also a REST API for datasources; see the section Connector Datasources API for details.

Profiles

Profiles allow creating aliases for index and query pipelines. The pipelines are inherently global (i.e., they can be used with any collection), but the profiles point to a pipeline. This allows you to always send documents or queries to a consistent endpoint and change the underlying pipeline as needed.
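
The indirection that profiles provide can be pictured with a small sketch. This is a conceptual model only, not Fusion's implementation; the class, profile, and pipeline names are all hypothetical.

```python
class ProfileRegistry:
    """Toy model of profiles: stable names that point at pipelines."""

    def __init__(self):
        self._profiles = {}

    def set_profile(self, profile, pipeline):
        # Creating or re-pointing a profile is the same operation.
        self._profiles[profile] = pipeline

    def resolve(self, profile):
        # Clients only know the profile name; the pipeline is looked up here.
        return self._profiles[profile]


registry = ProfileRegistry()
registry.set_profile("site-search", "query-pipeline-v1")
before = registry.resolve("site-search")

# Swap the underlying pipeline; callers keep using "site-search".
registry.set_profile("site-search", "query-pipeline-v2")
after = registry.resolve("site-search")
print(before, after)
```

Because clients address the profile rather than the pipeline, a pipeline can be replaced without changing any client configuration.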

The Profiles tab shows the index profiles on the left and the query profiles on the right. Hover over the name of a profile, and you will see an edit button allowing you to change the pipeline the profile is mapped to. Hover over the name of a pipeline, and you will be able to jump to edit that pipeline.

Click Add Profile to add a profile of either type. The next screen will show a form allowing you to define the profile name and either select an existing pipeline or create a new pipeline with the name you choose. Click Create to save the new profile.

Stopwords

The Stopwords tab allows editing a stop words list for your collection.

You can click on a letter to filter the list of stop words by that letter, or enter a word into the 'Filter by word' box. Additionally, you can display the list of stop words as a List (vertically down the screen) or as a Grid (in cells across and down the screen).

To import a stop words list, click the Import button. A dialog box will open to allow you to choose the file to import.

To edit the list of stop words, you must enable edit mode by clicking the Edit button. Once edit mode is enabled, you can enter stop words into the text box (separated by spaces) and then save the new terms by clicking the Add stop words button. Once you are finished with edits, you should Save changes.

If you need to, you can Delete all stop words. You can remove single stop words by clicking the 'x' icons next to each word. After removing terms, you should click Save changes to save your changes.
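
The following sketch models the stop word operations described above: adding space-separated words, filtering the list by letter, and the effect a stop word list has on a stream of tokens. It is an illustration under assumptions, not the UI's code; in particular, lowercasing every word is a simplification chosen for the example, and the function names are hypothetical.

```python
def add_stop_words(existing, text_box_input):
    """Merge space-separated words from the text box into the existing set."""
    return existing | {word.lower() for word in text_box_input.split()}


def filter_by_letter(stop_words, letter):
    """Return the stop words that begin with the given letter, sorted."""
    return sorted(w for w in stop_words if w.startswith(letter.lower()))


def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop word list (case-insensitive here)."""
    return [t for t in tokens if t.lower() not in stop_words]


stop_words = add_stop_words({"a", "the"}, "an of The")
print(filter_by_letter(stop_words, "A"))
print(remove_stop_words(["The", "price", "of", "an", "apple"], stop_words))
```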

Synonyms

From the left menu, under 'Stop words', you can edit a synonyms list for your collection. Lucidworks has implemented the same synonym functionality that is supported by Solr, which takes two forms. The first is a list of words that are synonyms of one another, where the synonym list expands on the terms entered by the user. The second is a full mapping, where the term the user has entered is replaced by a term in the synonym list. See the Apache Solr Reference Guide section on the Synonym Filter for more details.

With the Fusion UI, you can filter the list of synonym definitions by entering a word into the 'Filter by word' box. You can also click a letter to filter the list by terms that begin with the selected letter.

To export the synonyms list, click the Export button. This will download the list via your browser download capability to your hard drive.

To import a synonyms list, click the Import button. A dialog box will open to allow you to choose the file to import.

To edit the list of synonyms, you must enable edit mode by clicking the Edit button. Once edit mode has been enabled, you can enter new synonym definitions, one per line, and then save the new lines by clicking the Add synonyms button.

  • To enter a string of terms that will expand on the terms the user entered, enter the terms separated by commas, like Television, TV.

  • To enter a term that should be mapped to another term, enter the terms separated by an equal sign followed by a right angle bracket, '=>', like i-pod=>ipod.
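
The difference between the two kinds of definition can be sketched in a few lines. This is a simplified model, not Solr's Synonym Filter: it assumes the ASCII arrow `=>` used in Solr synonym files, handles single-word terms only, lowercases everything, and uses hypothetical function names.

```python
def parse_synonyms(lines):
    """Turn synonym definition lines into a term -> replacement-terms mapping."""
    rules = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: terms on the left are replaced by those on the right.
            left, right = line.split("=>", 1)
            targets = [t.strip().lower() for t in right.split(",") if t.strip()]
            for source in left.split(","):
                source = source.strip().lower()
                if source:
                    rules[source] = targets
        else:
            # Equivalent terms: each term expands to the whole group.
            group = [t.strip().lower() for t in line.split(",") if t.strip()]
            for term in group:
                rules[term] = group
    return rules


def apply_synonyms(tokens, rules):
    """Expand or replace each token according to the parsed rules."""
    result = []
    for token in tokens:
        result.extend(rules.get(token.lower(), [token.lower()]))
    return result


rules = parse_synonyms(["Television, TV", "i-pod => ipod"])
print(apply_synonyms(["TV"], rules))
print(apply_synonyms(["i-pod", "case"], rules))
```

With the comma-separated form the user's term is kept and its synonyms are added alongside it; with the arrow form the user's term is dropped in favor of the mapped term.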

Once you are finished with edits, you should Save changes.

If you need to, you can Delete all. You can remove individual lines of synonym definitions by clicking the 'x' icon next to each definition. After removing definitions, you should click Save changes to save your changes.

Reports

The Reports tab provides charts to show some of the analytics for a specific collection.

The reports require that the searchLogs feature has been enabled for a collection and are visual representations of the same data that can be retrieved with the Reporting API. The reports shown are:

  • Top queries: the queries that have been performed most often. This is based on the 'topQueries' report.

  • Top clicked: the items that have been clicked most frequently. This requires that user click events have been stored in the system, and that they have been aggregated to get click boost data. This is based on the 'topClicked' report.

  • Slow queries: shows queries that have been statistically slower than others. This report is based on the 'histo' report, which is a histogram of query times.

  • Zero results queries: shows queries that have returned 0 results. This report is based on the 'lessThanN' report.

  • Query rate (last 10 minutes): shows the query rate over the last 10 minutes. This report is based on the 'dateHisto' report.

Solr Config

The Solr Config tab provides access to Solr’s configuration files stored in ZooKeeper. This allows modifying solrconfig.xml, schema.xml, elevate.xml, and other configuration files that may need to be changed.

The left side of the screen shows the tree of files in the '/config' node of ZooKeeper. If you click on a file name, the file will appear on the right. If there are child nodes, those will be expanded so a file can be chosen. Once the contents of the file appear on the right side of the screen, you can edit them as needed.

When your edits are complete, click Save to save the file without reloading the collection. In many cases, however, your changes will not take effect until the collection in Solr is reloaded. If you'd like to reload the collection immediately, click Save and Reload Collection. This will cause a brief interruption in index and query activities, so you may need to schedule this for a time of low system activity.
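
For reference, a reload at the Solr level is a Collections API request with the RELOAD action. The sketch below only builds that URL; the base URL, collection name, and function name are assumptions for the example, and the request that Save and Reload Collection actually sends is not documented here.

```python
from urllib.parse import urlencode


def solr_reload_collection_url(solr_base, collection):
    """Build a Solr Collections API URL that reloads the named collection."""
    params = {"action": "RELOAD", "name": collection}
    return f"{solr_base}/admin/collections?{urlencode(params)}"


# Hypothetical values: a local Solr and a collection named "products".
reload_url = solr_reload_collection_url("http://localhost:8983/solr", "products")
print(reload_url)
```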