Your data is organized into collections. When you create an app, Lucidworks Search automatically creates a collection with the same name. You can create additional collections in any app. A primary collection contains the data that your users will search. Every primary collection is associated with a set of auxiliary collections that contain related data, such as signals, aggregations, and more.
Under the hood, a Lucidworks Search collection is a distributed index in Solr, defined by a named configuration stored in ZooKeeper, with these properties:
  • Number of shards. Documents are distributed across this number of partitions.
  • Document routing strategy. How documents are assigned to shards.
  • Replication factor. How many copies of each document the collection stores.
  • Replica placement strategy. Where to place replicas in the cluster.
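As a rough illustration of how the shard count and replication factor interact, the following sketch models a collection layout. The `CollectionLayout` class and its method names are hypothetical and not part of any Lucidworks Search or Solr API, and the CRC32-modulo routing is only a stand-in for a document routing strategy, not Solr's actual router.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionLayout:
    """Hypothetical model of a collection's layout; not a Lucidworks or Solr API."""
    num_shards: int
    replication_factor: int

    def total_replicas(self) -> int:
        # Every shard is stored replication_factor times across the cluster.
        return self.num_shards * self.replication_factor

    def shard_for(self, doc_id: str) -> int:
        # Illustrative stand-in for a routing strategy: hash the document ID,
        # then take it modulo the shard count. Solr's own routers differ.
        return zlib.crc32(doc_id.encode("utf-8")) % self.num_shards


layout = CollectionLayout(num_shards=3, replication_factor=2)
print(layout.total_replicas())  # 6
# The same ID always maps to the same shard.
print(layout.shard_for("doc-1") == layout.shard_for("doc-1"))  # True
```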
If your data is already stored in a Solr instance or cluster, you can manage this collection in Lucidworks Search by creating a Lucidworks Search collection that imports the existing Solr collection.
Collection names are case-insensitive, but Lucidworks Search preserves case when displaying collection names.
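Because names are case-insensitive, two names that differ only in case refer to the same collection. A minimal sketch of a client-side check, assuming you keep your own list of existing collection names (the `names_collide` helper is hypothetical):

```python
def names_collide(existing_names, candidate):
    """Return True if candidate matches an existing name, ignoring case."""
    wanted = candidate.casefold()
    return any(name.casefold() == wanted for name in existing_names)


print(names_collide(["Products", "Articles"], "products"))  # True
print(names_collide(["Products", "Articles"], "orders"))    # False
```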

Auxiliary collections

Every primary collection is associated with a set of auxiliary collections that contain related data, such as signals, aggregations, and more. Some auxiliary collections are created for every primary collection. Others are created only for the app’s default collection, one per app. Auxiliary collections are described below:
| Collection | Description | Count |
|---|---|---|
| APP_NAME_job_reports | Output from Lucidworks Search experiments, Ranking Metrics jobs, and Head/Tail Analysis jobs. | 1 per app |
| APP_NAME_query_rewrite | A collection of documents to use for rewriting queries, optimized for high-volume traffic. These documents originate from the COLLECTION_NAME_query_rewrite_staging collection. Certain Lucidworks Search query pipeline stages read from this collection: Text Tagger, Apply Rules, and Modify Response with Rules. | 1 per app |
| APP_NAME_query_rewrite_staging | A collection of documents created by the Rules Editor or by certain Lucidworks Search jobs, not optimized for production traffic. Documents move from this collection to the COLLECTION_NAME_query_rewrite collection as follows: job output documents with high confidence contain a review=auto field and are moved to the COLLECTION_NAME_query_rewrite collection automatically; job output documents with low confidence contain a review=pending field, and when a Lucidworks Search user approves them, Lucidworks Search copies them to the COLLECTION_NAME_query_rewrite collection. | 1 per app |
| COLLECTION_NAME_signals | A collection for search query logs and signals. | 1 per collection |
| COLLECTION_NAME_signals_aggr | A collection for aggregated signals. | 1 per collection |
| APP_NAME_user_prefs | A collection of data to support App Studio’s social features, such as user-generated tags, bookmarks, comments, ratings, and so on. | 1 per app |
Don’t create primary collections with names that end in the suffixes above; these are reserved for Lucidworks Search auxiliary collections, which are created and managed by Lucidworks Search directly.
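The naming scheme for auxiliary collections can be sketched as a small helper. This is only an illustration of the pattern described above: `auxiliary_collection_names` is a hypothetical function, and it assumes signals are enabled for every listed collection.

```python
# Suffixes taken from the auxiliary collections described above.
PER_APP_SUFFIXES = (
    "_job_reports",
    "_query_rewrite",
    "_query_rewrite_staging",
    "_user_prefs",
)
PER_COLLECTION_SUFFIXES = ("_signals", "_signals_aggr")


def auxiliary_collection_names(app_name, collection_names):
    """List the auxiliary collection names expected for one app.

    Assumption: signals are enabled for every primary collection passed in.
    """
    names = [app_name + suffix for suffix in PER_APP_SUFFIXES]
    for collection in collection_names:
        names.extend(collection + suffix for suffix in PER_COLLECTION_SUFFIXES)
    return names


for name in auxiliary_collection_names("shop", ["shop", "faq"]):
    print(name)
```

For the hypothetical app `shop` with primary collections `shop` and `faq`, this lists four per-app names and two per-collection names for each primary collection.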
Lucidworks Search maintains a set of Solr collections that store Lucidworks Search’s own log files and other internal information. These are called System Collections, described below.
Don’t create primary collections named “logs” or beginning with “system_”. These names are reserved for Lucidworks Search system collections.
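A client-side check for the reserved names can be sketched as follows. The `is_reserved_name` helper is hypothetical, not a Lucidworks Search API; it applies the auxiliary suffixes and system names described in this section and compares case-insensitively, since collection names are case-insensitive.

```python
# Suffixes reserved for auxiliary collections, as described in this section.
RESERVED_SUFFIXES = (
    "_job_reports",
    "_query_rewrite",
    "_query_rewrite_staging",
    "_signals",
    "_signals_aggr",
    "_user_prefs",
)


def is_reserved_name(name):
    """Return True if name should be avoided for a primary collection."""
    lowered = name.casefold()
    # System collections: exactly "logs", or anything starting with "system_".
    if lowered == "logs" or lowered.startswith("system_"):
        return True
    # Auxiliary collections: any name ending in a reserved suffix.
    return lowered.endswith(RESERVED_SUFFIXES)


print(is_reserved_name("catalog"))          # False
print(is_reserved_name("catalog_signals"))  # True
print(is_reserved_name("System_metrics"))   # True
```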
Lucidworks Search uses ZooKeeper to register information about all collections and about the Lucidworks Search components and services related to each collection. The Lucidworks Search components associated with a collection include:
  • Datasources
  • Pipelines
  • Profiles
  • Signals and aggregations
  • Analytics dashboards

System collections

Lucidworks Search automatically creates some collections that are used for internal purposes and shared across all apps:
  • system_autocomplete stores the content that the Lucidworks Search UI displays when you use the search bar.
  • system_blobs stores blobs in Solr. This is used to store model files for the NLP components and other binary files used by Lucidworks Search components.
  • system_history keeps a record of configuration changes, start and stop times for services and experiments, and more.
  • system_jobs_history keeps a record of Lucidworks Search jobs, including start/stop times and status.

Collection configuration properties

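Collections have three properties (signals, searchLogs, and commitWithin) that you can configure only when you are creating a collection using the Collections API. The table also describes two related commit settings, autoCommit and autoSoftCommit, which are set using the Solr configuration.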
| Property | Description | Default behavior |
|---|---|---|
| signals* | The signals property determines whether to create auxiliary collections with suffixes _signals and _signals_aggr. | When you create a collection in the Lucidworks Search UI, signals defaults to true. When you create a collection using the Lucidworks Search API, this property defaults to false. |
| searchLogs | The searchLogs property determines whether to create an auxiliary search query logs collection with suffix _logs. | When you create a collection in the Lucidworks Search UI, this property defaults to true. When you create a collection using the Lucidworks Search API, this property defaults to false. |
| commitWithin | The commitWithin property guarantees that the data is committed and available for searching within the time specified in the value. | The default of 10000 milliseconds saves the data and makes that data available for searching within 10 seconds. The default for signal collections is 1000 milliseconds. |
| autoCommit | The autoCommit property (Solr hard commit) is inherited from the collection’s solrconfig.xml. | By default, this setting is typically set to 15 seconds with openSearcher=false. It saves the data, but does not force the search results to refresh immediately. With this setting, search performance is not slowed, but the new data may not show in search results until the next refresh. This property can be used instead of the commitWithin property and is set using the Solr configuration. |
| autoSoftCommit | The autoSoftCommit property (Solr soft commit) does not save the data, but makes the data visible to searches almost immediately. If the system crashes, that new data is lost because it has not been saved. This property is set using the Solr configuration. | The default setting is turned off, and search visibility is managed using the commitWithin setting. |
*Signals are events with timestamps that can be used to improve search results. For more information about signals in Lucidworks Search, see Signals in the Lucidworks Search documentation.
In schemaless mode, if a document contains a field not currently in the Solr schema, Solr processes the field value to determine what the field type should be defined as, and then adds a new field to the schema with the field name and field type. This behavior can be convenient during preliminary application development, but it’s rarely appropriate in a production environment.
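The documented defaults for signals, searchLogs, and commitWithin differ depending on whether a collection is created in the UI or through the API. The sketch below only records those defaults as data. `creation_defaults` is a hypothetical helper, and it does not show how the properties are actually passed to the Collections API, which this section does not specify.

```python
# Documented commitWithin default for signal collections, in milliseconds.
SIGNALS_COMMIT_WITHIN_MS = 1000


def creation_defaults(created_via):
    """Return the documented defaults for a new collection.

    created_via is "ui" or "api" (hypothetical labels for the two paths).
    """
    if created_via not in ("ui", "api"):
        raise ValueError("created_via must be 'ui' or 'api'")
    # signals and searchLogs default to true in the UI, false through the API.
    enabled = created_via == "ui"
    return {
        "signals": enabled,
        "searchLogs": enabled,
        "commitWithin": 10000,  # milliseconds, that is, 10 seconds
    }


print(creation_defaults("ui"))
print(creation_defaults("api"))
```

A practical consequence: a script that creates collections through the API and relies on signals needs to enable them explicitly, because the API default is false.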

Using profiles to associate collections with pipelines

Index pipelines and query pipelines aren’t connected to a specific collection by default. Index profiles and query profiles are configurations that create consistent endpoints for indexing and querying, each with a specific pipeline and collection.
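Conceptually, a profile is a named binding of one pipeline to one collection. The sketch below models that idea with hypothetical names (`QueryProfile`, `resolve`, and the example profile, pipeline, and collection names); it is not the Lucidworks Search profile API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    """Hypothetical model of a query profile: a stable name for a pairing."""
    name: str
    pipeline: str
    collection: str


# Clients refer only to the profile name, never to the pipeline directly.
profiles = {
    "site-search": QueryProfile("site-search", "default-query", "products"),
}


def resolve(profile_name):
    """Look up which pipeline and collection a profile points to."""
    profile = profiles[profile_name]
    return profile.pipeline, profile.collection


print(resolve("site-search"))  # ('default-query', 'products')

# Repointing the profile changes the pipeline without changing the name
# that clients use.
profiles["site-search"] = QueryProfile("site-search", "boosted-query", "products")
print(resolve("site-search"))  # ('boosted-query', 'products')
```

The point of the indirection is that the endpoint name stays constant while the pipeline or collection behind it can be swapped.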

Fields Editor UI

The Fields Editor UI allows you to create and configure the schema file directly from Lucidworks Search. For instructions, see Fields Editor UI.

Learn more

Fusion Applications and Collections

The course for Fusion Applications and Collections focuses on how Fusion transforms your siloed data into personalized insights unique to each user with apps and collections.