Concepts

Solr concepts

These are Solr concepts that are significant for Managed Search (and that retain their Solr meanings).

Learn more about Solr concepts by reading How SolrCloud Works.

Term Meaning

Cluster

A cluster is a group of one or more nodes that holds all of your data, and that provides federated indexing and search capabilities across all nodes. Each cluster has a unique name.

Node

A node is a single server that is part of your cluster, that stores your data, and that participates in the cluster’s indexing and search capabilities. Each node has a unique name.

Collection

A collection is a group of documents that have similar characteristics. For example, a collection could contain a product catalog together with the index used to search that catalog. A collection is identified by a name. You can have multiple collections in a single cluster.

Document

A document is a basic unit of information that can be indexed. For example, you could have one document for a single product and another document for a single order. Documents are stored in JSON format. A collection can have as many documents as needed.

Shard

A collection can store so much data that it exceeds the storage capacity of a single node. To handle this, Lucidworks can subdivide your collection into multiple shards. Each shard is a fully functional and independent subset of a collection that can be hosted on any node in the cluster. Sharding lets you split your content horizontally to scale its volume, and lets you distribute and parallelize operations across shards (potentially on multiple nodes), which increases performance and throughput. You can change the number of shards after your collection has been created, but it is not a trivial task.

Replica

In a cloud environment, where failures can be expected at any time, it is useful to have a failover mechanism in case a shard or node goes offline or disappears. To provide this, you can make one or more copies of your collection’s shards, called replica shards, or replicas. Replication is important because it provides high availability in case a shard or node fails (a replica is never allocated on the same node as the original/primary shard that it was copied from). It also allows you to scale out your search volume, since searches can be executed on all replicas in parallel. You can change the number of replicas at any time after your collection has been created.

To summarize, each cluster has a group of nodes that provide indexing and search capabilities for your data. Your data is stored in collections, each of which can be split into multiple shards, and each shard can be replicated.
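
The relationships between these concepts can be sketched as a small toy model. This is an illustration only, not how Solr or Managed Search implements routing or placement: the `Collection` class, the `place_copies` function, the CRC32-based routing, the round-robin placement, and the node names are all assumptions made up for this example. It shows documents being routed to shards by a stable hash of their ID, and each copy of a shard being placed on a different node.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Toy model of a named collection split into shards (hypothetical)."""
    name: str
    num_shards: int
    copies_per_shard: int  # the original shard plus its replicas
    # Maps a shard index to the documents routed to that shard.
    shards: dict = field(default_factory=dict)

    def shard_for(self, doc_id: str) -> int:
        # Stable hash of the document ID, reduced to a shard index.
        # Assumption for illustration; the real router works differently.
        return zlib.crc32(doc_id.encode("utf-8")) % self.num_shards

    def add(self, doc: dict) -> int:
        # Route the document to its shard and return the shard index.
        shard = self.shard_for(doc["id"])
        self.shards.setdefault(shard, []).append(doc)
        return shard


def place_copies(collection: Collection, nodes: list) -> dict:
    """Assign each copy of each shard to a node, round-robin, so that
    no two copies of the same shard land on the same node."""
    if collection.copies_per_shard > len(nodes):
        raise ValueError("need at least as many nodes as copies per shard")
    return {
        shard: [
            nodes[(shard + copy) % len(nodes)]
            for copy in range(collection.copies_per_shard)
        ]
        for shard in range(collection.num_shards)
    }


# A three-node cluster holding a two-shard collection, two copies per shard.
nodes = ["node-1", "node-2", "node-3"]
products = Collection("products", num_shards=2, copies_per_shard=2)
placement = place_copies(products, nodes)
# placement == {0: ["node-1", "node-2"], 1: ["node-2", "node-3"]}

# The same document ID always routes to the same shard.
shard = products.add({"id": "sku-1001", "title": "Example product"})
```

In this sketch, losing any single node still leaves one copy of every shard available, which is the high-availability property that replication is meant to provide.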