Backups
All backups are replicated across three independent region zones to protect against failure in any one region. Each zone has one or more data centers with independent networking, cooling, power, and other resources to provide fault tolerance against failure. Backup schedules are not configurable. Lucidworks performs the following backups to ensure availability of your services:

| | Index data backup | Full Fusion configuration backup | Application and access logs backup |
|---|---|---|---|
| Frequency | Every 24 hours | Every 5 minutes | Every 5 minutes |
| Retained | 2 weeks | The length of your contract | 30 days in the frontend, 1 year in cloud logging |
| Content | Collections, signals, customer data | API keys, fields, roles, permissions, index profiles and pipelines, query profiles and pipelines, licenses, rules, etc. | Application and access logs |
The exact data retention period may vary depending on the terms of your agreement with Lucidworks.
Automated Solr backups
You can also schedule backups of Solr collections, store the backups for a configurable period of time, and restore these backups into a specified Fusion cluster when needed. The following guide uses Google Kubernetes Engine (GKE) for examples.
Solr backups using cloud provider storage options
The standard approach of using a provider-specific Persistent Volume Claim (PVC) for storing collection backups ensures consistency in configuration. However, this method does not leverage the unique storage features offered by each cloud provider. For instance, Google Cloud provides Google Cloud Storage (GCS), which includes additional features such as access control management, various storage tiers, and other capabilities that are not available when using a PVC. To take advantage of these features, Solr instances running within Fusion require additional provider-specific information.

Refer to the Solr documentation for detailed information on the repositories available for configuring collection backups. Each repository type comes with specific configuration options and features. Generally, you will need to integrate the provider-specific configuration into the `solr.xml` configuration file and ensure that the appropriate library or module for the provider is included in the Solr classpath. This step is necessary for the repository implementation to resolve correctly at runtime.

After configuring the backup provider, you can use the standard Solr backup and restore APIs to create new backups or restore from existing ones. Instead of writing to a PVC, backups are stored in the provider-specific storage solution.

For example, when configuring `GCSBackupRepository` to store backups in Google Cloud Storage, it is essential to include the corresponding library for the provider in the Solr classpath. Additionally, you will need to add a section to the `solr.xml` file, similar to the XML example below, to specify the target bucket where backups will be stored.
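As an illustration, such a `solr.xml` section could look like the following sketch; the repository name, bucket name, and credential path are hypothetical placeholders to adapt to your project:

```xml
<solr>
  <backup>
    <!-- Placeholder names; requires the Solr GCS repository module on the classpath. -->
    <repository name="gcs_backup"
                class="org.apache.solr.gcs.GCSBackupRepository"
                default="false">
      <str name="gcsBucket">my-fusion-backups</str>
      <str name="gcsCredentialPath">/var/solr/gcs-credential.json</str>
    </repository>
  </backup>
</solr>
```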
Solr backups using Persistent Volume Claim
Backups are taken using the Solr collection `BACKUP` command. This requires that each Solr node has access to a shared volume, or a `ReadWriteMany` volume in Kubernetes. Most cloud providers offer a simple way of creating a shared filestore and exposing it as a `PersistentVolumeClaim` within Kubernetes to mount into the Solr pods. An option is included in the `setup_f5_PROVIDER.sh` scripts in the fusion-cloud-native repository to provision these.

The backup action of the script is invoked by a Kubernetes CronJob that runs the backup schedule. The backups are saved to a configurable directory with an automatically generated name: `<collection_name>-<timestamp_in_some_format>`.

A separate CronJob is responsible for cleanup and retention of backups. Cleanup can be disabled if not needed. Setting a series of retention periods can automatically remove backups as they become outdated. For example, a cluster that backs up a collection every 3 hours could specify a retention policy that:

- Keeps all backups for a single day.
- Keeps a single daily backup for a week.
- Keeps a single weekly backup for a month.
- Keeps a single monthly backup for 6 months.
- Deletes all backups that are older than 6 months.
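The thinning behavior of such a policy can be sketched in Python. This is an illustration of the example policy above, not the actual backup-runner implementation; the 180-day cutoff and bucketing rules are assumptions based on the listed retention periods:

```python
from datetime import datetime, timedelta

def backups_to_delete(backups, now):
    """Return backup timestamps to delete under the example policy:
    keep every backup for 1 day, one per day for a week, one per week
    for a month, one per month for ~6 months, delete anything older."""
    kept = {}  # bucket key -> newest backup timestamp in that bucket
    for ts in backups:
        age = now - ts
        if age > timedelta(days=180):
            continue  # older than ~6 months: never kept
        if age <= timedelta(days=1):
            key = ("every", ts)                        # unique key: keep all
        elif age <= timedelta(days=7):
            key = ("daily", ts.date())                 # one per day
        elif age <= timedelta(days=30):
            key = ("weekly", tuple(ts.isocalendar())[:2])  # one per ISO week
        else:
            key = ("monthly", (ts.year, ts.month))     # one per month
        if key not in kept or ts > kept[key]:
            kept[key] = ts                             # keep newest per bucket
    keep = set(kept.values())
    return sorted(ts for ts in backups if ts not in keep)
```

For instance, with backups taken every 3 hours, all of the last day's backups survive, while two backups from the same day last week are thinned to the newest one.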
The retention periods are configured in the `configmap` for this service.

The process for restoring a collection is a manual step involving `kubectl run` to invoke the Solr `RESTORE` action, pointing to the collection and the name of the backup being restored.

These instructions are for GKE only. For other platforms, backup and restoration involve copying the collection to the cloud and using Parallel Bulk Loader.
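As an illustration of this manual step (not an exact Fusion procedure), a one-off pod could call the standard Solr Collections API `RESTORE` action. The pod name, service host, namespace, collection, backup name, and location below are all hypothetical placeholders:

```shell
# All names are placeholders; adjust service, namespace, and paths.
kubectl run solr-restore --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl "http://solr-svc.fusion.svc.cluster.local:8983/solr/admin/collections?action=RESTORE&name=products-20240101120000&collection=products&location=/var/solr/backups"
```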
Install using a PVC with GKE
The `solr-backup-runner` requires that a `ReadWriteMany` volume is mounted onto all `solr` and `backup-runner` pods so they all back up to the same filesystem. The easiest way to install on GKE is by using a GCP Filestore as the `ReadWriteMany` volume.
volume.-
Create the Filestore.
-
Fetch the IP of the Filestore.
-
Create a Persistent Volume in Kubernetes that is backed by this volume.
-
Create a Persistent Volume Claim in the namespace that Solr is running in.
-
Add the following values to your existing (or a new) Helm values file.
- Upgrade the release. Solr backups are now enabled.
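The Filestore and volume steps above can be sketched with `gcloud` and `kubectl`. Every name, zone, tier, capacity, and namespace below is a hypothetical placeholder, and the Helm values themselves depend on your fusion-cloud-native chart version, so they are not shown:

```shell
# Sketch only: all names, zones, and sizes are placeholders.
# Create the Filestore instance.
gcloud filestore instances create fusion-backups \
  --zone=us-central1-a --tier=BASIC_HDD \
  --file-share=name=backups,capacity=1TB \
  --network=name=default

# Fetch the IP of the Filestore.
FILESTORE_IP=$(gcloud filestore instances describe fusion-backups \
  --zone=us-central1-a --format='value(networks[0].ipAddresses[0])')

# Create a PersistentVolume backed by the share, and a claim for it
# in the namespace Solr runs in (here assumed to be "fusion").
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: solr-backups
spec:
  capacity:
    storage: 1Ti
  accessModes: ["ReadWriteMany"]
  nfs:
    server: ${FILESTORE_IP}
    path: /backups
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: solr-backups
  namespace: fusion
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""
  volumeName: solr-backups
  resources:
    requests:
      storage: 1Ti
EOF
```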