Backups
All backups are replicated across three independent region zones to protect against failure in any one region. Each zone has one or more data centers with independent networking, cooling, power, and other resources to provide fault tolerance against failure. Backup schedules are not configurable. Lucidworks performs the following backups to ensure availability of your services:

| | Index data backup | Full Fusion configuration backup | Application and access logs backup |
|---|---|---|---|
| Frequency | Every 24 hours | Every 5 minutes | Every 5 minutes |
| Retained | 2 weeks | The length of your contract | 30 days in the frontend, 1 year in cloud logging |
| Content | Collections, signals, customer data | API keys, fields, roles, permissions, index profiles and pipelines, query profiles and pipelines, licenses, rules, etc. | Application and access logs |
The exact data retention period may vary depending on the terms of your agreement with Lucidworks.
Automated Solr backups
You can also schedule backups of Solr collections, store the backups for a configurable period of time, and restore these backups into a specified Fusion cluster when needed. The following guide uses Google Kubernetes Engine (GKE) for examples.
Solr backups using cloud provider storage options
The standard approach of using a provider-specific Persistent Volume Claim (PVC) for storing collection backups ensures consistency in configuration. However, this method does not leverage the unique storage features offered by each cloud provider. For instance, Google Cloud provides Google Cloud Storage (GCS), which includes additional features such as access control management, various storage tiers, and other capabilities that are not available when using a PVC. To take advantage of these features, Solr instances running within Fusion require additional provider-specific information.

Refer to the Solr documentation for detailed information on the repositories available for configuring collection backups. Each repository type comes with specific configuration options and features. Generally, you will need to integrate the provider-specific configuration into the `solr.xml` configuration file and ensure that the appropriate library or module for the provider is included in the Solr classpath. This step is necessary for the repository implementation to resolve correctly at runtime.

After configuring the backup provider, you can use the standard Solr backup and restore APIs to create new backups or restore from existing ones. Instead of writing to a PVC, backups are stored in the provider-specific storage solution.

For example, when configuring `GCSBackupRepository` to store backups in Google Cloud Storage, it is essential to include the corresponding library for the provider in the Solr classpath. Additionally, you will need to add a section to the `solr.xml` file, similar to the XML example below, to specify the target bucket where backups will be stored.
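As an illustration, such a `solr.xml` section could look like the following sketch; the repository name, bucket name, and credential path are hypothetical placeholders to adapt to your project:

```xml
<solr>
  <backup>
    <!-- Placeholder names; requires the Solr GCS repository module on the classpath. -->
    <repository name="gcs_backup"
                class="org.apache.solr.gcs.GCSBackupRepository"
                default="false">
      <str name="gcsBucket">my-fusion-backups</str>
      <str name="gcsCredentialPath">/var/solr/gcs-credential.json</str>
    </repository>
  </backup>
</solr>
```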
Solr backups using Persistent Volume Claim
Backups are taken using the Solr collection `BACKUP` command. This requires that each Solr node has access to a shared volume, or a `ReadWriteMany` volume in Kubernetes. Most cloud providers offer a simple way of creating a shared filestore and exposing it as a `PersistentVolumeClaim` within Kubernetes to mount into the Solr pods. An option is included in the `setup_f5_PROVIDER.sh` scripts in the fusion-cloud-native repository to provision these.

The backup action of the script is invoked by a Kubernetes CronJob that runs the backup schedule. The backups are saved to a configurable directory with an automatically generated name: `<collection_name>-<timestamp_in_some_format>`.

A separate CronJob is responsible for cleanup and retention of backups. Cleanup can be disabled if not needed. Setting a series of retention periods can automatically remove backups as they become outdated. For example, a cluster that backs up a collection every 3 hours could specify a retention policy that:

- Keeps all backups for a single day.
- Keeps a single daily backup for a week.
- Keeps a single weekly backup for a month.
- Keeps a single monthly backup for 6 months.
- Deletes all backups that are older than 6 months.
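The thinning behavior of such a policy can be sketched in Python. This is an illustration of the example policy above, not the actual backup-runner implementation; the 180-day cutoff and bucketing rules are assumptions based on the listed retention periods:

```python
from datetime import datetime, timedelta

def backups_to_delete(backups, now):
    """Return backup timestamps to delete under the example policy:
    keep every backup for 1 day, one per day for a week, one per week
    for a month, one per month for ~6 months, delete anything older."""
    kept = {}  # bucket key -> newest backup timestamp in that bucket
    for ts in backups:
        age = now - ts
        if age > timedelta(days=180):
            continue  # older than ~6 months: never kept
        if age <= timedelta(days=1):
            key = ("every", ts)                        # unique key: keep all
        elif age <= timedelta(days=7):
            key = ("daily", ts.date())                 # one per day
        elif age <= timedelta(days=30):
            key = ("weekly", tuple(ts.isocalendar())[:2])  # one per ISO week
        else:
            key = ("monthly", (ts.year, ts.month))     # one per month
        if key not in kept or ts > kept[key]:
            kept[key] = ts                             # keep newest per bucket
    keep = set(kept.values())
    return sorted(ts for ts in backups if ts not in keep)
```

For instance, with backups taken every 3 hours, all of the last day's backups survive, while two backups from the same day last week are thinned to the newest one.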
The retention periods are configured in the `configmap` for this service.

The process for restoring a collection is a manual step involving `kubectl run` to invoke the Solr `RESTORE` action, pointing to the collection and the name of the backup being restored.

These instructions are for GKE only. For other platforms, backup and restoration involve copying the collection to the cloud and using Parallel Bulk Loader.
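As an illustration of this manual step (not an exact Fusion procedure), a one-off pod could call the standard Solr Collections API `RESTORE` action. The pod name, service host, namespace, collection, backup name, and location below are all hypothetical placeholders:

```shell
# All names are placeholders; adjust service, namespace, and paths.
kubectl run solr-restore --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl "http://solr-svc.fusion.svc.cluster.local:8983/solr/admin/collections?action=RESTORE&name=products-20240101120000&collection=products&location=/var/solr/backups"
```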
Install using a PVC with GKE
The `solr-backup-runner` requires that a `ReadWriteMany` volume is mounted onto all `solr` and `backup-runner` pods so they all back up to the same filesystem. The easiest way to install on GKE is by using a GCP Filestore as the `ReadWriteMany` volume.
volume.-
Create the Filestore.
-
Fetch the IP of the Filestore.
-
Create a Persistent Volume in Kubernetes that is backed by this volume.
-
Create a Persistent Volume Claim in the namespace that Solr is running in.
-
Add the following values to your existing (or a new) Helm values file.
- Upgrade the release. Solr backups are now enabled.
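The Filestore and volume steps above can be sketched with `gcloud` and `kubectl`. Every name, zone, tier, capacity, and namespace below is a hypothetical placeholder, and the Helm values themselves depend on your fusion-cloud-native chart version, so they are not shown:

```shell
# Sketch only: all names, zones, and sizes are placeholders.
# Create the Filestore instance.
gcloud filestore instances create fusion-backups \
  --zone=us-central1-a --tier=BASIC_HDD \
  --file-share=name=backups,capacity=1TB \
  --network=name=default

# Fetch the IP of the Filestore.
FILESTORE_IP=$(gcloud filestore instances describe fusion-backups \
  --zone=us-central1-a --format='value(networks[0].ipAddresses[0])')

# Create a PersistentVolume backed by the share, and a claim for it
# in the namespace Solr runs in (here assumed to be "fusion").
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: solr-backups
spec:
  capacity:
    storage: 1Ti
  accessModes: ["ReadWriteMany"]
  nfs:
    server: ${FILESTORE_IP}
    path: /backups
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: solr-backups
  namespace: fusion
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""
  volumeName: solr-backups
  resources:
    requests:
      storage: 1Ti
EOF
```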