Lucidworks backs up your full Managed Fusion environment and customer data regularly to ensure consistent availability of your services.

Backups

All backups are replicated across three independent zones to protect against a failure in any one zone. Each zone has one or more data centers with independent networking, cooling, power, and other resources to provide fault tolerance. Backup schedules are not configurable. Lucidworks performs the following backups to ensure availability of your services:
Index data backup
  • Frequency: Every 24 hours
  • Retained: 2 weeks
  • Content: Collections, signals, customer data

Full Fusion configuration backup
  • Frequency: Every 5 minutes
  • Retained: The length of your contract
  • Content: API keys, fields, roles, permissions, index profiles and pipelines, query profiles and pipelines, licenses, rules, etc.

Application and access logs backup
  • Frequency: Every 5 minutes
  • Retained: 30 days in the frontend, 1 year in cloud logging
  • Content: Application and access logs
The exact retention period may vary depending on the terms of your agreement with Lucidworks.

Automated Solr backups

You can also schedule backups of Solr collections, store the backups for a configurable period of time, and restore these backups into a specified Fusion cluster when needed. The following guide uses Google Kubernetes Engine (GKE) for examples.
The standard approach of using a provider-specific Persistent Volume Claim (PVC) for storing collection backups ensures consistency in configuration. However, this method does not take advantage of the storage features unique to each cloud provider. For instance, Google Cloud Storage offers access control management, multiple storage tiers, and other capabilities that are not available when using a PVC. To use these features, the Solr instances running within Fusion require additional provider-specific configuration.

Refer to the Solr documentation for detailed information on the repositories available for configuring collection backups. Each repository type comes with specific configuration options and features. Generally, you need to add the provider-specific configuration to the solr.xml configuration file and ensure that the appropriate library or module for the provider is on the Solr classpath so that the repository implementation resolves correctly at runtime.

For example, when configuring GCSBackupRepository to store backups in Google Cloud Storage (GCS), include the corresponding GCS library in the Solr classpath and add a section similar to the following to solr.xml to specify the target bucket where backups are stored:
<backup>
  <repository name="gcs_backup" class="org.apache.solr.gcs.GCSBackupRepository" default="false">
    <str name="gcsBucket">solrBackups</str>
    <str name="gcsCredentialPath">/local/path/to/credential/file</str>
    <str name="location">/default/gcs/backup/location</str>
    <int name="gcsClientMaxRetries">5</int>
    <int name="gcsClientHttpInitialRetryDelayMillis">1500</int>
    <double name="gcsClientHttpRetryDelayMultiplier">1.5</double>
    <int name="gcsClientHttpMaxRetryDelayMillis">10000</int>
  </repository>
</backup>
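If you are running Solr 9.x, one way to put the GCS repository implementation on the classpath is to enable the gcs-repository module, for example in bin/solr.in.sh. This is a sketch only; a Fusion deployment may set Solr environment variables through its Helm values instead:
# Enable the gcs-repository module so that org.apache.solr.gcs.GCSBackupRepository
# is available on the Solr classpath at runtime.
SOLR_MODULES="gcs-repository"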
After configuring the backup repository, you can use the standard Solr backup and restore APIs to create new backups or restore from existing ones. Instead of writing to a PVC, backups are stored in the provider-specific storage service.
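For example, a backup request against the gcs_backup repository defined above might look like the following. The collection name and backup name are placeholders, and the host assumes you can reach Solr on its standard port:
# Back up a hypothetical "products" collection to the gcs_backup repository.
curl "http://localhost:8983/solr/admin/collections?action=BACKUP&name=products-backup&collection=products&repository=gcs_backup&location=/default/gcs/backup/location"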
Backups are taken using the Solr collection BACKUP command. This requires that each Solr node has access to a shared volume, or a ReadWriteMany volume in Kubernetes. Most cloud providers offer a simple way of creating a shared filestore and exposing it as a PersistentVolumeClaim within Kubernetes to mount into the Solr pods. An option is added to the setup_f5_PROVIDER.sh scripts in the fusion-cloud-native repository to provision these.

The backup action of the script is invoked by a Kubernetes CronJob to run the backup schedule. Backups are saved to a configurable directory with an automatically generated name: <collection_name>-<timestamp_in_some_format>.

A separate CronJob is responsible for cleanup and retention of backups. Cleanup can be disabled if not needed. Setting a series of retention periods can automatically remove backups as they become outdated. For example, a cluster that backs up a collection every 3 hours could specify a retention policy that:
  • Keeps all backups for a single day.
  • Keeps a single daily backup for a week.
  • Keeps a single weekly backup for a month.
  • Keeps a single monthly backup for 6 months.
  • Deletes all backups that are older than 6 months.
All retention times are configurable in the configmap for this service.

Restoring a collection is a manual step: use kubectl run to invoke the Solr RESTORE action, pointing to the target collection and the name of the backup to restore.
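A minimal sketch of such a restore, assuming the shared backup volume is mounted at /mnt/solr-backups (as in the GKE install steps below) and using placeholder names for the pod, the Solr service, the collection, and the backup; adjust these to match your environment:
# Run a one-off pod that calls the Solr Collections API RESTORE action.
kubectl run solr-restore --rm -it --restart=Never -n "${NAMESPACE}" --image=curlimages/curl -- \
  curl "http://<solr-service>:8983/solr/admin/collections?action=RESTORE&name=<backup_name>&collection=<collection_name>&location=/mnt/solr-backups"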
These instructions are for GKE only. For other platforms, backup and restoration involve copying the collection to the cloud and using Parallel Bulk Loader.

Install using a PVC with GKE

The solr-backup-runner requires that a ReadWriteMany volume is mounted onto all Solr and backup-runner pods so they all back up to the same filesystem. The easiest way to install on GKE is to use a GCP Filestore instance as the ReadWriteMany volume.
  1. Create the Filestore.
    gcloud filestore instances create "${NFS_NAME}" \
      --project="${GCLOUD_PROJECT}" \
      --zone="${GCLOUD_ZONE}" \
      --tier=STANDARD \
      --file-share="name=solrbackups,capacity=${SOLR_BACKUP_NFS_GB}GB" \
      --network="name=${network_name}"
    
  2. Fetch the IP of the Filestore.
    NFS_IP="$(gcloud filestore instances describe "${NFS_NAME}" --project="${GCLOUD_PROJECT}" --zone="${GCLOUD_ZONE}" --format="value(networks.ipAddresses[0])")"
    
  3. Create a Persistent Volume in Kubernetes that is backed by this volume.
    cat <<EOF | kubectl -n "${NAMESPACE}" apply -f -
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: ${NAMESPACE}-solr-backups
      annotations:
        pv.beta.kubernetes.io/gid: "8983"
    spec:
      capacity:
        storage: ${SOLR_BACKUP_NFS_GB}G
      accessModes:
        - ReadWriteMany
      nfs:
        path: /solrbackups
        server: ${NFS_IP}
    EOF
    
  4. Create a Persistent Volume Claim in the namespace that Solr is running in.
    cat <<EOF | kubectl -n "${NAMESPACE}" apply -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: fusion-solr-backup-claim
    spec:
      volumeName: ${NAMESPACE}-solr-backups
      accessModes:
        - ReadWriteMany
      storageClassName: ""
      resources:
        requests:
          storage: ${SOLR_BACKUP_NFS_GB}G
    EOF
    
  5. Add the following values to your existing (or a new) Helm values file.
    solr-backup-runner:
      enabled: true
      sharedPersistentVolumeName: fusion-solr-backup-claim
    solr:
      additionalInitContainers:
        - name: chown-backup-directory
          securityContext:
            runAsUser: 0
          image: busybox:latest
          command: ['/bin/sh', '-c', "owner=$(stat -c '%u' /mnt/solr-backups); if [ ! \"${owner}\" = \"8983\" ]; then chown -R 8983:8983 /mnt/solr-backups; fi"]
          volumeMounts:
            - mountPath: /mnt/solr-backups
              name: solr-backups
      additionalVolumes:
        - name: solr-backups
          persistentVolumeClaim:
            claimName: fusion-solr-backup-claim
      additionalVolumeMounts:
        - name: solr-backups
          mountPath: "/mnt/solr-backups"
    
  6. Upgrade the release to apply the new values, as shown in the sketch below. Solr backups are now enabled.
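    A sketch of the upgrade command only: the release name, namespace, and values file names are placeholders, and the lucidworks/fusion chart is assumed to be available from your configured Helm repositories.
    # Re-apply your existing values plus the new backup values to the running release.
    helm upgrade "${RELEASE}" lucidworks/fusion \
      --namespace "${NAMESPACE}" \
      --values existing_values.yaml \
      --values solr_backup_values.yaml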

Disaster recovery

Lucidworks has a comprehensive disaster recovery program to support critical business operations for clients. The disaster recovery program is tested yearly. If you need to recover your Managed Fusion environment from a backup, contact Lucidworks support. In the event of a disaster that affects your Fusion environment data, Lucidworks will contact you as soon as we are aware of it.