Backups Concepts

A backup of a collection is a copy of the collection's metadata and all of its data as of the time the backup is made.

Use the Backups API to create, restore, delete, and get information about collection backups.

Important
Backups API methods create and restore backups of collections, not clusters. If a cluster contains four collections and you want to back up the cluster, you need to back up all four collections.
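
Because backups are per collection, backing up a whole cluster means issuing one create request for each collection. The sketch below only plans those requests and does not send them. The base URL, the path layout, and the names `PlannedRequest` and `plan_cluster_backup` are hypothetical; this page does not give the actual Backups API path, so check the API reference before relying on it.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical base URL and path layout; the real Backups API path
# is not stated on this page.
BASE_URL = "https://search.example.invalid/api"


@dataclass(frozen=True)
class PlannedRequest:
    method: str
    url: str


def plan_cluster_backup(cluster_id: str, collections: List[str]) -> List[PlannedRequest]:
    """Return one POST per collection, since backups cover collections, not clusters."""
    return [
        PlannedRequest(
            "POST",
            f"{BASE_URL}/clusters/{cluster_id}/collections/{name}/backups",
        )
        for name in collections
    ]


if __name__ == "__main__":
    # A cluster with four collections needs four backup requests.
    for req in plan_cluster_backup("my-cluster", ["products", "users", "orders", "logs"]):
        print(req.method, req.url)
```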

Programmatically created vs. automatic backups

There are two kinds of backups of collections:

  • Programmatically created – You can create and delete backups programmatically using the Backups API. These backups have a createdBy value of customerName.

  • Automatic backups – Managed Search backs up all collections in a cluster automatically. Automatically created backups have a createdBy value of _scheduler.
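
The createdBy value is enough to tell the two kinds of backups apart. The following is a minimal sketch, assuming backup records are available as dictionaries with `id` and `createdBy` keys. That record shape and the function name `split_by_origin` are assumptions for illustration, not the documented response format.

```python
from typing import Dict, Iterable, List, Tuple

# createdBy value of automatic backups, as described above.
SCHEDULER = "_scheduler"


def split_by_origin(backups: Iterable[Dict[str, str]]) -> Tuple[List[dict], List[dict]]:
    """Split backup records into (automatic, programmatically created)."""
    automatic: List[dict] = []
    programmatic: List[dict] = []
    for backup in backups:
        if backup.get("createdBy") == SCHEDULER:
            automatic.append(backup)
        else:
            programmatic.append(backup)
    return automatic, programmatic


if __name__ == "__main__":
    records = [
        {"id": "b1", "createdBy": "_scheduler"},
        {"id": "b2", "createdBy": "acme"},  # hypothetical customer name
    ]
    auto, manual = split_by_origin(records)
    print(len(auto), "automatic;", len(manual), "programmatically created")
```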

The following breakdown describes how each kind of backup is created and retained.

Programmatically created backups

  • Creation process – Submit a POST request to the backups endpoint of the Backups API to create a backup.

  • Creation schedule – There is no set creation schedule for programmatically created backups. They are created only on request.

  • Retention process – Submit a DELETE request to the backups/backupId endpoint to delete a backup.

  • Retention schedule – Programmatically created backups are retained until the user explicitly deletes them using the Backups API.

Automatic backups

  • Creation process – Managed Search automatically backs up all collections in a cluster based on the automatic backup schedule.

  • Creation schedule – Managed Search automatically backs up all collections in a cluster every two hours. The initial backup is created at 00:00, and subsequent backups are created at 02:00, 04:00, and so forth until 22:00. Actual creation times depend on the amount of data in the collection; for example, a backup might be created 5 or 10 minutes after its scheduled time.

  • Retention process – Backups are retained based on the retention schedule.

  • Retention schedule – The retention schedule for automatic backups is applied on a rolling basis. All backups from a 24-hour period are kept for 7 days. After 7 days, only the last backup from each 24-hour period is kept.
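
The automatic schedule and the rolling retention rule described above can be modeled in a few lines. This is a sketch under stated assumptions, not the service's implementation: it treats each "24-hour period" as a calendar day, measures the 7 days from each backup's creation time, and uses the hypothetical function names `scheduled_times` and `backups_to_keep`.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def scheduled_times(day: datetime) -> List[datetime]:
    """Nominal automatic backup times for one day: 00:00, 02:00, ..., 22:00."""
    midnight = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return [midnight + timedelta(hours=h) for h in range(0, 24, 2)]


def backups_to_keep(created: Iterable[datetime], now: datetime) -> List[datetime]:
    """Keep every backup newer than 7 days, plus the last backup of each older day.

    Assumption: a "24-hour period" is a calendar day.
    """
    cutoff = now - timedelta(days=7)
    recent: List[datetime] = []
    last_per_day: Dict[object, datetime] = {}
    for ts in created:
        if ts >= cutoff:
            recent.append(ts)
        else:
            day = ts.date()
            if day not in last_per_day or ts > last_per_day[day]:
                last_per_day[day] = ts
    return sorted(recent + list(last_per_day.values()))


if __name__ == "__main__":
    now = datetime(2024, 1, 20, 12, 0)
    old_day = scheduled_times(datetime(2024, 1, 1))
    new_day = scheduled_times(datetime(2024, 1, 19))
    for ts in backups_to_keep(old_day + new_day, now):
        print(ts.isoformat())
```

In this model, only the 22:00 backup survives from a day that is more than 7 days old, while all twelve backups from a recent day are retained.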

Advantages over Apache Solr

Managed Search offers several advantages over the backup functionality of Apache Solr.

  • Restoration of existing collections – Managed Search allows the restoration of backups to new or existing collections.

  • Incremental backups – Backups are made incrementally, so only data that has changed since the last backup is saved. The incremental backup process helps improve speed and reduce storage costs.

  • Corruption checks – The backup process has a built-in corruption check that verifies the index file checksums encoded by Lucene. Because automatic backups run every two hours, indexes are checked for corruption every two hours.
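
To illustrate the ideas behind incremental backups and checksum-based corruption checks, here is a toy model. It is not how Managed Search or Lucene implements them: the 4-byte CRC32 trailer is a made-up file layout, the incremental step assumes index files are never modified once written (so a file name identifies its content), and all function names are hypothetical.

```python
import struct
import zlib
from typing import Iterable, List


def with_checksum(payload: bytes) -> bytes:
    """Append a big-endian CRC32 of the payload (toy layout, not Lucene's footer)."""
    return payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)


def is_intact(blob: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare it with the stored trailer."""
    if len(blob) < 4:
        return False
    payload, trailer = blob[:-4], blob[-4:]
    (expected,) = struct.unpack(">I", trailer)
    return (zlib.crc32(payload) & 0xFFFFFFFF) == expected


def incremental_set(current_files: Iterable[str], already_backed_up: Iterable[str]) -> List[str]:
    """Files that still need to be saved, assuming files never change after being written."""
    return sorted(set(current_files) - set(already_backed_up))


if __name__ == "__main__":
    blob = with_checksum(b"segment data")
    print("intact:", is_intact(blob))
    print("after corruption:", is_intact(b"X" + blob[1:]))
    print("to copy:", incremental_set(["seg_1", "seg_2", "seg_3"], ["seg_1", "seg_2"]))
```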