Velero offers three main options for the backup mechanism and the location of backup data:
- Plain CSI snapshots
- File-based backup enabling offsite backup storage
- CSI Snapshot Data Movement Mode, which combines the storage snapshot mechanism with offsite file-based backups
We describe all three options, but to implement a reliable backup solution, we recommend option 3. It stores backups outside of your primary infrastructure, preferably in a different data center or region, and ensures recoverability if your cluster is unavailable.
Regardless of the backup option you choose, Velero always stores backups of the Kubernetes cluster configuration (manifests) offsite, in an object storage bucket. This ensures that the Kubernetes configuration can be recovered if the cluster has been destroyed or is no longer available.
Option 1: Plain CSI snapshots
Plain CSI snapshots are the default backup option. Velero stores Kubernetes configuration backups in an S3 storage bucket, while Persistent Volume (PV) backups are stored as snapshots on the same storage system from which they are taken.
This option enables fast data recovery since backups are stored locally in the Kubernetes cluster storage infrastructure. However, it poses a risk of losing all backup data in the event of storage malfunction, human error, or malware.
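As a rough illustration of this option, the sketch below assembles a `velero backup create` command that asks Velero to snapshot the PVs in one namespace. It assumes Velero is already installed with CSI snapshot support enabled and that a suitable VolumeSnapshotClass exists; the backup name `nightly-app` and the namespace `my-app` are hypothetical.

```python
import shlex


def csi_snapshot_backup_cmd(backup_name: str, namespace: str) -> list[str]:
    """Build the argument list for a backup that snapshots the namespace's PVs."""
    return [
        "velero", "backup", "create", backup_name,
        "--include-namespaces", namespace,
        # Ask Velero to take volume snapshots as part of the backup.
        "--snapshot-volumes=true",
    ]


# Hypothetical names; replace them with your own backup name and namespace.
cmd = csi_snapshot_backup_cmd("nightly-app", "my-app")
print(shlex.join(cmd))
```

The command is only printed here. To run it, pass the list to `subprocess.run(cmd, check=True)` on a machine where the `velero` CLI is configured for your cluster.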
Option 2: Filesystem Backup mechanism
A filesystem-level backup is performed by scanning all files and directories in a Persistent Volume (PV). Velero can use one of two open-source tools, Restic or Kopia, to scan the file systems of the PVs in your cluster, identify changes at the file level, and send the changed data (backup increments) to an offsite backup storage location.
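One way to request a filesystem-level backup is to set `defaultVolumesToFsBackup` in the spec of a Velero `Backup` resource. The sketch below builds such a manifest as a Python dictionary and prints it as JSON. It assumes Velero runs in the `velero` namespace with its node agent deployed; the backup name, the namespace list, and the `default` storage location are hypothetical.

```python
import json


def fs_backup_manifest(name: str, namespaces: list[str],
                       storage_location: str = "default") -> dict:
    """Return a Velero Backup manifest that backs up pod volumes at file level."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        # Assumption: Velero is installed in the "velero" namespace.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "includedNamespaces": list(namespaces),
            "storageLocation": storage_location,
            # Use the filesystem backup path for pod volumes by default.
            "defaultVolumesToFsBackup": True,
        },
    }


manifest = fs_backup_manifest("fs-nightly", ["my-app"])
print(json.dumps(manifest, indent=2))
```

Because JSON is valid YAML, the printed output can be saved to a file and applied with `kubectl apply -f`.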
Because the scan runs while Kubernetes applications are still updating their data on the PVs, the resulting backups may be inconsistent. For example, if an application updates several files while they are being scanned and uploaded, the backup may contain outdated versions of some of those files.
To avoid inconsistent backup data and improve recoverability, applications are normally frozen (put into a paused state) while a filesystem-based backup runs. Scanning a whole filesystem usually takes much longer than taking a CSI snapshot of the storage system that contains it, so the application may stay frozen for extended periods. For complex applications with many files, the resulting downtime may last for hours or days.
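Freezing is commonly wired up with Velero's backup hook annotations on the pod, which run a command in a container before and after the backup. The sketch below generates a pair of annotations that call `fsfreeze`. It assumes the named container ships the `fsfreeze` binary at `/sbin/fsfreeze` and has enough privileges to freeze the mount; the container name `app` and the path `/data` are hypothetical.

```python
import json


def freeze_hook_annotations(container: str, mount_path: str) -> dict[str, str]:
    """Build pod annotations that freeze a mount before backup and thaw it after."""
    freeze = ["/sbin/fsfreeze", "--freeze", mount_path]
    unfreeze = ["/sbin/fsfreeze", "--unfreeze", mount_path]
    return {
        "pre.hook.backup.velero.io/container": container,
        # Hook commands are passed as a JSON array encoded in a string.
        "pre.hook.backup.velero.io/command": json.dumps(freeze),
        "post.hook.backup.velero.io/container": container,
        "post.hook.backup.velero.io/command": json.dumps(unfreeze),
    }


# Hypothetical container name and mount path.
annotations = freeze_hook_annotations("app", "/data")
for key, value in annotations.items():
    print(f"{key}: {value}")
```

The resulting key/value pairs belong under `metadata.annotations` of the pod template. The application stays frozen from the pre hook until the post hook runs, which is exactly the window the paragraph above warns about.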
The filesystem backup option was the only way to back up data and store it offsite before the CSI Snapshot Data Movement option was introduced with the Velero 1.12 release in August 2023. Now, this option is rarely used. One reason you may still need the filesystem backup mechanism is if you use EFS, AzureFile, NFS, emptyDir, or any other volume type that doesn’t have a native snapshot concept, which makes Option 3 unavailable.
Option 3: New CSI Snapshot Data Movement Mode
CSI Snapshot Data Movement is a new Velero backup mode released in August 2023. It helps resolve the performance issues associated with the filesystem backup mechanism described above.
When executing a backup, Velero uses CSI to instruct the underlying storage system to take a PV snapshot. Velero then uses Restic/Kopia to create a file-level backup from the snapshot and move it to an offsite backup storage location.
The main benefit of this new backup mode is that it doesn’t need to perform a file-level scan of a live persistent volume. It instead uses a virtual disk created from a CSI storage snapshot to take a file-level backup (using Restic or Kopia) and move it offsite. Because storage snapshots can be taken fast, this mode allows Velero to create consistent backups with minimal application downtime.
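A backup in this mode can be requested with the `--snapshot-move-data` flag of `velero backup create`. The sketch below assembles such a command. It assumes Velero 1.12 or later with CSI support enabled and the node agent running; the backup name `offsite-nightly` and the namespace `my-app` are hypothetical.

```python
import shlex


def data_movement_backup_cmd(backup_name: str, namespace: str) -> list[str]:
    """Build a backup command that snapshots PVs and moves the data offsite."""
    return [
        "velero", "backup", "create", backup_name,
        "--include-namespaces", namespace,
        # Take CSI snapshots, then upload their contents to the backup storage.
        "--snapshot-move-data",
    ]


# Hypothetical names; replace them with your own backup name and namespace.
cmd = data_movement_backup_cmd("offsite-nightly", "my-app")
print(shlex.join(cmd))
```

As with any generated command, review the printed line before running it against a production cluster.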
In order for the backup process to work, each storage device must have enough available storage space to store the temporary CSI storage snapshots used to generate the file-level backups. If you run an on-premises Kubernetes cluster, you need to take this into account when sizing your storage system. It’s recommended to have approximately 30% free storage headroom to store the snapshots.
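The headroom guideline can be turned into a quick sizing check. The sketch below interprets "30% free storage headroom" as free space equal to at least 30% of total capacity, which is an assumption about how the guideline is meant; the function name and the figures in the example are hypothetical.

```python
def has_snapshot_headroom(capacity_gib: float, used_gib: float,
                          headroom_fraction: float = 0.30) -> bool:
    """Return True if the free share of capacity meets the headroom target."""
    if capacity_gib <= 0:
        raise ValueError("capacity_gib must be positive")
    if not 0 <= used_gib <= capacity_gib:
        raise ValueError("used_gib must be between 0 and capacity_gib")
    free_fraction = (capacity_gib - used_gib) / capacity_gib
    return free_fraction >= headroom_fraction


# Hypothetical storage systems with 1000 GiB of capacity each.
print(has_snapshot_headroom(1000, 650))  # 35% free
print(has_snapshot_headroom(1000, 800))  # 20% free
```

The actual space that snapshots consume depends on the storage system and on how much data changes while a snapshot exists, so treat the result as a first-pass sanity check rather than a guarantee.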