Kubernetes Backup Using Velero

Updated Mar 7, 2025

At a Glance

  • Velero is the most popular free backup solution for Kubernetes, with nearly 100% market share
  • Velero offers flexible configuration options, but requires substantial domain knowledge for proper configuration and maintenance
  • Afi develops an easy-to-manage, cloud-based Kubernetes backup service that overcomes many of the limitations inherent in Velero.

Intro

This blog post is based on our internal Kubernetes backup product research, focusing on the most widely used K8s backup option.

Afi.ai itself develops a Kubernetes backup platform (you can learn more on our product page). We try to remain as impartial and objective as possible, and we hope you will find this review useful when evaluating your options.


What Is a Kubernetes Backup

Kubernetes users may believe that they don’t need backups because they run highly available clusters and deploy everything from files in Git repos and Terraform scripts. However, high availability doesn’t guarantee that Kubernetes applications can be recovered and made fully operational if their configuration and underlying data are modified, corrupted, or lost (for example, through user error or a cybersecurity incident).

Kubernetes-native backup tools can help you recover from misconfigurations, malware, and system failures while minimizing downtime. Velero, with over 20,000 estimated active users, is by far the most popular Kubernetes backup tool.

[Figure: Kubernetes Backup Estimated Market Shares]

In this blog post, we show how to start using Velero to protect your Kubernetes environment and explore its key configuration options.


Overview and Key Facts

Velero is an open-source tool that automates backup and restore of Kubernetes cluster configuration and persistent volumes. It helps you recover Kubernetes workloads after data loss, or migrate workloads and their data to another cluster.

Velero can perform both on-demand and scheduled backups, allowing you to back up your data before major Kubernetes or service updates, and providing continuous protection for your workloads.

Velero vs. Non-native K8s Backup Tools

As a Kubernetes-native backup tool, Velero backs up and restores both persistent volumes (PVs), which store application data, and the Kubernetes application configuration (deployment manifests, resource allocations, and network configs) needed to recover K8s applications automatically.

In contrast, when you use non-native tools to protect Kubernetes, your backups will not include all application configs, and you will need to recover or recreate your system in multiple steps. For example, if you use storage snapshots to back up PVs, you may need to manually reconfigure persistent volume claims (PVCs) and Kubernetes cluster configs.

One distinct non-native K8s backup option is virtual machine (VM) backup. If you rely on VM backup software to back up all Kubernetes nodes, your backups will include all application data and configuration. However, recovery will be limited to restoring the full cluster with all its nodes. Granular application-level or file-level recovery is not possible because the backup software has no awareness of the Kubernetes configuration. Additionally, some backups may be inconsistent and unrecoverable, since VM backup software is unaware of the Kubernetes applications running inside the VMs (unlike Velero, which addresses this issue, as we’ll see later in this post).

Supported Kubernetes Distros

Velero supports a wide range of Kubernetes distributions and infrastructure environments, including major cloud providers (Amazon Web Services, Microsoft Azure, Google Cloud Platform, DigitalOcean, Alibaba Cloud) and on-premises environments (VMware vSphere, Rancher, OpenShift, etc.).


Backup Options

Velero offers three main options for the backup mechanism and the backup data location:

  1. Plain CSI snapshots
  2. File-based backup enabling offsite backup storage
  3. CSI Snapshot Data Movement Mode which combines storage snapshot mechanism with offsite file-based backups

We're going to describe all three options, but to implement a reliable backup solution, we recommend Option 3. It stores backups outside your primary infrastructure, preferably in a different data center or region, which ensures recoverability if your cluster becomes unavailable.

Regardless of the backup option you choose, Velero always stores Kubernetes cluster configuration/manifests backups offsite, in an object storage bucket. This approach ensures the recovery of Kubernetes configuration in case the cluster has been destroyed or is no longer available.

Option 1: Plain CSI snapshots

Plain CSI snapshots are the default backup option. Velero stores Kubernetes configuration backups in an object storage bucket (such as Amazon S3), while Persistent Volume (PV) snapshots stay on the same storage system they were taken from.

This option enables fast data recovery since backups are stored locally in the Kubernetes cluster storage infrastructure. However, it poses a risk of losing all backup data in the event of storage malfunction, human error, or malware.
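Here is a minimal sketch of a Velero Backup object that keeps PV snapshots on the primary storage (Option 1). It assumes Velero 1.12 or newer with CSI support enabled. The backup name wp-csi-snapshot-only is hypothetical, and wp-0 is the WordPress namespace used later in this post. Setting snapshotMoveData to false explicitly should override a server-wide --default-snapshot-move-data setting such as the one in our installation example, but treat that as an assumption to verify against your Velero version.

```yaml
# Sketch: Option 1, CSI snapshots that remain on the primary storage system.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: wp-csi-snapshot-only   # hypothetical name
  namespace: velero            # Velero watches its own namespace for Backup objects
spec:
  includedNamespaces:
    - wp-0
  snapshotVolumes: true        # snapshot PVs through CSI
  snapshotMoveData: false      # keep the snapshot data where it was taken
```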

Option 2: Filesystem Backup mechanism

A filesystem-level backup is performed by scanning all files and directories in a Persistent Volume (PV). Velero uses one of two open-source tools, Restic or Kopia, to scan the file systems of PVs in your cluster, identify changes at the file level, and send the changed data (backup increments) to offsite backup storage.

Because the scan runs while K8s applications keep updating their data on the PVs, the resulting backups may be inconsistent. For example, if a K8s app updates several files while they're being scanned and uploaded, the backup may contain outdated versions of some of them.

To avoid the inconsistency in backup data and increase recoverability, applications are normally frozen (put into a paused state) when a filesystem-based backup is being run. Scanning a whole filesystem usually takes much longer than making a CSI snapshot of the storage system that contains the filesystem, so the application may stay frozen for extended periods. The resulting downtime may last for hours or days for complex applications with many files.

The filesystem backup option was the only way to back up data and store it offsite before the CSI Snapshot Data Movement option was introduced with the Velero 1.12 release in August 2023. Today, this option is rarely used. One reason you may still need it is if you use EFS, AzureFile, NFS, emptyDir, or any other volume type that doesn’t have a native snapshot concept (which makes Option 3 unavailable).
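For reference, here is a per-backup sketch of this mode. With the node agent installed, setting defaultVolumesToFsBackup tells Velero to back up all pod volumes at the file level instead of snapshotting them. The backup name is hypothetical. As we understand it, the choice between Kopia and Restic is made at install time (velero install --uploader-type), not in the Backup object.

```yaml
# Sketch: Option 2, file-system backup of every pod volume in the namespace.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: wp-fs-backup                 # hypothetical name
  namespace: velero
spec:
  includedNamespaces:
    - wp-0
  defaultVolumesToFsBackup: true     # back up all pod volumes via the node agent (Kopia/Restic)
```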
Option 3: New CSI Snapshot Data Movement Mode

CSI Snapshot Data Movement is a new Velero backup mode, released in August 2023, that resolves the performance issues associated with the filesystem backup mechanism described above.

When executing a backup, Velero uses CSI to instruct the underlying storage system to take a PV snapshot. Velero then uses Restic/Kopia to create a file-level backup from the snapshot and move it to an offsite backup storage location.

The main benefit of this new backup mode is that it doesn’t need to perform a file-level scan of a live persistent volume. Instead, it takes the file-level backup (using Restic or Kopia) from a virtual disk created from a CSI storage snapshot, and moves that backup offsite. Because storage snapshots can be taken quickly, this mode allows Velero to create consistent backups with minimal application downtime.

In order for the backup process to work, each storage device must have enough available storage space to store the temporary CSI storage snapshots used to generate the file-level backups. If you run an on-premises Kubernetes cluster, you need to take this into account when sizing your storage system. It’s recommended to have approximately 30% free storage headroom to store the snapshots.
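A minimal sketch of a Backup that explicitly requests this mode, assuming Velero 1.12 or newer with the node agent and CSI support enabled. The backup name is hypothetical. In the installation example below, --default-snapshot-move-data makes this the server default anyway, so setting the field per backup mainly documents intent.

```yaml
# Sketch: Option 3, CSI snapshot followed by data movement to the backup storage location.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: wp-data-mover        # hypothetical name
  namespace: velero
spec:
  includedNamespaces:
    - wp-0
  snapshotVolumes: true      # take CSI snapshots of the PVs
  snapshotMoveData: true     # then copy the snapshot data to offsite backup storage
```

In our understanding, Velero tracks each volume's upload in a DataUpload object while such a backup runs, so kubectl -n velero get datauploads is a convenient way to watch progress.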

 


Velero Installation

We are going to use Google Cloud Platform (GCP) for this example. The installation instructions for other cloud providers are very similar.

Before installing Velero, we need to create a Google Cloud Storage (GCS) bucket to store backups. You can use GCP Console or gsutil to create a storage bucket. In this article, we will use "velero-intro-bucket" as the name of the bucket.

To authorize Velero to use GCP APIs, it needs two types of permissions: access to the “velero-intro-bucket”, and the ability to create snapshots and disks. You can find detailed instructions on Velero’s GitHub page. Follow the steps provided at https://github.com/vmware-tanzu/.

After completing the setup, you will have a file named “credentials-velero” in your working directory.

To create CSI snapshots, Velero requires a VolumeSnapshotClass that specifies the storage parameters. On GKE, the CSI driver commonly used for this is “pd.csi.storage.gke.io” (the Compute Engine persistent disk CSI driver).

Create a snapshot class that uses this CSI driver with all parameters set to their defaults:

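apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  labels:
    # Tells Velero to use this class for CSI snapshots by default
    velero.io/csi-volumesnapshot-class: "true"
  name: velero-snapclass
# GKE persistent disk CSI driver
driver: pd.csi.storage.gke.io
# Delete the underlying storage snapshot when its VolumeSnapshotContent is deleted
deletionPolicy: Delete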

The key element of this snapshot class definition is the label 'velero.io/csi-volumesnapshot-class'. This label instructs Velero to use this snapshot class unless a backup policy specifies otherwise.

Next, we need to configure encryption for backups, because Velero's default settings are insecure. If you don't set your own password, Velero uses a default one, which could allow anyone with access to the velero-intro-bucket to read and decrypt backup contents.

Before installing Velero we have to create a namespace for it, and a secret that holds your encryption password:

%: kubectl create ns velero
%: kubectl -n velero create secret generic velero-repo-credentials \
   --from-literal=repository-password=YOUR-PASSWORD
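If you manage cluster configuration declaratively, the same secret can be written as a manifest and applied with kubectl apply -f. This is a minimal sketch: YOUR-PASSWORD is a placeholder. We assume you keep the real value out of version control, for example with a secrets-management tool of your choice.

```yaml
# Sketch: declarative equivalent of the "kubectl create secret generic" command above.
apiVersion: v1
kind: Secret
metadata:
  name: velero-repo-credentials
  namespace: velero
type: Opaque
stringData:
  repository-password: YOUR-PASSWORD   # placeholder; never commit the real password
```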

There are two important notes related to the encryption password.

First, the password is stored in the same cluster that Velero protects. If the cluster is unavailable, the password will be inaccessible, making it impossible to restore data from backups. Ensure you have a secure copy of all passwords used with Velero.

Secondly, Velero does not support password changes for existing backups or key rotation.

And the final step is to install Velero:

%: velero install \
  --provider gcp \
  --plugins velero/velero-plugin-for-gcp:v1.9.0,velero/velero-plugin-for-csi:v0.7.0 \
  --bucket velero-intro-bucket \
  --secret-file ./credentials-velero \
  --use-node-agent \
  --default-snapshot-move-data \
  --features=EnableCSI \
  --wait
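The installer records the backup bucket in a BackupStorageLocation object. The sketch below shows roughly what we would expect for this example. The exact fields generated depend on your Velero version, so compare it with the output of kubectl -n velero get backupstoragelocation default -o yaml before relying on it.

```yaml
# Approximate BackupStorageLocation for this example (verify against your cluster).
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default              # velero install names its first location "default"
  namespace: velero
spec:
  provider: gcp              # served by velero-plugin-for-gcp
  objectStorage:
    bucket: velero-intro-bucket
  accessMode: ReadWrite      # Velero can both write new backups and read existing ones
```

Once Velero can reach the bucket, velero backup-location get should report this location as Available.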

 


Backup & Recovery

Let’s walk through a simple example of backup and restore using Velero. We will use a WordPress (WP) instance as our example application.

Let’s install WP first:

%: helm install --namespace wp-0 --create-namespace \
  wp-release-0 oci://registry-1.docker.io/bitnamicharts/wordpress \
  --wait

To protect the WP namespace, use the following command:

%: velero schedule create wp-hourly --include-namespaces wp-0 --schedule "@hourly"

This command instructs Velero to configure a regular backup (schedule in Velero’s terminology) that runs every hour and protects a namespace called wp-0.
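Schedules are ordinary Kubernetes objects, so the same schedule can also be declared as a manifest, which suits a GitOps workflow. Below is a minimal sketch of the Schedule we expect the command above to create. The ttl line is our addition: 720h (30 days) is Velero's default retention, and it is spelled out here only to show where retention is configured.

```yaml
# Sketch: declarative equivalent of
#   velero schedule create wp-hourly --include-namespaces wp-0 --schedule "@hourly"
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: wp-hourly
  namespace: velero
spec:
  schedule: "@hourly"        # cron expression or a descriptor such as @hourly
  template:                  # the Backup spec used for every scheduled run
    includedNamespaces:
      - wp-0
    ttl: 720h0m0s            # keep each backup for 30 days (Velero's default)
```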

You can list regular backups with the following command:

%: velero schedule get

Also, you can request the same information with a call to kubectl:

%: kubectl -n velero get schedules

After installing Velero, you can manage it using the Velero CLI tool or kubectl. All backups can be controlled with kubectl, enabling seamless integration of Velero into your GitOps pipeline.

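You can also run a backup of WP manually at any time. The following command creates a one-off backup named wp-bkp-test, using the settings of the wp-hourly schedule, and waits for it to finish:

%: velero backup create wp-bkp-test --from-schedule wp-hourly --wait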

Once the backup is complete, let us simulate a disaster by deleting the WP namespace:

%: kubectl delete ns wp-0

And we can restore the WP application from the backup using this command:


%: velero restore create wp-after-disaster --from-backup=wp-bkp-test
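Restores can also be requested declaratively. Here is a minimal sketch of the Restore object equivalent to the command above. The includedNamespaces filter is our addition and limits the restore to the WordPress namespace.

```yaml
# Sketch: declarative equivalent of
#   velero restore create wp-after-disaster --from-backup=wp-bkp-test
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: wp-after-disaster
  namespace: velero
spec:
  backupName: wp-bkp-test    # the backup to restore from
  includedNamespaces:
    - wp-0                   # restore only the WordPress namespace
```

You can follow the restore's progress with velero restore describe wp-after-disaster.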

 


Important considerations

There are a few important aspects of Velero data protection that require a separate blog post. In this section, we will cover the most critical points on data consistency and overall security measures.

Velero Backup Consistency

Kubernetes applications may need to be paused or put in a special freeze state before they are backed up to achieve a recoverable backup. This is crucial for applications that write data to storage volumes or databases, as well as for applications running on several nodes. Pausing or quiescing these applications ensures no data changes occur during the backup process, capturing a consistent state of the data.

To take consistent backups of Kubernetes apps, Velero uses a mechanism called pre-/post-backup hooks, which we’ll cover in detail in a separate blog post. In short, hooks let you run commands inside an application's containers, for example to pause the application before a backup and resume it afterwards, so that the backups are recoverable.
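As a rough sketch of what a hook looks like, the Backup below runs fsfreeze before Velero backs up the WordPress pod and unfreezes the filesystem afterwards. It rests on several assumptions. The fsfreeze container is a hypothetical privileged sidecar that mounts the WordPress volume (the stock WordPress image does not ship fsfreeze or run privileged). The mount path /bitnami/wordpress and the label app.kubernetes.io/name: wordpress are assumptions about the Bitnami chart. Adapt all of these to your application.

```yaml
# Sketch: pre/post exec hooks that freeze a filesystem around a backup.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: wp-consistent                 # hypothetical name
  namespace: velero
spec:
  includedNamespaces:
    - wp-0
  hooks:
    resources:
      - name: freeze-wordpress-volume
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: wordpress   # assumed chart label
        pre:
          - exec:
              container: fsfreeze               # hypothetical privileged sidecar
              command: ["/sbin/fsfreeze", "--freeze", "/bitnami/wordpress"]
              onError: Fail                     # report a failed freeze as an error instead of continuing silently
              timeout: 30s
        post:
          - exec:
              container: fsfreeze
              command: ["/sbin/fsfreeze", "--unfreeze", "/bitnami/wordpress"]
```

Note that the hook runs in each pod matching the selector on its own. It does not coordinate a pause across several pods, which is the limitation discussed in Final Thoughts.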

Backup repository password

Velero stores the backup repository password in the velero-repo-credentials secret and uses it to encrypt all backup data. It is critical to replace the default password, because its value is commonly known. Also make sure to keep your password secret and safe, as Velero does not support changing the password for existing backup repositories.

Node agent privileged access

Backup options 2 (Filesystem Backup) and 3 (CSI Snapshot Data Movement) require deploying Velero with the Node Agent, an optional component that runs in a privileged container. Privileged containers pose a security risk because they are not constrained to their pod and have root access to the host. For this reason, some environments, such as GKE Autopilot, do not allow privileged containers.

 


Final Thoughts

Velero helps automate Kubernetes application backup and shortens time to recovery compared with disaster recovery plans that rely on Kubernetes deployment tools and storage replication.

With Velero, you can manage backup and recovery of your entire K8s application, including its configs and storage. This makes it much more robust than manual data protection and disaster recovery plans that require you to take multiple steps, including:

  1. Configure and maintain recovery plans using deployment tools (e.g. Argo CD)
  2. Configure storage replication and/or backup using tools available from your cloud infrastructure provider or hardware storage device to ensure data availability in case of a primary infrastructure malfunction, human error or a hack.
  3. Develop and maintain scripts to ensure that the replicated application storage is available to the right application components when the application is recovered (redeployed).

At the same time, Velero has a number of important limitations that may complicate its use in an enterprise production environment:

  • Velero uses a single encryption key for all backup repositories it creates. This means that if the encryption key is compromised, all backup data becomes accessible and vulnerable. Velero also doesn’t support password change for existing backups, or key rotation.
  • Restic and Kopia, which power offsite Velero backups, are still in the 'beta quality' support stage. This means users may face stability issues, especially at scale.
  • When protecting complex applications, Velero may require extensive scripting and maintenance. For example, Velero doesn’t have a built-in way to pause applications whose components run across several pods, since it can only apply pre-/post-backup hooks to one pod at a time. To pause all pods during a backup, you’ll have to schedule backups with CronJobs or other scheduling mechanisms.

This blog post is based on the results of our internal Kubernetes backup product research. Afi develops a cloud-based Kubernetes backup service that overcomes many of the issues inherent in Velero. Feel free to check the product page, or read our other blog posts to learn more about Kubernetes data protection.

 
