Data retention¶
Data retention rules define how long different kinds of data should be stored as well as how and when the data should be cleaned up once the retention period is over.
Retention rules help organizations accomplish goals such as:
- complying with legal or regulatory requirements for data storage (HIPAA, FINRA, SOX)
- getting rid of obsolete or stale data that occupies significant storage space and complicates navigation across company data
To support these requirements, Afi Backup offers backup version, GFS, and item-level retention rules that clean up data once it reaches the age defined by the rule.
Supported retention policies¶
Backup version retention¶
Backup version retention rules define how long the system keeps historical backup snapshots and apply to all workload types. See the section below for more details.
GFS retention¶
GFS (grandfather-father-son) retention rules are similar to backup version retention rules and are applied at the snapshot level, allowing you to specify a schedule for retaining backup snapshots. This way you can configure a retention rule to keep a specified number of daily, weekly, monthly, and yearly backup snapshots. See the section below for more details.
GFS retention allows you to keep backup history for an extended period (several years or more) while saving backup storage space by deleting the intermediate snapshots between those retained according to the configured schedule.
Item-level retention¶
With item-level retention, backed up items (for example, emails or files) are deleted after their last modification date becomes older than the retention period specified by the rule. See the sections linked below for more details.
Item-level retention is supported for the following kinds of data:
- Emails (includes Google Mail data for Google Workspace; Exchange Mail, Online Archive, Group Mail data for Microsoft 365)
- Files (includes Google Drive and Shared Drives data for Google Workspace; OneDrive, OneNote and SharePoint sites data for Microsoft 365)
Warning
Item-level retention rules clean up all items of the corresponding kind (email/file) whose received or modification date is older than the configured retention period, even if they are still present on the Google Workspace or Microsoft 365 side. If you want to keep the full state of each snapshot kept by the Afi service, consider configuring backup version retention rules instead.
How retention rules work¶
A retention rule is configured as a property of a backup SLA policy and is applied to the resources protected by that policy. Data older than the retention period is removed by a cleanup procedure launched during periodic backups. If an SLA policy with configured retention rules is removed from a resource (user, drive, site, etc.), or an SLA policy without configured retention rules is assigned to the resource, the system stops applying retention rules to that resource.
Retention rules can be applied to both active and archived (suspended, deleted, etc.) resource backups. To apply retention rules to archived resource backups, protect the corresponding resources with a backup SLA policy that has retention rules configured, and make sure that the Apply data retention policy to archived data option in the Archiving section is checked. In this case the Afi service runs periodic backups for archived resources that do not attempt to synchronize any new data but do apply retention and archiving rules.
Please note that, for performance reasons, cleanup is not performed on every backup but is launched with a certain probability that depends on the retention period duration and the time of the last cleanup. On average, expect a delay of 2-3 weeks before the initial retention cleanup completes after configuration. As a result, backups with a custom data retention period can temporarily contain items older than the retention age; these items are removed during the next cleanup (the time lag for retention cleanup does not exceed one month). Similarly, when the retention period is changed (for example, from 5 years to 3 years), the first cleanup after the change can also be delayed.
The sections below describe how each retention rule type is applied.
Backup version retention rules¶
With backup version retention rules, backup versions (snapshots) older than the configured retention period are deleted, while backed up items (for example, emails or files) are kept as long as at least one remaining snapshot shows them. A snapshot represents the state of the corresponding resource (mailbox, drive, site, team, etc.) at the time of the backup, and an item (file, email, etc.) is considered visible in a snapshot if it was present in the resource when that backup ran. Items visible in a particular snapshot may therefore have been added or modified either in that snapshot or in an earlier one. As a result, backup version retention rules clean up only items that were deleted, and old item versions that were overwritten by newer ones, before or in the oldest backup snapshot still kept by the Afi service, because these items and item versions are inaccessible in the remaining snapshots.
Info
The most recent backup snapshot for a resource is never deleted by backup version retention rules and is always preserved, regardless of its creation time.
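The visibility rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Afi's implementation: the function names, the dictionary layout for item versions, and the retention period expressed in days are all hypothetical simplifications (the product configures retention in months).

```python
from datetime import date, timedelta

def kept_snapshots(snapshots, today, retention_days):
    """Snapshots that survive cleanup (hypothetical helper, retention in days)."""
    cutoff = today - timedelta(days=retention_days)
    kept = {s for s in snapshots if s >= cutoff}
    kept.add(max(snapshots))  # the most recent snapshot is never deleted
    return sorted(kept)

def is_visible(version, snapshot):
    """A version is visible if it existed in the resource when the snapshot ran."""
    start, end = version["start"], version["end"]
    return start <= snapshot and (end is None or snapshot < end)

def surviving_versions(versions, kept):
    """Names of item versions still visible in at least one kept snapshot."""
    return [v["name"] for v in versions if any(is_visible(v, s) for s in kept)]

# Daily snapshots for January 2024 and three illustrative item versions.
snapshots = [date(2024, 1, d) for d in range(1, 32)]
versions = [
    {"name": "report v1", "start": date(2024, 1, 1), "end": date(2024, 1, 10)},
    {"name": "report v2", "start": date(2024, 1, 10), "end": None},
    {"name": "draft v1", "start": date(2024, 1, 2), "end": date(2024, 1, 5)},
]

kept = kept_snapshots(snapshots, today=date(2024, 1, 31), retention_days=7)
survivors = surviving_versions(versions, kept)
print(kept[0], survivors)  # 2024-01-24 ['report v2']
```

In this sketch, "report v1" and "draft v1" disappear because no retained snapshot still shows them, while "report v2" stays because it is visible in every retained snapshot.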
Example: Suppose user A has document X, created on 1st February and then edited on 1st March, 15th April and 1st May; document Y, created on 1st March and deleted on 2nd March; and document Z, created on 1st February and not modified since. User A's drive is backed up daily. On 1st May, a 1-month backup version retention policy is set up for this user and a retention cleanup runs. During the cleanup, the system deletes the versions of document X from 1st February and 1st March and all versions of document Y; the versions of document X from 15th April and 1st May, as well as document Z, are preserved because they belong to snapshots within the retention period.
GFS retention rules¶
When configuring GFS retention rules, you can specify how many daily, weekly, monthly, and yearly backup snapshots you want to keep. The snapshots to retain will be calculated according to the specified settings, and the remaining snapshots will be deleted. When these snapshots are deleted, the item versions that are no longer visible in the retained backup snapshots will also be deleted, and the backup storage occupied by these item versions will be freed.
Afi calculates the backup snapshots to be kept by GFS retention as follows:
- The backup snapshot history is divided into calendar day, week, month, and year periods, with the backup timezone taken into account. Each daily interval starts at 12:00 AM and ends at 11:59 PM in the respective timezone. Each weekly interval starts at 12:00 AM on Monday and ends at 11:59 PM on Sunday. Each monthly interval starts at 12:00 AM on the 1st day of the month and ends at 11:59 PM on the last day of the month. Each yearly interval starts at 12:00 AM on January 1st and ends at 11:59 PM on December 31st.
- Once GFS retention rules are applied, for each rule (daily, weekly, etc.), Afi picks the latest available snapshot from the current unfinished interval, plus X snapshots, one for each of the preceding X finished intervals. Within each of these finished intervals, Afi selects the closest backup snapshot performed during or after the specified day, week, etc. If no such snapshot is found, Afi selects the last available snapshot within the interval.
- The most recent available backup snapshot is always retained, regardless of how old it is.
- When the backup snapshots to be retained are calculated as described above, the remaining backup snapshots will be deleted.
Info
When a backup snapshot is deleted during a retention cleanup, all items/item versions that were added in this snapshot but are still visible in one of the retained snapshots are preserved.
Example: Let's suppose you have daily backup snapshots from June 1, 2024, to August 15, 2024, and you configure GFS retention with the following settings:
- Keep daily backups for 7 days;
- Keep weekly backups for 4 weeks, on Monday each week;
- Keep monthly backups for 2 months, on the 1st day of each month.
The backup snapshots that will be preserved after applying the GFS retention rules above are marked with circles in the picture below. The blue, green, and pink circles denote the retained daily, weekly, and monthly backups, respectively.
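For readers who prefer code, the selection steps can be approximated with the Python sketch below. It is one reading of the rules described above, not Afi's actual algorithm, and it makes several simplifying assumptions: there is one snapshot per calendar day, timezones are ignored, the yearly rule is omitted, and the target day of each weekly and monthly interval is its first day (Monday and the 1st, as in the example), so "closest snapshot on or after the target" reduces to the earliest snapshot in the interval. All function names are hypothetical.

```python
from datetime import date, timedelta

def week_start(d):
    return d - timedelta(days=d.weekday())  # Monday of d's week

def month_start(d):
    return d.replace(day=1)

def prev_month(start):
    return (start - timedelta(days=1)).replace(day=1)

def pick(snapshots, today, count, start_of, previous):
    """Snapshots kept by one rule (daily, weekly or monthly)."""
    by_interval = {}
    for s in sorted(snapshots):
        by_interval.setdefault(start_of(s), []).append(s)
    kept = set()
    start = start_of(today)
    if start in by_interval:
        kept.add(by_interval[start][-1])  # latest snapshot of the unfinished interval
    for _ in range(count):
        start = previous(start)
        candidates = by_interval.get(start, [])
        if candidates:
            kept.add(candidates[0])  # earliest snapshot on or after the interval's first day
    return kept

def gfs_keep(snapshots, today, daily, weekly, monthly):
    kept = {max(snapshots)}  # the most recent snapshot is always retained
    kept |= pick(snapshots, today, daily, lambda d: d, lambda s: s - timedelta(days=1))
    kept |= pick(snapshots, today, weekly, week_start, lambda s: s - timedelta(days=7))
    kept |= pick(snapshots, today, monthly, month_start, prev_month)
    return sorted(kept)

# Daily snapshots from June 1 to August 15, 2024, as in the example above.
first, last = date(2024, 6, 1), date(2024, 8, 15)
snapshots = [first + timedelta(days=i) for i in range((last - first).days + 1)]

kept = gfs_keep(snapshots, today=last, daily=7, weekly=4, monthly=2)
print(len(kept))  # 14
print([d.isoformat() for d in kept])
```

Under these assumptions the sketch retains 14 of the 76 snapshots: August 8 through August 15 for the daily rule, the Mondays July 15, July 22, July 29 and August 5 for the weekly rule, and June 1 and July 1 for the monthly rule. The product's actual selection may differ in edge cases that the sketch does not model.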
Item-level retention rules¶
Emails¶
With item-level email retention rules, all emails with a received date older than the retention period are deleted from all historical backup snapshots. Emails that were recently moved between labels (folders) or marked as read/unread are still cleaned up based on their original received date (i.e., such actions do not reset the email's age).
Example: Suppose user A receives one report email per day for 3 months (1st February, 2nd February, ..., 1st March, ..., 30th April) and has 90 emails in the mailbox on 1st May. On 1st May, a 1-month email retention policy is set up for this user and a retention cleanup runs. During the cleanup, emails older than 1st April are removed and emails dated from 1st to 30th April remain (emails from the last 30 days).
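The arithmetic in this example can be checked with a short Python sketch. It assumes the year is 2024 (a leap year, which is what makes February through April add up to 90 days) and uses a hypothetical `months_before` helper to compute the cutoff date; neither detail comes from the Afi documentation.

```python
import calendar
from datetime import date, timedelta

def months_before(day, months):
    """Same day-of-month `months` earlier, clamped to the month's last day."""
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))

# One email per day from 1st February to 30th April 2024 (assumed year).
first, last = date(2024, 2, 1), date(2024, 4, 30)
received = [first + timedelta(days=i) for i in range((last - first).days + 1)]

cutoff = months_before(date(2024, 5, 1), 1)   # 1-month retention applied on 1st May
kept = [d for d in received if d >= cutoff]   # age is based on the received date only
removed = [d for d in received if d < cutoff]

print(len(received), len(removed), len(kept))  # 90 60 30
```

The 60 emails received before 1st April are removed and the 30 April emails remain, matching the example.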
While browsing historical snapshots for backups with retention enabled, you may encounter deleted item placeholders for items already removed by the cleanup procedure. The data and metadata of these placeholder items are deleted, but the placeholders remain in the browsing view due to implementation details and to provide better visibility into the retention process.
Files¶
With item-level file retention rules, all files with a modification date older than the retention period are deleted from all historical backup snapshots. Files with a creation date older than the retention period but a newer modification date are preserved. Files deleted in Google Workspace / Microsoft 365 that are still present in older backup snapshots are also cleaned up based on their last modified date. If for any reason files older than the retention age are still present in the corresponding Google Workspace / Microsoft 365 account (although the general best practice is to set up the same retention policies across the services used by the company), they will either not be backed up at all or be removed during the next cleanup procedure.
Note that the file cleanup job does not delete folders, regardless of their age: it removes files older than the retention age but leaves all folders in place.
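The file rules above boil down to a filter on the modification date that skips folders. A minimal Python sketch follows; the `Entry` class, its fields, and the sample paths are hypothetical and serve only to illustrate the rule.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Entry:
    path: str
    created: date
    modified: date
    is_folder: bool = False

def paths_to_delete(entries, cutoff):
    """Files last modified before the cutoff; folders are never selected."""
    return [e.path for e in entries if not e.is_folder and e.modified < cutoff]

entries = [
    Entry("old-notes.txt", created=date(2022, 3, 1), modified=date(2022, 3, 5)),
    # Created long ago but edited recently, so it is preserved.
    Entry("living-plan.docx", created=date(2021, 6, 1), modified=date(2024, 3, 10)),
    # Folders are left alone regardless of their age.
    Entry("Archive/", created=date(2020, 1, 1), modified=date(2020, 1, 1), is_folder=True),
]

doomed = paths_to_delete(entries, cutoff=date(2024, 1, 1))
print(doomed)  # ['old-notes.txt']
```

The creation date is carried in the sketch only to show that it plays no part in the decision.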
How to configure and manage retention rules¶
Retention rules are configured as a property of a backup SLA policy, which defines which data to back up, how frequently, and how long to keep the backup data. The following steps are required to set up custom retention rules:
- Go to the Service → Protection → SLA tab, select an existing backup SLA policy or create a new one, then choose retention mode (item-level or backup version) and retention period duration;
- After the SLA policy is set up, assign it to the resources, organizational units (Google Workspace), or AAD groups (Microsoft 365) that you want to protect with this policy.
Here is an example of an item-level retention setup: a 1-year retention period for mail data and a 6-month retention period for drive data:
Here is an example of a backup version retention setup: a 1-year snapshot retention period.
You can configure different retention rules for different Google Workspace organizational units or for members of different Microsoft 365 groups by creating several backup SLA policies with different retention rules and assigning them to the corresponding organizational units or groups.
The retention period can be configured with month granularity (for example, 6 months, 1 year, 3 years, etc.).
Example: the screenshot below shows that all members of the Sales organizational unit are protected with the Gold backup SLA policy with 3-year email retention, and all Shared Drives are protected with the Silver SLA with 1-year document retention.