RBD Backups the Easy Way

The time has finally arrived to move on from rsync-based backups at C3SL. We're only a decade or two late to the party.


Our current backup setup is not great.

It’s rsync-based: we essentially just copy the files from / into another machine. It’s possible to save a bit of space through some hard-link shenanigans, but that doesn’t change the core problem: it’s too complex.

Also bad: it’s an inode-based backup, meaning that if you have too many files the backups will take long as hell. Not only that, iterating over all inodes puts a serious load on the host being backed up.

Database Host CPU usage for the last 2 days

This graph shows the CPU usage for our main database host in the last two days. Guess when the backups were made.

And it gets even funnier with database backups. Our backup script has a “pre-rsync” config that specifies commands to be executed before the backup is actually done, so the backup machine ssh’s into the host and performs arbitrary commands.

Wait, your backup host has root access on all hosts that should be backed up?

Yes. It’s been the norm “for more than 20 years” that our “access host” is the same as our “backup host”. I might get it for an environment with very few machines, but come on.

We need a more performant, less complex and more secure way of doing backups, at least for our virtual machines.

RBD snapshots & diffs

The neat little thing about using Proxmox with Ceph is that we’re already using RADOS Block Devices (RBD). And the neat little thing about RBD is that Ceph’s official docs tell you how to do incremental backups.

The main idea is: create an initial snapshot and export it; this is the base backup. For every new backup, create a new snapshot and export only the diff between the previous snapshot and the new one. After that, the previous snapshot can already be deleted.

Diagram of incremental RBD backups

We already commonly use RBD snapshots when doing risky updates on critical VMs.

New backup workflow

Recently(-ish) I’ve automated our backup config generation. We can integrate the rbd diffs with that, and have a proper rbd backup.

The plan is the following: use one of our management machines in the ceph cluster to generate the RBD diffs, export them to the backup host and then remove them. That way we can make a smoother transition into a “full rbd backup”.
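In shell terms, each diff’s trip from the management machine would look roughly like this (just a sketch; “backup-host”, the /backups path and the ${SNAPSHOT}/${DIFF_FILE} variables are placeholders, not our real names):

# Sketch: export the diff locally, ship it to the backup host, then drop the local copy.
# "backup-host", /backups/ and the variables here are illustrative placeholders.
sudo rbd export-diff vm-${VMID}-disk-0@${SNAPSHOT} ${DIFF_FILE}
rsync -a --partial ${DIFF_FILE} backup-host:/backups/ && rm -f ${DIFF_FILE}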

The new scheme boils down to a few steps:

  1. Check if backup snapshot is already present.
test -f /backups/vm-${VMID}-disk-0.${VM_NAME}.rbd2

note

We’re using both the VM’s id and name as info when storing the backup.

Using only the VM’s id is not reliable, as ids might be reallocated to another VM once the first one is deleted. That’s why we also include the VM’s name. We could even include other info, like DNS name, IP, whatever, but that’s too much work for little result. Name + id is enough for our case.

If there’s currently a VM that has both the same name and id as the backup, then it’s very likely the same VM.

Sure, there could always be an extreme case where a different VM ends up with the same name and id, but that is true for our current backup setup anyway (even more so, since it only stores the VM’s name).

Storing the disk’s number (like disk-0) is extremely important though.

  2. If it isn’t: perform the first snapshot, export it and finish.
sudo rbd snap create vm-${VMID}-disk-0@backup-initial
# optional, but interesting: do not allow deleting this snapshot
sudo rbd snap protect vm-${VMID}-disk-0@backup-initial
sudo rbd export-diff vm-${VMID}-disk-0@backup-initial vm-${VMID}-disk-0.${VM_NAME}.rbd2
  3. If it is: create another snapshot, export the diff from the previous snapshot, then delete the previous snapshot and finish.
YYYY_MM_DD=$(date '+%Y-%m-%d')
sudo rbd snap create vm-${VMID}-disk-0@${YYYY_MM_DD}
sudo rbd snap protect vm-${VMID}-disk-0@${YYYY_MM_DD}
sudo rbd export-diff --from-snap ${PREVIOUS_SNAPSHOT_NAME} vm-${VMID}-disk-0@${YYYY_MM_DD} vm-${VMID}-disk-0.${VM_NAME}.${YYYY_MM_DD}.rbd2
sudo rbd snap unprotect vm-${VMID}-disk-0@${PREVIOUS_SNAPSHOT_NAME}
sudo rbd snap remove vm-${VMID}-disk-0@${PREVIOUS_SNAPSHOT_NAME}

The date information above is up for debate, of course. We might want more than one daily backup. And we might also want to back up more than disk-0, which is commonly the / filesystem.
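For the “more than disk-0” part, a loop over the VM’s images would probably be enough. A rough sketch (it assumes the images live in the default rbd pool and follow Proxmox’s vm-<id>-disk-<n> naming):

# Sketch: back up every disk image belonging to this VM, not just disk-0.
# Assumes the default "rbd" pool and Proxmox's vm-<id>-disk-<n> naming.
for RBD in $(sudo rbd ls | grep "^vm-${VMID}-disk-"); do
	echo "Backing up ${RBD}..."
	# ...same snapshot + export-diff steps as above, per disk...
done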

We can probably store the VM_NAME information in the actual backup directories, so that the backup lives at /backups/${VM_NAME}/vm-${VMID}-disk-0.rbd2, which also makes organization a bit simpler.

Restoring is as simple as:

# Note: we expect this disk to be deleted
# Also note: size does not matter, it will be overridden by the diff
sudo rbd create vm-${VMID}-disk-0 --size 0
sudo rbd import-diff ${DIFF_TO_BE_IMPORTED} vm-${VMID}-disk-0

The only problem here is that we need to import-diff every diff in order. We generally limit the backup window anyway. But we can’t just delete old diff files, especially the base one, since they hold intermediary data.
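Replaying them in order is easy enough to script, though. A sketch, assuming the /backups/${VM_NAME}/ layout suggested above and date-based file names (so ls -v keeps them in chronological order):

# Sketch: recreate the image, apply the base export, then every dated diff in order.
# Assumes /backups/${VM_NAME}/vm-${VMID}-disk-0.rbd2 is the base export and the
# remaining diffs are named by date, so "ls -v" yields chronological order.
sudo rbd create vm-${VMID}-disk-0 --size 0
sudo rbd import-diff /backups/${VM_NAME}/vm-${VMID}-disk-0.rbd2 vm-${VMID}-disk-0
for diff in $(ls -v /backups/${VM_NAME}/vm-${VMID}-disk-0.2*.rbd2); do
	sudo rbd import-diff ${diff} vm-${VMID}-disk-0
done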

To avoid keeping an ever-growing chain of diffs, we can merge old diffs together with merge-diff, doing something like this:

sudo rbd merge-diff base.rbddiff snap1.rbddiff tmp.rbddiff
rm -f base.rbddiff snap1.rbddiff
mv tmp.rbddiff base.rbddiff

And we incorporate snapshot snap1 into the base rbd diff file.

Slap it all together

One thing I’ve learned over the years is that people do not care how things should/could be. They want to see it working. You always need at least a proof-of-concept.

Therefore, a quick proof-of-concept script, made in 30 minutes and caring very little about usability, would look something like the following:

#!/usr/bin/env bash

KEEP_SNAPS_WINDOW=4
BACKUP_WINDOW=6
BACKUP_ROOT='.'

VMID=$1
VM_NAME=$2

[[ -z "$VMID" ]] && echo "bobo1" && exit -1
[[ -z "$VM_NAME" ]] && echo "bobo2" && exit -1

# TODO: allow other disk numbers
RBD="vm-${VMID}-disk-0"

sudo rbd info ${RBD} > /dev/null || { echo "bobo3"; exit -1; }

# If initial backup
if [[ ! -d ${BACKUP_ROOT}/${VM_NAME} ]]; then
	echo "Creating initial backup..."
	mkdir -p ${BACKUP_ROOT}/${VM_NAME}
	new_snapshot="initial"
	sudo rbd snap create ${RBD}@${new_snapshot}
	sudo rbd snap protect ${RBD}@${new_snapshot}
	sudo rbd export-diff ${RBD}@${new_snapshot} ${BACKUP_ROOT}/${VM_NAME}/${RBD}.rbddiff
	exit 0
fi

snaps=$(sudo rbd snap ls ${RBD} | awk 'NR>1{print $2}')
snap_count=$(wc -l <<< "$snaps")

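# Trim old snapshots on the RBD image itself, keeping only the last ${KEEP_SNAPS_WINDOW}.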
if [[ $snap_count -gt $KEEP_SNAPS_WINDOW ]]; then
	echo "Removing old snapshots..."
	first_to_keep=$(tail -${KEEP_SNAPS_WINDOW} <<< "$snaps" | head -1)
	oldest_snapshot=$(sudo rbd snap ls ${RBD} | awk 'NR==2{print $2}')
	while [[ ! "x$oldest_snapshot" = "x$first_to_keep" ]]; do
		sudo rbd snap unprotect ${RBD}@${oldest_snapshot}
		sudo rbd snap remove ${RBD}@${oldest_snapshot}
		oldest_snapshot=$(sudo rbd snap ls ${RBD} | awk 'NR==2{print $2}')
	done
fi

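# Pick a name for today's snapshot; the counter allows more than one backup per day.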
counter=0
yyyy_mm_dd=$(date +'%Y-%m-%d')
snaps=$(sudo rbd snap ls ${RBD} | awk 'NR>1{print $2}')
if grep -q "${yyyy_mm_dd}-" <<< "$snaps"; then
	counter=$(grep "${yyyy_mm_dd}-" <<< "$snaps" | tail -1 | sed "s/${yyyy_mm_dd}-//")
	while grep -q "^${yyyy_mm_dd}-${counter}$" <<< "$snaps"; do
		((counter++))
	done
fi
new_snapshot="${yyyy_mm_dd}-${counter}"
most_recent_snapshot=$(sudo rbd snap ls ${RBD} | awk 'END{print $2}')

echo "Creating backup '${new_snapshot}'..."
sudo rbd snap create ${RBD}@${new_snapshot}
sudo rbd snap protect ${RBD}@${new_snapshot}
sudo rbd export-diff --from-snap ${most_recent_snapshot} ${RBD}@${new_snapshot} ${BACKUP_ROOT}/${VM_NAME}/${RBD}.${new_snapshot}.rbddiff

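# Enforce the backup window on the exported diffs, folding the oldest ones forward with merge-diff.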
existing_backups=$(ls -v ${BACKUP_ROOT}/${VM_NAME}/*.rbddiff)
file_count=$(wc -l <<< "$existing_backups")
if [[ $file_count -gt $BACKUP_WINDOW ]]; then
	echo "Removing old backups..."
	first_to_keep=$(tail -${BACKUP_WINDOW} <<< "$existing_backups" | head -1)
	oldest_backup=$(ls -v ${BACKUP_ROOT}/${VM_NAME}/*.rbddiff | head -1)
	second_oldest_backup=$(ls -v ${BACKUP_ROOT}/${VM_NAME}/*.rbddiff | head -2 | tail -1)
	while [[ ! "x$oldest_backup" = "x$first_to_keep" ]]; do
		sudo rbd merge-diff $oldest_backup $second_oldest_backup TEMP
		rm -f $oldest_backup $second_oldest_backup
		mv TEMP $second_oldest_backup
		oldest_backup=$(ls -v ${BACKUP_ROOT}/${VM_NAME}/*.rbddiff | head -1)
		second_oldest_backup=$(ls -v ${BACKUP_ROOT}/${VM_NAME}/*.rbddiff | head -2 | tail -1)
	done
fi

This script is meant to be called from our own crontab, meaning that what I did in the automating oldschool backups blog post can be tweaked just a bit so that the crontab line calls the new script instead.
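The crontab entry itself stays trivial; something along these lines (the script path, schedule, VM id and name are all hypothetical):

# Hypothetical entry: daily incremental backup of VM 100 ("webserver") at 02:00.
0 2 * * * /usr/local/bin/rbd-backup.sh 100 webserver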

Each time this particular script is called it creates an incremental backup. Not quite as IaC as I’d like, but it is only supposed to be a proof-of-concept. I’d never use this particular setup.

note

What I mean by “not quite as IaC as I’d like” is that generally you want IaC to be something like:

  • Declare expected state
  • Run code that applies state

And a script that creates a backup directly, instead of ensuring that a daily backup exists, is not quite the same thing. For instance, receiving the backup frequency as an argument might be a better approach.

Just some things to think about. This can obviously be improved.

Expect an update on this btw. We are going to use rbd diff as our main backup tool for VMs (& Kubernetes PVCs) from now on.

What about database backups?

Half of this problem is already solved. We have our main database host that contains all of the critical PostgreSQL data. It’s a more powerful host specifically managed to run Postgres.

Centralized databases mean centralized database backups. For applications such as Overleaf, which use MongoDB or other databases, another approach is necessary.

We’re currently in the process of migrating these services to Kubernetes. In a few weeks or months everything will be there.

That introduces a new layer of difficulty: the database is not accessible via a shell or remotely, unlike some insecure setups we found in some VMs. Also, the database access secrets are not just randomly available inside the container; they’re stored as secrets using OpenBao.

note

I just realized that our architecture is interesting enough that a dedicated post, taking a deep dive into it, might be beneficial. Both to share it with someone who might find it interesting and to learn from someone eager to correct us.

Generally it’s the case that you trade off security for convenience.

File backups or volume snapshots are not suitable for database backups. Snapshotting a volume or copying files while the database is mid-write can capture an inconsistent state, and restoring from that leaves the database in the same inconsistent state. Only database dumps are suitable for that.
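For Postgres that boils down to something as plain as this (a sketch; host, user and database names are placeholders):

# Sketch: take a consistent, point-in-time dump straight from the database.
# Host, user and database names are placeholders.
pg_dump -h db-host -U backup_user -Fc -f app_$(date +%F).dump app_database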

How can we do it in Kubernetes then?

Probably with Kubernetes Cron Jobs, but that’s work for another time. A dedicated blog post on Kubernetes database backups may be more interesting, as I haven’t explored it much just yet.

Conclusions

After migrating the backup of our NFS /home machine from inode-based to zfs send/recv, it was clear that block-based incremental backups are the superior way forward, both for security and scalability.

Now, by migrating our 130-ish VM backups to rbd diffs, we are implementing the new scheme at scale, significantly reducing the impact caused by the backup script (sounds funny as hell that this was ever a thing).

Bonus points: we can now handle more secure and performant backups at scale, better integrated with our Kubernetes deployment.