Doing 3 years of GitLab upgrades in a few hours

You read that right. What could go wrong? Jumping 3 major releases in a day with a ton of people heavily depending on the service is totally a good idea.

Reading Time: ~15 min
card.jpg

Our GitLab instance was so outdated that the last 2 critical security issues I can recall didn’t even affect us.

I’m talking about these:

16.8.1 and 15.11.2.

This one did affect us though: 15.3.2.

But we just pretended that didn’t happen, so it’s fine.

Not that security would be too big of an issue anyway. We disabled registration, so mostly only people who have been given access can mess around. However, since we have a lot of other stuff to do, we’ve never really spent much time looking for vulnerabilities.

Anyway, there are absolutely security issues in there; we’ve just ignored them and no one has bothered us about it.

Why we haven’t upgraded yet

You might have seen the banner all the way up there: we’re at version 14.10.0, and at the time of writing this the latest stable is 17.8.?. That’s pretty wild.

To put that into perspective, GitLab 17.8 was released on January 16th, 2025, while 14.10.0 was released all the way back on April 22nd, 2022. So we’ve pretty much reached 3 years of zero upgrading.

Which is kind of funny, since our PostgreSQL instance is up to date (16.6 at the time of writing).

“The GitLab Instance’s PostgreSQL version”

Note

That’s because it’s an external database: we have a centralized PostgreSQL server architecture with a dedicated database management team, so it generally stays up to date.

As for the reason we didn’t upgrade: Kerberos.

Kerberos & OpenID Connect

The Department of Informatics at UFPR’s infrastructure is heavily inspired by MIT’s. A simplified explanation of this model is that people get their own HOME directories and users in a central system, and then access their stuff either via SSH or a remote-booted computer at our labs.

In any case, since our userbase lives in LDAP on a Debian machine, one of the things we use for authentication is Kerberos.

GitLab 15.0 dropped support for Kerberos authentication using passwords. The only login method left for Kerberos was using the ticket directly.

So that’s it. Someone probably said something like

I guess we won’t update this time.

And three years have passed

OpenID Connect

In an effort to upgrade most of our infrastructure we started using Keycloak for identity and access management. We connected it to LDAP, and it does the Kerberos auth behind the scenes. However, it gives us another protocol that is way more convenient to use: OpenID Connect.

In short: it wraps around OAuth2 and does some heavy lifting for us. You can read how it works in the official docs, it’s explained in less than 150 words.
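Most of that heavy lifting works because the provider publishes everything a client needs at a well-known discovery URL. For a Keycloak realm it looks something like this (host and realm name are placeholders):

# fetch the OIDC discovery document: endpoints, supported scopes, signing keys, etc.
curl https://keycloak.example.org/realms/example/.well-known/openid-configuration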

Never do this.

Sure, sometimes it may be necessary to wait to update something because a feature you liked was dropped. That’s no reason for a 3-year update hiatus.

If you’re going more than a few weeks, maybe even months, without updating, just look for alternatives or write your own patch. Hell, our Kerberos login page was a patch, and GitLab already knows how to deal with tickets; all you need is a patch that gets a ticket from the login screen… come on…

Seriously, unless using some other tech fucks your entire architecture or the patch is unreasonably complex to make: there is no reason to not try to fix that.

Doing nothing will bite you in the ass later.

That said, the truth is that most of your experience with software will be with outdated stuff, so maybe I’m just wrong and we should never update things.

Either way, since we’re migrating a bunch of stuff to OpenID Connect now, I get to upgrade GitLab 11 times today.

The Upgrade Plan

Making a lot of assumptions, the upgrade should go as follows:

  1. Before anything, snapshot the VM and dump the database (worst case scenario we roll back the VM snapshot, stop all GitLab services, restore the database and restart GitLab). A rough sketch of this step follows the list.
  2. Follow the update guide to the next version.
  3. Verify stuff didn’t break, repositories are still there, etc.
  4. Repeat for every version jump (e.g. 14.10.0 to 14.10.5, then 14.10.5 to 15.0.5 and so on).
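For the database part of step 1, the dump boils down to something like this (a sketch with placeholder names: the real host, user and database differ, and the VM snapshot itself happens on the hypervisor, not here):

# stop GitLab so the dump is consistent, then dump the external database
gitlab-ctl stop
pg_dump -h db.example.org -U gitlab -Fc gitlabhq_production > /root/gitlab-pre-upgrade.dump
# keep a copy of the config and secrets too, any restore needs them
cp -a /etc/gitlab /root/gitlab-etc-backup
gitlab-ctl start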

Because you can’t/shouldn’t upgrade directly from 14.10 to 17.8 we’ll need to perform multiple upgrades, and check everything is ok before going to the next version jump.

Hell, there’s even a pretty nice tool that tells you which stops you’ll need to make. My upgrade path required 11 stops, so we’ll upgrade GitLab 11 times today…

It’s not even like I can document something, GitLab’s upgrade docs are unreasonably good. Probably the best I’ve seen when it comes to upgrading software.

Post-upgrade note:

Matter of fact, all of the problems I’ve experienced during this crazy upgrade were either easy to solve if you know what you’re doing, or they were well documented by GitLab itself.

I won’t snapshot every version upgrade: it takes around 11 minutes to make a single snapshot, and with 11 upgrades that’s 2 hours of waiting for snapshots. So if I end up doing too much stuff, or I know a step may easily go wrong, I’ll snapshot. I’ll probably end up making only 2 or 3.

Thankfully, the upgrade process is somewhat predictable because it’s well documented, and from version 15 onwards I expect stuff to go smoothly.

Post-upgrade note:

Ooooh how wrong the last statement was.

Step 3 is better explained later when I actually do it in the first major version upgrade.

Dew it

It’s Saturday afternoon. For the past week there has been a message on our instance warning that the service would be unavailable during the weekend, so theoretically I get 2 days of messing around. I plan to finish this today though; I’m starting at almost 17:00, so hopefully it doesn’t take that long.

For the next “I don’t know how many” hours I’ll be staring at this screen, as well as Firefox on a second monitor for easy problem searching:

Base work windows

Do not criticize my wm (waybar mostly) plz, I copied the config from someone at some point. I just switched to NixOS+Hyprland from Gentoo+dwm in an effort to try new things and it still feels funky to use this.

v14.10.0 -> v14.10.5

Running pre-upgrade checks already hit me with a surprise:

# stuff...
All projects are in hashed storage? ... no
  Try fixing it:
  Please migrate all projects to hashed storage
  as legacy storage is deprecated in 13.0 and support will be removed in 14.0.
  For more information see:
  doc/administration/repository_storage_types.md
# more stuff...

Apparently not even the upgrade from version 13 to 14 was truly complete. That’s… annoying…

What was even more annoying is that the documented solution (running gitlab-rake gitlab:storage:migrate_to_hashed) does not work, because support for it was already dropped in version 14.

It makes sense though. If there has been no care for upgrading for 3 years, why would it have been any different before that?

That’s good though, because I can just remove these repositories. They’re not accessible anymore and have not been for a while.

For instance, this command lists all repositories that were not migrated to the “new” hashed storage:

root@gitlabprod:~# gitlab-rake gitlab:storage:list_legacy_projects

* Found 5 projects using Legacy Storage
  - [username]/config-files (id: 2440)
    [username]/config-files
  - [username]/gitTeste (id: 2943)
    [username]/gitTeste
  - [username]/TrabalhoSoftwareBasico (id: 3067)
    [username]/TrabalhoSoftwareBasico
  - [username]/dfgdfgdg (id: 3085)
    [username]/dfgdfgdg
  - [username]/teste123 (id: 3086)
    [username]/teste123
root@gitlabprod:~#

If you search for the config-files repository you won’t find it:

Legacy repo not appearing on project search.

Funny enough though, it does show up on the user’s projects page:

Legacy repo appearing on user projects.

But no, you can’t access it anymore:

Legacy repo inaccessible.

The problem is: if I can’t access the project, how do I remove it?

That’s where some Ruby comes in: we use the GitLab Rails console.

Warning

As the docs say well enough:

Commands that change data can cause damage if not run correctly or under the right conditions. Always run commands in a test environment first and have a backup instance ready to restore.

I’m not that responsible though. The whole point here is to lose this data.

(I actually already made a checkpoint in case I need to restore the system all the way back, but this sounded funny)

Get into the GitLab Rails console with gitlab-rails console, and something like the following will destroy a repository for you:

# find the project by its full path and a user to act on its behalf
project = Project.find_by_full_path('<project_path>')
user = User.find_by_username('<username>')
# run the destroy worker synchronously for that project
ProjectDestroyWorker.new.perform(project.id, user.id, {})
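With only five of them I just ran that by hand for each one. If there were many, a small loop through gitlab-rails runner could do it; this is only a sketch, assuming legacy projects are the ones with storage_version 0 or NULL and that a root admin account exists, so double-check it on a test instance first:

gitlab-rails runner '
  admin = User.find_by_username("root")              # assumed admin account
  Project.where(storage_version: [nil, 0]).find_each do |p|
    puts "destroying #{p.full_path} (id: #{p.id})"
    ProjectDestroyWorker.new.perform(p.id, admin.id, {})
  end
'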

After deleting each one individually we get no legacy repos:

root@gitlabprod:/var/opt/gitlab/git-data/repositories# gitlab-rake gitlab:storage:list_legacy_projects
* Found 0 projects using Legacy Storage
root@gitlabprod:/var/opt/gitlab/git-data/repositories#

And then the pre-upgrade checks also run smoothly.

Finally it’s upgradin’ time. apt install gitlab-ce=14.10.5-ce.0

It went smoothly. The only errors were about a version mismatch between pg_dump and the PostgreSQL server, but that was expected, as one is up to date and the other is not. Anticipating that, I had already made a database dump beforehand, so nothing to worry about.

v14.10.5 -> 15.0.5

Whatever could go wrong here was already dealt with in the previous upgrade. I just ran:

gitlab-rake gitlab:check
apt install gitlab-ce=15.0.5-ce.0

Aaaaannnd iiiit… didn’t work…

Remember Kerberos? I’m gonna crop the 50-line error down to the relevant part here:

Devise::OmniAuth::StrategyNotFound: Could not find a strategy with name `Kerberos'. Please ensure it is required or explicitly set it using the :strategy_class option.

Kerberos was bound to break an upgrade eventually; it’s the reason this GitLab instance stopped being updated in the first place: support for password-based login with Kerberos was dropped.

I just had to edit /etc/gitlab/gitlab.rb, remove 'kerberos' from any omniauth config, run gitlab-ctl reconfigure and try upgrading again.
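Roughly, that looked like the following (the grep is just to make sure nothing gets missed; the exact settings depend on your config):

# find every Kerberos reference in the config, then remove them by hand
grep -n -i kerberos /etc/gitlab/gitlab.rb
# typically that means 'kerberos' in omniauth_providers plus any kerberos_* settings
gitlab-ctl reconfigure
apt install gitlab-ce=15.0.5-ce.0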

Now it worked. However, because it’s a major version jump, I’ll perform the whole “post-upgrade checks”. I’m not writing all of this again, but assume I did it for every major upgrade.

In short, the steps are (condensed into commands right after the list):

  1. Run gitlab-rake gitlab:check (check output, obviously).
  2. Run gitlab-rake gitlab:doctor:secrets (check output, obviously).
  3. Verify that users can sign in.
  4. Verify projects & related stuff appear.
  5. Verify pushing code works.
  6. Verify that CI/CD runners work (I’m not gonna verify pushing images, I’ll assume that if pulling works pushing will work too).
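Condensed into commands, the first couple of checks plus a push smoke test look something like this (the host and repository path are placeholders):

# sanity checks; SANITIZE=true keeps project names out of the output
gitlab-rake gitlab:check SANITIZE=true
gitlab-rake gitlab:doctor:secrets
# quick push smoke test: clone any small repo you own and push a trivial commit
git clone git@gitlab.example.org:some-group/some-repo.git /tmp/smoke && cd /tmp/smoke
date > smoke.txt && git add smoke.txt && git commit -m "post-upgrade smoke test" && git push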

Normally you should also read the full release notes and so on.

I have 3 years worth of those to read so I’m not going to, even though I (more than most) should.

Do as I say, not as I do.

Be responsible. Read your release notes.

Upgraded to 15.0.5!

Fun fact: most users can’t sign in anymore, because everyone was using Kerberos, so I added our OpenID Connect config there, and it worked smoothly, thankfully.
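For reference, the OpenID Connect provider config has roughly this shape (all values below are placeholders; the real issuer, client id and secret come from our Keycloak realm, and in practice I edited gitlab.rb by hand rather than appending to it):

cat >> /etc/gitlab/gitlab.rb <<'EOF'
gitlab_rails['omniauth_providers'] = [
  {
    name: "openid_connect",
    label: "Keycloak",
    args: {
      name: "openid_connect",
      scope: ["openid", "profile", "email"],
      response_type: "code",
      issuer: "https://keycloak.example.org/realms/example",
      discovery: true,
      client_auth_method: "query",
      client_options: {
        identifier: "<client-id>",
        secret: "<client-secret>",
        redirect_uri: "https://gitlab.example.org/users/auth/openid_connect/callback"
      }
    }
  }
]
EOF
gitlab-ctl reconfigure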

15.0.5 -> 15.4.6

Whatever could go wrong here was already dealt with in the previous upgrade. I just ran:

gitlab-rake gitlab:check
apt install gitlab-ce=15.4.6-ce.0

And it worked :)

You get to be happy sometimes.

15.4.6 -> 15.11.13

gitlab-rake gitlab:check
apt install gitlab-ce=15.11.13-ce.0

Worked smoothly twice in a row?? Nice.

HAHAHAHAHAHAHAHAHAHHAH

root@gitlabprod:~# apt install gitlab-ce=15.11.13-ce.0
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Version '15.11.13-ce.0' for 'gitlab-ce' was not found
root@gitlabprod:~#

Not yet. It turns out we also have to upgrade Debian first:

# for reference: current Debian stable today is 12.9, so the VM itself wasn't updated for nearly 3 years
root@gitlabprod:~# cat /etc/debian_version
10.13
root@gitlabprod:~#

The apt update fails miserably too:

root@gitlabprod:~# apt update
Hit:1 http://debian.c3sl.ufpr.br/debian buster InRelease
Ign:2 http://debian.c3sl.ufpr.br/debian buster-backports InRelease
Hit:3 http://debian.c3sl.ufpr.br/debian buster-updates InRelease
Get:4 http://debian.c3sl.ufpr.br/debian-security buster/updates InRelease [34.8 kB]
Err:5 http://debian.c3sl.ufpr.br/debian buster-backports Release
  404  Not Found [IP: 2801:82:80ff:8000::4 80]
Get:6 https://packages.gitlab.com/gitlab/gitlab-ce/debian buster InRelease [23.3 kB]
Err:6 https://packages.gitlab.com/gitlab/gitlab-ce/debian buster InRelease
  The following signatures were invalid: EXPKEYSIG 3F01618A51312F3F GitLab B.V. (package repository signing key) <packages@gitlab.com>
Reading package lists... Done
E: The repository 'http://debian.c3sl.ufpr.br/debian buster-backports Release' no longer has a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://packages.gitlab.com/gitlab/gitlab-ce/debian buster InRelease: The following signatures were invalid: EXPKEYSIG 3F01618A51312F3F GitLab B.V. (package repository signing key) <packages@gitlab.com>
root@gitlabprod:~#

Anyways, the classic:

  1. Get new keys for the GitLab apt repository (generally projects this large will give you the full command, some sort of curl <something> | <something else> && <whatever>).
  2. Update sources.list.
  3. apt update
  4. apt upgrade
  5. reboot
  6. Hope for the best

For reference, this is the sequence of commands that fixed the problem above:

# remove the old GitLab repo definition (the install script below recreates it with a fresh signing key)
rm /etc/apt/sources.list.d/gitlab_gitlab-ce.list
# update the Debian repos (from buster to bullseye)
vim /etc/apt/sources.list
# get the new repo definition and key
curl https://packages.gitlab.com/install/repositories/gitlab/gitlab-ce/script.deb.sh | bash
# verify apt update works normally
apt update
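For the curious, the sources.list ended up looking roughly like this (shown as a heredoc for convenience; that’s our local mirror, any Debian mirror works, and note that from bullseye onwards the security suite is called bullseye-security):

cat > /etc/apt/sources.list <<'EOF'
deb http://debian.c3sl.ufpr.br/debian bullseye main
deb http://debian.c3sl.ufpr.br/debian bullseye-updates main
deb http://debian.c3sl.ufpr.br/debian-security bullseye-security main
EOF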

If I find a remotely interesting way to explain Linux server upgrading I’ll make a proper “tip” blog on updating stuff.

After all that, we try again:

gitlab-rake gitlab:check
apt install gitlab-ce=15.11.13-ce.0

If you’re still paying attention (congratulations btw, this is boring af) you’ll notice that I didn’t apt upgrade and reboot.

Turns out that after updating the repositories, apt wants to upgrade the gitlab-ce package all the way to version 17.5. There are two solutions for that:

  1. The boring way: telling apt not to upgrade the gitlab-ce package and doing it properly (sketched right after this list).
  2. The fun way: not actually rebooting, going full YOLO with a half-upgraded Debian bullseye and hoping stuff works.
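The boring way is basically an apt hold; a sketch of what it would look like (I did not actually do this):

# pin gitlab-ce so a full upgrade won't drag it straight to the latest version
apt-mark hold gitlab-ce
apt upgrade
reboot
# later, when it's time for the next GitLab stop:
apt-mark unhold gitlab-ce
apt install gitlab-ce=15.11.13-ce.0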

I’m gonna let you guess which one I chose.

Check that stuff works and be happy for now.

Caution

Don’t actually do that. Nothing broke, but it could have. Debian upgrades either go very smoothly or very wrong, so don’t make it easy for them to go wrong.

I had to reboot the VM anyways one or two upgrades down the line because ssh died and wouldn’t come back (probably replaced a lib that I shouldn’t have).

15.11.13 -> 16.3.9

Another major version upgrade… yay!!!

gitlab-rake gitlab:check
apt install gitlab-ce=16.3.9-ce.0

It bothers me that this upgrade was this simple while the previous one required upgrading Debian and messing with the apt sources.

The UI changed in this version, look how nice:

New GitLab UI

One cool thing about doing all these upgrades at once is actually watching the startup time of the GitLab services increase.

It started becoming very noticeable with versions 16.x.

16.3.9 -> 16.7.10

gitlab-rake gitlab:check
apt install gitlab-ce=16.7.10-ce.0

Aaaand it’s there.

Question

Given the other problems we’ve already faced, is there really anything that can go wrong from now on?

The only thing that comes to mind is some DB migration failure.

16.7.10 -> 16.11.10

gitlab-rake gitlab:check
apt install gitlab-ce=16.11.10-ce.0

Heyyyyy, that’s a 3-streak.

16.11.10 -> 17.1.8

FIIIIINALLY the last major version for today.

gitlab-rake gitlab:check
apt install gitlab-ce=17.1.8-ce.0

I did have to touch /etc/gitlab/skip-auto-backup because of a version mismatch between our Postgres server and the GitLab Postgres client:

(...) pg_dump: error: server version: 16.6 (Debian 16.6-1.pgdg120+1); pg_dump version: 14.11

I’ll still count this as a two-command upgrade.
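For the record, skip-auto-backup is just a flag file the packaged upgrade checks for; with it in place the automatic backup step is skipped, which is fine here since we keep our own dump of the external database:

# skip the automatic backup during the package upgrade (we already dumped the external DB ourselves)
touch /etc/gitlab/skip-auto-backup
apt install gitlab-ce=17.1.8-ce.0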

Gimli

I’m enjoying this. If all upgrades went like this we could do 10 years of stalled upgrades in a single day.

Don’t do that.

The sidebar UI changed slightly again, nothing that noticeable though. Or maybe I’m just going crazy after staring at a terminal and a GitLab page for about 5 hours.

17.1.8 -> 17.3.7

gitlab-rake gitlab:check
apt install gitlab-ce=17.3.7-ce.0

Remember when I said that the only thing that still could go wrong was a migration failing? Yeah…

For some weird reason in the middle of the apt install ... the gitlab-ctl reconfigure command failed. I remember reading “somethingsomething migration failure somethingelse”.

But then I ran it manually to debug the error and the error didn’t happen.

Re-ran apt install and it worked (?????).

Not gonna complain, nothing seems broken.

Post-upgrade note:

The next migrations worked fine, nothing in GitLab seems broken, so no clue what happened there.

Maybe it’s the universe itself messing with me.

Maybe it didn’t fail at all.

I’m seeing things.

I see you.

I am inside your walls.

17.3.7 -> 17.5.5

gitlab-rake gitlab:check
apt install gitlab-ce=17.5.5-ce.0

Thank god.

17.5.5 -> 17.8.1

Aaaand finally:

sed -i 's/bullseye/bookworm/g' /etc/apt/*
sed -i 's/bullseye/bookworm/g' /etc/apt/sources.list.d/*
apt update
apt upgrade
apt dist-upgrade
reboot

Wait what?

Yeah, I did the final GitLab upgrade together with an upgrade of the VM’s OS itself. Pretty neat. It would be a shame if the GitLab upgrade failed silently or something, but apparently it did not.

Sure, there was a /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_2.34' not found error I had to solve, and I also had to run gitlab-ctl reconfigure twice, but you don’t wanna hear about that. I sure as hell didn’t want to.

It’s actually funny: I’ve seen this specific GLIBC error a few times now, so it doesn’t even seem weird anymore. Sure, most programs break and stuff, but it’s one of those things you realise are not as daunting as they seem, especially when you’re forced to face them a couple of times.

If you take anything away from this post, let it be not to rush updates (ironically, I’m only rushing now because of the lack of updating in the first place).

Note

It’s kinda funny that, as GitLab started taking longer and longer to boot, the “502 page” itself kept changing.

Back in 14.10.0 it said something like “GitLab is not responding”. Now in 17.8.1 it says “GitLab is booting up”, because it’s now expected to take a while to boot.

New 502 screen.

I also remembered to re-enable auto backups, as pg_dump should now match our PostgreSQL server version:

rm /etc/gitlab/skip-auto-backup

It’s 23:40 right now. I started this sometime after 16:00.

3 years of updates in roughly 8 hours.

Up to date GitLab.

Final thoughts

Don’t do this.

Keep your stuff up to date. Don’t keep ignoring the “Update ASAP” message, come on. Yes, having to stop whatever you’re doing (which is most likely more important) to keep the services you offer updated is extremely annoying, but it beats a marathon like this one.

Oh btw, we have to upgrade our Overleaf, Jitsi, BigBlueButton, Moodle, GitLab, Metabase, … instances. I think we have a pretty nice window of time right now to do this.

And other terrifying things a sysadmin has nightmares about.

We did

3 years’ worth of updates in an 8-hour span.

11 obligatory upgrade stops.

2 major Debian version upgrades.

Don’t get me wrong, I’m very experienced with this kind of stuff. I had a very simple and well-defined rollback plan in mind, and I had also already done the upgrade in a test environment; that’s why I took the liberty of not doing it as “formally” as I should have.

That said, always follow a well-defined and formal plan for upgrades like this one. Literally everyone here uses this GitLab instance, and some projects that pay us live there, so being a little more careful is better.

Makes for a very interesting “tech horror” blog title at least.

Post-upgrade note:

The service is up and running, apparently nothing was lost. I’ll only know for sure a few days/weeks from now when everyone uses it for a while.

This is interesting though: GitLab’s upgrade docs are extremely good. Might use them as a reference in future projects.

I went into this task fully expecting to have to roll back everything once or twice, given how informally I did the upgrades, but it actually worked. I’m almost confused.