Build our production website in GitLab CI
Scope
In scope
- build our live, production website via a GitLab CI job, and then deploy the output to webserver(s) upon success
- by default, use caching + `ikiwiki --refresh` to avoid a huge time-to-publication increase
- developers and tech writers can force a full rebuild of the website, which bypasses/invalidates the cache, via GitLab CI (see the sketch after this list)
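For illustration only, here's a minimal sketch of what such a build job could look like, assuming the `ikiwiki.setup` destdir points at `public/` inside the project directory; the job name, image, cache paths and the `FORCE_REBUILD` variable are all made up, not the final implementation:

```yaml
# Hypothetical build job: cache ikiwiki's output and state so a regular push
# only triggers an incremental refresh; setting FORCE_REBUILD=1 when starting
# a pipeline wipes that state and forces a full rebuild.
build-website:
  stage: build
  image: debian:bookworm
  cache:
    key: website-build
    paths:
      - public/      # assumed ikiwiki destdir
      - .ikiwiki/    # assumed ikiwiki state directory
  script:
    - apt-get update && apt-get install -y ikiwiki po4a
    - |
      if [ "$FORCE_REBUILD" = "1" ]; then
        echo "Full rebuild requested: dropping cached output"
        rm -rf public .ikiwiki
      fi
    - ikiwiki --setup ikiwiki.setup --refresh
  artifacts:
    paths:
      - public/
```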
Out of scope
This issue is not about serving our website via GitLab pages.
Expected benefits
More robust
- the build happens in a controlled, mostly reproducible environment, so issues caused by transitions between states are less likely
- the output of the build is published only if the build succeeded ⇒ no partly refreshed, half-broken website in production (see the sketch after this list)
- Avoid problems caused by incorrect state transitions like Deleted page still served and indexed by search... (#18065 - closed)
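A rough sketch of that gating, reusing the hypothetical `build-website` job from the sketch above (the deploy command is just a placeholder):

```yaml
stages:
  - build
  - deploy

# Hypothetical deploy job: GitLab only starts it once build-website has
# succeeded, so a failed or broken build never replaces the live website.
deploy-website:
  stage: deploy
  needs: ["build-website"]   # also pulls in the public/ artifact
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
  script:
    - echo "sync public/ to the production webserver(s) here"
```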
Non-sysadmins have more agency over their work
- everyone can look at the build output: not only the person who pushed, but also the person who should investigate and debug what happened
- developers can fix stuff themselves via the GitLab CI config file, if needed
- developers and tech writers can maintain the configuration themselves (`ikiwiki.setup`, ikiwiki plugins, build dependencies such as `po4a` (Upgrade to po4a 0.62 (tails#18667 - closed) and the upcoming tails#20239))
- no need to maintain changes in 2 different versions (tails.git, puppet-tails)
- no need to coordinate merging branches with deploying updated configuration on the production infra
Recover from broken website refresh/build without sysadmin intervention
In a variety of situations, an ikiwiki refresh triggered by a Git push fails, leaving the wiki in an unclean state; the only way to recover is then to SSH into the machine and manually start a full rebuild. This is painful because:
- When this happens during a release process, the release can be left half-published, until someone fixes this. That’s not fun for the RM.
- It puts timing/availability/expectations pressure on sysadmins.
- I suspect our technical writers have grown wary of pushing the kinds of changes that typically trigger this sort of problem. Not being able to do one’s job with a reasonable amount of confidence in oneself and in our infra is surely not fun.
Paves the way towards web server redundancy
Context: #16956 (closed)
E.g. here's how Tor is doing it: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/static-shim#deploying-a-static-site-from-gitlab-ci
Examples: https://gitlab.torproject.org/tpo/web/tpo/-/blob/main/.gitlab-ci.yml?ref_type=heads and https://gitlab.torproject.org/tpo/web/blog/-/blob/main/.gitlab-ci.yml?ref_type=heads
And an example deployment: https://gitlab.torproject.org/tpo/web/tpo/-/jobs/496878
Originally created by @intrigeri on #17364 (Redmine)
To-do
- Build the website in GitLab CI and push it to `www2` → tails!1519 (merged)
- Fix Website builds in GitLab CI sometimes timeout a... (#18086 - closed)
- Prevent jobs corresponding to older commits from overwriting newer versions of the website (see thread below and the sketch after this list)
- Use our own container image to build the website
- Pin our GitLab's container registry IP in /etc/hosts of gitlab-runner VMs
- Figure out how to feed Ikiwiki's PO file updates back to tails.git
- Push the website to `www` (somehow) and retire `tails::website`
- Test changing a source string so IkiWiki pushes updated `.po` files back to the repo
- Check if there's a better access-token setup than the current one re. needed permissions and expiration time
- Document accordingly
- Push to `www2` via the private network and remove the public access to that VM's SSH service
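For illustration, here's a rough sketch of how the race-prevention and SSH-push items might combine in a single deploy job. Everything specific is an assumption: the `deploy-www2` job name, the `deploy@www2.example.org` target, the `/var/www/tails/` path, and the `.deployed-commit` marker file; the job also assumes an SSH key is provided via CI variables and a deep enough Git clone (e.g. `GIT_DEPTH: 0`) so that `git merge-base` can see the previously deployed commit.

```yaml
# Hypothetical deploy job combining two of the to-do items above:
# - resource_group serializes deployments so jobs can't race each other
# - before syncing, refuse to overwrite a newer commit already deployed
deploy-www2:
  stage: deploy
  resource_group: production-website
  needs: ["build-website"]
  script:
    - DEPLOYED=$(ssh deploy@www2.example.org 'cat /var/www/tails/.deployed-commit 2>/dev/null || true')
    - |
      if [ -n "$DEPLOYED" ] && git merge-base --is-ancestor "$CI_COMMIT_SHA" "$DEPLOYED"; then
        echo "www2 already serves a newer commit ($DEPLOYED); not overwriting it."
        exit 0
      fi
    - rsync -a --delete public/ deploy@www2.example.org:/var/www/tails/
    - echo "$CI_COMMIT_SHA" | ssh deploy@www2.example.org 'cat > /var/www/tails/.deployed-commit'
```

Note that `resource_group` already serializes deploy jobs; the marker-file check only matters if an older pipeline can still reach the deploy stage after a newer one has finished.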