Build our production website in GitLab CI
[[_TOC_]]
# Scope
## In scope
- build our live, production website via a GitLab CI job, and then deploy the output to webserver(s) upon success
- by default, use caching + `ikiwiki --refresh` to avoid a huge time-to-publication increase (sketched below)
- developers and tech writers can force a full rebuild of the website that bypasses/invalidates the cache, via GitLab CI
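A rough sketch of what the build job could look like, assuming the setup file points ikiwiki's destination directory at `public/` (job name, cache key, and paths are illustrative, not the final configuration):

```yaml
# Hypothetical build job; names and paths are illustrative only.
build-website:
  stage: build
  cache:
    key: ikiwiki-destdir
    paths:
      - public/     # previously generated HTML, reused by --refresh
      - .ikiwiki/   # ikiwiki's internal index
  script:
    - ikiwiki --setup ikiwiki.setup --refresh
  artifacts:
    paths:
      - public/
```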
## Out of scope
This issue is *not* about _serving_ our website via GitLab Pages.
# Expected benefits
## More robust
* the build happens in a controlled, mostly reproducible environment, so problems caused by transitions between states are less likely
* the output of the build is published only if the build succeeded ⇒ no partly refreshed, half-broken website in production (see the sketch after this list)
* avoid problems caused by incorrect state transitions, such as tails/sysadmin#18065+
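For instance, the deploy job can be gated on the build job, so a failed or aborted build never reaches the webserver (job names, hostname, and paths are assumptions):

```yaml
# Hypothetical deploy job: only runs if build-website succeeded.
deploy-website:
  stage: deploy
  needs: ["build-website"]   # pulls the artifact; skipped if the build failed
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
  script:
    # --delay-updates stages files server-side before renaming them
    # into place, which keeps the switchover short
    - rsync -az --delete --delay-updates public/ deploy@www.example.org:/var/www/site/
```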
## Non-sysadmins have more agency over their work
* everyone can look at the build output: not only the person who pushed, but also the person who should investigate and debug what happened
* developers can fix stuff themselves via the GitLab CI config file, if needed
* developers and tech writers can maintain the configuration themselves (`ikiwiki.setup`, [ikiwiki plugins](https://gitlab.tails.boum.org/tails/tails/-/merge_requests/756#note_190808), build dependencies such as `po4a` (tails/tails#18667+ and the upcoming tails/tails#20239))
  - no need to maintain changes in 2 different repositories (tails.git, puppet-tails)
  - no need to coordinate merging branches with deploying updated configuration on the production infra (see the sketch after this list)
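As an illustration, the build dependencies could then be declared in `tails.git` right next to the CI configuration, e.g. (base image and package list are guesses):

```yaml
# Hypothetical: bumping po4a or adding an ikiwiki plugin becomes
# a regular merge request against this file.
build-website:
  image: debian:bookworm   # later: our own container image
  before_script:
    - apt-get update
    - apt-get install --yes ikiwiki po4a
```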
## Recover from broken website refresh/build without sysadmin intervention
In a variety of situations, an ikiwiki refresh triggered by a Git push
fails, leaving the wiki in an unclean state; the only way to recover
is then to SSH into the machine and manually start a full rebuild. This is
painful because:
- When this happens during a release process, the release can be left
half-published, until someone fixes this. That’s not fun for the RM.
- It puts timing/availability/expectations pressure on sysadmins.
- I suspect our technical writers have grown wary of pushing some
kinds of changes that typically trigger this sort of problem. Not
being able to do one’s job with a reasonable amount of confidence in
oneself and in our infra is surely not fun.
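With the website built in CI, recovery would no longer need SSH: a dedicated manual job (a sketch, with assumed names) could rebuild everything from scratch, and anyone with the right GitLab role could start it from the Pipelines page:

```yaml
# Hypothetical manual job: ignores the incremental cache and
# forces ikiwiki to regenerate the whole site.
full-rebuild:
  stage: build
  when: manual
  cache: []                # do not reuse the incremental cache
  script:
    - ikiwiki --setup ikiwiki.setup --rebuild
  artifacts:
    paths:
      - public/
```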
## Paves the way towards web server redundancy
Context: tails/sysadmin#16956
For example, here's how Tor does it: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/service/static-shim#deploying-a-static-site-from-gitlab-ci
Example CI configurations: https://gitlab.torproject.org/tpo/web/tpo/-/blob/main/.gitlab-ci.yml?ref_type=heads and https://gitlab.torproject.org/tpo/web/blog/-/blob/main/.gitlab-ci.yml?ref_type=heads
And an example deployment: https://gitlab.torproject.org/tpo/web/tpo/-/jobs/496878
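In that spirit, once deployment is push-based, fanning out to several webservers is a small CI change, e.g. with a `parallel:matrix` job (hostnames are assumptions):

```yaml
# Hypothetical fan-out: one deploy job instance per webserver,
# all shipping the same built artifact.
deploy-website:
  stage: deploy
  needs: ["build-website"]
  parallel:
    matrix:
      - TARGET: ["www1.example.org", "www2.example.org"]
  script:
    - rsync -az --delete public/ "deploy@${TARGET}:/var/www/site/"
```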
_Originally created by @intrigeri on [#17364 (Redmine)](https://public-redmine-archive.tails.boum.org/code/issues/17364)_
# To-do
- [x] Build the website in GitLab CI and push it to `www2` → tails!1519
- [x] Fix #18086+
- [x] Prevent jobs corresponding to older commits from overwriting newer versions of the website (see [thread below](https://gitlab.tails.boum.org/tails/sysadmin/-/issues/17364#note_236043) and the sketch after this list)
- [x] Use our own container image to build the website
- [x] Pin our GitLab's container registry IP in `/etc/hosts` of gitlab-runner VMs
- [x] Figure out how to feed ikiwiki's PO file updates back to `tails.git`
- [x] Push the website to `www` (somehow) and retire `tails::website`
- [x] Test changing a source string so ikiwiki pushes updated `.po` files back to the repo
- [x] Check if there's a better access-token setup than the current one re. needed permissions and expiration time
- [x] Document accordingly
- [x] Push to `www2` via the private network and remove the public access to that VM's SSH service
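Regarding the ordering item above, one possible approach (the marker file and ancestry check are assumptions, not necessarily what was implemented): serialize deploys with a `resource_group`, and skip deploying any commit that is an ancestor of the commit already live:

```yaml
# Hypothetical: at most one deploy runs at a time, and a job for
# an outdated commit skips instead of overwriting a newer deploy.
deploy-website:
  stage: deploy
  resource_group: production-website
  variables:
    GIT_DEPTH: "0"   # full history, needed for the ancestry check
  script:
    - |
      DEPLOYED=$(ssh deploy@www.example.org cat /var/www/site/.deployed-commit || true)
      if [ -n "$DEPLOYED" ] && git merge-base --is-ancestor "$CI_COMMIT_SHA" "$DEPLOYED"; then
        echo "Deployed commit $DEPLOYED is at least as new as $CI_COMMIT_SHA; skipping."
        exit 0
      fi
    - rsync -az --delete public/ deploy@www.example.org:/var/www/site/
    - echo "$CI_COMMIT_SHA" | ssh deploy@www.example.org 'cat > /var/www/site/.deployed-commit'
```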