The build of our production website should be self-healing
In a variety of situations, an ikiwiki refresh triggered by a Git push fails, leaving it in an unclean state, and then the only way to recover is to ssh into the machine and manually start a full rebuild. This is painful because:
- When this happens during a release process, the release can be left half-published, until someone fixes this. That’s not fun for the RM.
- It puts timing/availability/expectations pressure on sysadmins.
- I suspect our technical writers have grown wary of pushing some kinds of changes that typically trigger this sort of problems. Not being able to do one’s job with a reasonable amount of confidence in oneself and in our infra is surely not fun.
Ideally, somehow our infra would notice this situation and run a full rebuild itself.
- Related to tails#17361 (closed)