Remove rsync.torproject.org from the mirrors synchronization chain
_Originally created by @intrigeri on [#15162 (Redmine)](https://public-redmine-archive.tails.boum.org/code/issues/15162)_
Context: mirrors in our download pool rsync our data from
rsync.torproject.org, which itself rsyncs it from rsync.lizard, which is
not publicly available.
As we’ve noticed in the last 2 days, the rsync.torproject.org service
lacks a maintainer, and the people who maintain related services don’t
have our needs in mind. We’ve been lucky that the breakage we
experienced has been resolved promptly, thanks to weasel making time for
it on the spot, but if weasel had been AFK on holiday we would have been
in a really bad situation regarding the release.
I think the reasons why we did not want to serve our data over rsync to
mirrors are obsolete, so I propose we remove rsync.torproject.org from the
loop, make our own rsync server publicly accessible, and have mirror
operators migrate their cronjob from rsync.tpo to it.
Advantages:
- Our rsync synchronization chain is simplified.
- We control the entire rsync mirror sync chain. As a consequence,
improvements like tails/sysadmin#11152 become easier to implement.
- We don’t have to maintain the rsync.torproject.org service ourselves: as
weasel told us yesterday, it needs a maintainer if we want it to
remain up (like any TPO service as per their new policy).
- This gives us most of tails/tails#15159 for free: we already monitor that our
rsync server works.
Cons (real or potential):
- More bandwidth costs for lizard hosting (that is sponsored by Tor):
  - Legit usage: currently we have 40 mirrors so each year that’s 40
    \* number of ISOs we publish \* 1.2 GB. lizard has pushed 124
    GiB since it was rebooted 4 days ago so I think the impact is
    totally negligible => non-issue.
  - Abusive usage: we already make a bunch of ISO images available
    publicly over HTTPS from lizard
    (<https://nightly.tails.boum.org/>) so whoever wants to pull
    tons of data from that system can already do it => non-issue.
- Upload bandwidth usage peaks when publishing a new ISO: lizard will
need to upload quite some data in the hour that follows the time
when we add a new ISO to our rsync server. Assuming that’s 2 GiB per
mirror (ISO + a couple IUKs), with 40 mirrors that’s about 80 GiB, which
would take nearly two hours even at a fully saturated 100 Mbps, so it is
more than what our link can sustain. Potential consequences:
  - Our link is saturated and other services suffer. If this happens
    we can cap the upload bandwidth of the rsync daemon (or VM,
    whichever is easiest).
  - It will happen that a mirror runs a second instance of the rsync
    pull cronjob while the previous one has not finished yet. Oops!
    I don’t know how rsync handles this. We should wrap the
    documented rsync cronjob with `flock` and have mirror operators
    apply this change at the same time as they switch the rsync
    server URL in that same cronjob. I’m tempted to document a
    `systemd.timer` unit as well since such units are not affected by
    double-run issues, but that’s bonus and not necessary.
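
To make the `flock` idea above concrete, here is a minimal sketch of what the wrapped cronjob could look like. It is only an illustration: the function name `run_locked`, the lock file path, the rsync options and the `rsync.example.org` URL are all placeholders I made up, not what our mirror documentation currently says.

```shell
#!/bin/sh
# Sketch only: run a command unless a previous run still holds the lock.
# run_locked is a hypothetical helper name, not part of any existing doc.

run_locked() {
    lock_file=$1
    shift
    # -n: give up immediately (exit status 1) if the lock is already held,
    # instead of queueing behind the run that is still in progress.
    flock -n "$lock_file" "$@"
}

# Hypothetical invocation from the mirror's cronjob (placeholder URL and paths):
# run_locked /var/lock/mirror-rsync.lock \
#     rsync -rt --delete rsync://rsync.example.org/module/ /srv/mirror/
```

With `-n`, an overlapping run is simply skipped rather than stacked up behind the previous one, which is probably what we want while the link is saturated right after a release.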
I volunteer to implement the needed changes (sysadmin, doc) and to
coordinate this migration with mirror operators.
Once this is completed and we’re happy with how it works, we’ll ask the
tpo admins to:
1. disable our component on their rsync *server*
2. adjust their rsync *client* cronjob to work just like every other
mirror’s
… and then we can add their mirror to the pool.
### Related issues
- **Related to** tails/tails#15159
- **Related to** tails/tails#15687
- **Blocks** tails/sysadmin#11152