Document main sources of breakage and add corresponding actions
authored Apr 08, 2021 by intrigeri
HTTP_mirror_pool.md
View page @ a1252a8e
...
@@ -53,6 +53,13 @@ Maintaining the pool is:
is not resilient to broken mirrors" and "there's often a broken mirror",
any breakage impacts UX negatively
The most common sources of breakage are:
- serves outdated data due to buggy rsync scheduling
- TLS certificate expired (tails#17754)
- server is down for maintenance
- web server crashes and is not restarted automatically
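Of these, an expiring TLS certificate is the easiest to catch before it causes breakage. A minimal Python sketch, assuming a monitoring script that can reach each mirror over HTTPS on port 443. The function names and the 14-day warning threshold are hypothetical; only the `ssl` and `socket` calls are real standard-library APIs:

```python
import socket
import ssl

# Hypothetical helpers; ssl.cert_time_to_seconds, ssl.create_default_context,
# socket.create_connection and SSLSocket.getpeercert are real stdlib calls.


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter field of the certificate a mirror presents.

    With the default context, an already expired (or otherwise invalid)
    certificate makes the handshake raise ssl.SSLCertVerificationError,
    which is itself a sign of a broken mirror.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def seconds_until_expiry(not_after: str, now: float) -> float:
    """Seconds left before `not_after` (negative once expired).

    `not_after` uses getpeercert()'s format, e.g. "May  9 00:00:00 2007 GMT".
    """
    return ssl.cert_time_to_seconds(not_after) - now


def expires_soon(not_after: str, now: float, warn_days: int = 14) -> bool:
    """True if the certificate is expired or expires within `warn_days` days."""
    return seconds_until_expiry(not_after, now) < warn_days * 86400
```

For example, a monitoring job could call `expires_soon(fetch_not_after("mirror.example.org"), time.time())`, where `mirror.example.org` is a placeholder host.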
Additionally, slow mirrors make our monitoring take vastly longer than it needs to.
This makes it hard for sysadmins to schedule the monitoring properly (sysadmin#17702),
and delays error reports.
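One way to keep slow mirrors from stretching the whole run is to check mirrors concurrently and give each request a timeout. A minimal Python sketch, assuming each mirror can be probed with a single HTTP GET; `http_check`, `check_all`, and the default timeout and worker count are hypothetical, not taken from the existing monitoring:

```python
import concurrent.futures
import urllib.request
from typing import Callable, Dict, Iterable


def http_check(url: str, timeout: float = 30.0) -> bool:
    """Hypothetical per-mirror check: True if the URL answers with HTTP 200.

    The timeout bounds how long a single slow mirror can occupy a worker.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts are OSErrors
        return False


def check_all(
    urls: Iterable[str],
    check: Callable[[str], bool],
    max_workers: int = 16,
) -> Dict[str, bool]:
    """Run `check` on every mirror concurrently and map each URL to its result.

    As long as there are no more mirrors than workers, the run takes roughly
    as long as the slowest check instead of the sum of all checks.
    """
    url_list = list(urls)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(check, url_list)  # preserves input order
        return dict(zip(url_list, results))
```

With this layout, a mirror that hangs until its timeout costs one worker rather than the whole run; the timeout value still determines how late such a mirror gets reported.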
...
@@ -100,8 +107,13 @@ Improves:
Actions:
- Incrementally remove existing unreliable mirrors from the pool:
  - Permanently remove mirrors that had problems at least twice in the last 6 months.
  - Remove mirrors that expose red flags such as:
    - We notice breakage on their side before the mirror operator does.
    - The mirror uses an expired TLS certificate.
    - The web server does not run under a supervisor that would restart it if it crashes.
    - Maintenance operations that take the server down are not announced in advance.
- Add to the requirements for new mirrors: the mirror must be operated by a professional team,
  or at least run on a high-availability setup.
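The "at least twice in the last 6 months" rule can be sketched as a small filter. This is a minimal Python sketch, assuming incidents are recorded as dates per mirror (that data source and the function name are hypothetical) and approximating "6 months" as 183 days; the red flags listed above are left to human judgment here:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set

# Assumption: "6 months" is approximated as 183 days.
WINDOW = timedelta(days=183)
THRESHOLD = 2  # "had problems at least twice"


def mirrors_to_remove(
    incidents: Dict[str, List[datetime]],
    now: datetime,
    window: timedelta = WINDOW,
    threshold: int = THRESHOLD,
) -> Set[str]:
    """Return mirrors with at least `threshold` incidents in the `window` before `now`."""
    cutoff = now - window
    return {
        mirror
        for mirror, dates in incidents.items()
        if sum(1 for d in dates if cutoff <= d <= now) >= threshold
    }
```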
...