website-cache gc not good enough on jenkins
For the last few days, isoworker3.dragon has had its website cache disk fill up several times, e.g.:
- https://jenkins.tails.boum.org/job/build_Tails_ISO_19102-remove-friction-to-report-errors/9/
- https://jenkins.tails.boum.org/job/build_Tails_ISO_feature-bookworm/415/
But it has also managed to build successfully a few times in this period; I guess some entries got old enough to be garbage-collected in between.
The log looks like this:
09:47:00 + website-cache gc
09:47:00 2024-01-26 09:47:00,285 INFO Garbage collecting expired data from the cache…
09:47:00 ++ website-cache key
09:47:00 + WEBSITE_CACHE_KEY=06cd667e57a106dcc327e97ee18879710aa907ee
09:47:00 + website-cache get 06cd667e57a106dcc327e97ee18879710aa907ee
09:47:00 Traceback (most recent call last):
09:47:00 File "/tmp/tails-build.TsnXjKeZ/auto/scripts/website-cache", line 215, in <module>
09:47:00 main()
09:47:00 File "/tmp/tails-build.TsnXjKeZ/auto/scripts/website-cache", line 112, in main
09:47:00 args.func(args)
09:47:00 File "/tmp/tails-build.TsnXjKeZ/auto/scripts/website-cache", line 135, in get
09:47:00 raise LookupError("Found no cache dir for key %s" % (args.cache_key))
09:47:00 LookupError: Found no cache dir for key 06cd667e57a106dcc327e97ee18879710aa907ee
09:47:00 + ./build-website
09:47:00 refreshing wiki..
[...]
09:59:41 building templates/popup.mdwn
09:59:41 building templates/note.mdwn
10:00:04 done
10:00:04 + website-cache put 06cd667e57a106dcc327e97ee18879710aa907ee
10:00:04 Traceback (most recent call last):
10:00:04 File "/tmp/tails-build.TsnXjKeZ/auto/scripts/website-cache", line 215, in <module>
10:00:04 main()
10:00:04 File "/tmp/tails-build.TsnXjKeZ/auto/scripts/website-cache", line 112, in main
10:00:04 args.func(args)
10:00:04 File "/tmp/tails-build.TsnXjKeZ/auto/scripts/website-cache", line 169, in put
10:00:04 shutil.copytree(src=file_to_cache, dst=cached_file, symlinks=True)
10:00:04 File "/usr/lib/python3.11/shutil.py", line 561, in copytree
10:00:04 return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
10:00:04 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10:00:04 File "/usr/lib/python3.11/shutil.py", line 515, in _copytree
10:00:04 raise Error(errors)
10:00:04 shutil.Error: [('config/chroot_local-includes/usr/share/doc/tails/website/newsletter.es.html', [...]
Increasing the capacity of the website cache disks is a temporary fix; it would probably be better to make website-cache gc smarter. Right now it removes cache entries older than 15 days, and apparently we run enough builds in that period to fill the cache. Each cache entry is about 75 MB and the website cache disks on Jenkins are 2.7 GB, so only around 35 cache entries fit, which seems a bit low.
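For reference, the current age-based policy amounts to something like the following sketch (hypothetical code, written from the description above; the actual website-cache script surely differs in details such as layout and logging):

```python
import shutil
import time
from pathlib import Path

MAX_AGE_DAYS = 15  # current policy: drop entries older than 15 days


def gc_by_age(cache_root: str, max_age_days: int = MAX_AGE_DAYS) -> list[str]:
    """Remove cache entry directories whose mtime is older than max_age_days.

    Returns the names of the removed entries. Note that this policy ignores
    how full the disk is, which is exactly the problem described above.
    """
    cutoff = time.time() - max_age_days * 24 * 3600
    removed = []
    for entry in Path(cache_root).iterdir():
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```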
What if we made the garbage collection mechanism look at the available space and delete the oldest entries until (e.g.) 10% of the disk is free? Then the next time it would fail like this is when a single cache entry needs more than 10% of 2.7 GB = 270 MB, i.e. our website would have to more than triple in size.
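The proposed free-space-driven collection could look roughly like this (again a hypothetical sketch, not actual website-cache code; function name and parameters are made up):

```python
import shutil
from pathlib import Path


def gc_by_free_space(cache_root: str, min_free_fraction: float = 0.10) -> list[str]:
    """Delete the oldest cache entries until at least min_free_fraction of the
    filesystem holding cache_root is free.

    Entries are removed oldest-first (by mtime), re-checking free space after
    each deletion, so we never delete more than needed.
    """
    removed = []
    entries = sorted(
        (p for p in Path(cache_root).iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,  # oldest first
    )
    for entry in entries:
        usage = shutil.disk_usage(cache_root)
        if usage.free / usage.total >= min_free_fraction:
            break  # target reached, keep the remaining (newer) entries
        shutil.rmtree(entry)
        removed.append(entry.name)
    return removed
```

One nice property of this approach is that it degrades gracefully: if cache entries grow, fewer of them are kept, but builds keep working until a single entry exceeds the free-space target itself.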