Parallelize our ISO building workload on more builders
Originally created by @intrigeri on #10999 (Redmine)
The problem described in #8072 (closed) is back: quite often, ISO builds triggered by Jenkins are queuing up, and the latency between a developer pushing to a branch and the resulting ISO being ready to be downloaded and automatically tested is increasing. This situation can be explained by changes that make the build substantially slower: the move to Jessie, an added language on the website, and the Installation Assistant. We need to cope with it, somehow.
First of all, let’s note that we had initially planned to give 4 vcpus to each isobuilder, while we currently give them 8 vcpus each. IIRC we did that because we had no better use for our vcpus back then. We now have other good uses for these vcpus.
In my book, these bonus 4 vcpus should only improve the part of the build that parallelizes well, i.e. the SquashFS compression, which takes around 11 minutes these days. So:
- on our current hardware, it would be wasteful to try to improve our ISO building latency by making each individual isobuilder faster; parallelizing this workload over more VMs should work much better;
- in theory, if we give only 4 vcpus to each isobuilder, and as a result mksquashfs is twice as slow, it would only make the build last about 12.5% longer, which feels acceptable if it allows us to double the number of ISO builders we run, and in turn to solve the congestion problem we have.
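For reference, the arithmetic behind that 12.5% estimate can be sketched as follows. It only uses the figures quoted above (11 minutes of SquashFS compression, a 2x slowdown, about 12.5% longer). The total build time is not stated in this ticket, so the value computed here is merely what those figures imply, assuming mksquashfs is the only step affected by halving the vcpus.

```python
# Figures quoted in this ticket.
squashfs_minutes = 11.0    # SquashFS compression time with 8 vcpus
slowdown_factor = 2.0      # pessimistic case: mksquashfs twice as slow with 4 vcpus
relative_increase = 0.125  # "about 12.5% longer"

# Extra wall-clock time per build if only the mksquashfs step slows down
# (ASSUMPTION: the rest of the build does not benefit from the extra vcpus).
extra_minutes = squashfs_minutes * (slowdown_factor - 1.0)

# Total build time implied by the 12.5% estimate.
# This is derived, not measured: the ticket does not state it.
implied_total_minutes = extra_minutes / relative_increase

print(f"extra time per build:       {extra_minutes:.1f} min")
print(f"implied current build time: {implied_total_minutes:.1f} min")
print(f"build time with 4 vcpus:    {implied_total_minutes + extra_minutes:.1f} min")
```

In other words, the estimate corresponds to a build that currently takes roughly an hour and a half; if the real build is shorter, the relative slowdown would be larger than 12.5%.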
At first glance, I think we should run 3 or 4 ISO builders, with 4 vcpus each. Let’s see how doable it is:
- vcpus: as explained above, we can simply reclaim some of the bonus vcpus allocated a year ago to our current isobuilders; that is, if we’re not ready to try overallocating vcpus (most of the time, not all isobuilders are used at the same time, so overallocation would make sense);
- RAM: tails#11010 (closed) gave them enough RAM for 1 or 2 more builders;
- disk space: for 2 additional ISO builders, we need 2*10 GiB; we have some slack in our storage plan for this year, and we can still reclaim some space here and there, so we should be good on this side.
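The vcpu and disk budget above can be checked with a quick sketch. The per-builder figures (8 vcpus now, 4 vcpus planned, 10 GiB of disk) come from this ticket. The current number of isobuilders is not stated explicitly; the value of 2 below is an assumption inferred from "double the number of ISO builders" and "2 additional ISO builders", and `plan` is a hypothetical helper name.

```python
# ASSUMPTION: we currently run 2 isobuilders (inferred, not stated in the ticket).
CURRENT_BUILDERS = 2
VCPUS_PER_BUILDER_NOW = 8   # current allocation, from the ticket
VCPUS_PER_BUILDER_NEW = 4   # proposed allocation, from the ticket
DISK_PER_BUILDER_GIB = 10   # from "2*10 GiB" for 2 additional builders


def plan(target_builders):
    """Return (vcpus needed, vcpus left over, extra disk in GiB) without overallocation."""
    vcpu_budget = CURRENT_BUILDERS * VCPUS_PER_BUILDER_NOW
    vcpus_needed = target_builders * VCPUS_PER_BUILDER_NEW
    extra_disk = (target_builders - CURRENT_BUILDERS) * DISK_PER_BUILDER_GIB
    return vcpus_needed, vcpu_budget - vcpus_needed, extra_disk


for n in (3, 4):
    needed, spare, disk = plan(n)
    print(f"{n} builders: {needed} vcpus needed, {spare} left over, {disk} GiB extra disk")
```

Under that assumption, 4 builders with 4 vcpus each fit exactly in the vcpus already allocated to the isobuilders, so overallocation would not be required.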
Blueprint: https://tails.boum.org/blueprint/hardware_for_automated_tests_take2/
Parent Task: #11009 (closed)
Related issues
- Related to #8072 (closed)
- Related to tails#9264 (closed)
- Related to #10996 (closed)
- Blocked by tails#11010 (closed)