Allow building IUKs in parallel (locally)
I’ve instrumented the IUK builds to verify the “mksquashfs is likely the bottleneck” guesstimate.

It turns out that there’s another lengthy step: gathering the differences between the old and the new contents.
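
For the curious, the instrumentation boils down to timing each build step. Here is a minimal Python sketch of the idea; the step names match the two lengthy steps, but the commands are harmless stand-ins (`true`), not the actual build code:

```python
# Minimal timing instrumentation: wrap each lengthy IUK build step and log
# its wall-clock duration. The commands are placeholders, not the real ones.
import subprocess
import time
from contextlib import contextmanager

@contextmanager
def timed(step_name):
    start = time.monotonic()
    try:
        yield
    finally:
        print(f"{step_name}: {time.monotonic() - start:.1f}s")

with timed("gather differences (rsync)"):
    subprocess.run(["true"], check=True)  # stand-in for the real rsync run

with timed("mksquashfs"):
    subprocess.run(["true"], check=True)  # stand-in for the real mksquashfs run
```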
Of course, one might point at the SSD and figure things would be better with NVMe, but switching to a tmpfs changes almost nothing… Since I don’t suppose we can do much about the kernel’s loopback performance, I’ve investigated whether running several IUK builds in parallel would let us run several rsync processes at the same time: each of them is slower on its own, but the total wall-clock time shrinks.
I’ve collected data in the attached files, so that one gets an idea of what it looks like. All tests were performed on an HP Z220 CMT Workstation, equipped with an Intel CPU: on an SSD (~/tails-release) for the main part, and on top of a tmpfs (/scratch) on the right side. After a number of unclocked hours, I was a little lazy and didn’t re-run the serial case to gather actual data for the tmpfs case; but I can do that if that feels needed.
On the software side: see the feature/parallel-iuk-builds branches in tails.git (main repository) and in puppet-tails.git (https://mraw.org/git/?p=puppet-tails.git). The idea is basically to use the multiprocess module to start a number of jobs in parallel (one by default, i.e. sticking to the status quo until otherwise requested) on the puppet-tails.git side, and to add a little instrumentation and locking on the tails.git side.
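
To illustrate the idea, here is a minimal sketch; build_iuk() and the ./bin/build-iuk command are hypothetical placeholders for the real entry point, and I’m using the standard library’s multiprocessing, whose API matches the multiprocess module for this purpose:

```python
# Sketch: build several IUKs with a bounded worker pool, defaulting to
# 1 job, i.e. the status quo. Placeholders: build_iuk(), ./bin/build-iuk.
import subprocess
from multiprocessing import Pool

def build_iuk(source_version):
    """Build one IUK upgrading from source_version; return (version, exit code)."""
    result = subprocess.run(["./bin/build-iuk", source_version])
    return (source_version, result.returncode)

def build_all(source_versions, jobs=1):
    with Pool(processes=jobs) as pool:
        return pool.map(build_iuk, source_versions)

if __name__ == "__main__":
    print(build_all(["4.2", "4.2.2", "4.3", "4.4"], jobs=4))
```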
As one would expect, the more rsync processes we run in parallel, the longer each of them takes, because spawning several of them at the same time makes them compete with each other. But overall, that leads to a smaller total runtime: for 11 IUKs, it goes from 78m in the serial case, to 57m with 2 jobs in parallel, to 44m with 4 in parallel. Further increasing the number of parallel jobs doesn’t really help, as it makes each rsync last much longer. And given I’m using locking to ensure a single mksquashfs runs at any given time (having several of them compete didn’t look like a good plan), the serialized mksquashfs runs set a minimal runtime that we cannot do anything about.
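
Schematically, the locking amounts to something like the following sketch (not the actual code; the lock file location is illustrative, and as noted below it should probably live elsewhere):

```python
# Sketch: serialize mksquashfs across parallel build jobs with an exclusive
# flock, so that only one instance runs at a time.
import fcntl
import subprocess

MKSQUASHFS_LOCK = "/tmp/iuk-mksquashfs.lock"  # illustrative location

def run_mksquashfs(args):
    with open(MKSQUASHFS_LOCK, "w") as lock:
        # Blocks until no other job holds the lock; the lock is released
        # when the file is closed at the end of the with-block.
        fcntl.flock(lock, fcntl.LOCK_EX)
        subprocess.run(["mksquashfs", *args], check=True)
```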
My conclusions so far:
- It seems that on this particular machine, using 4 jobs (50% of the logical cores) is a reasonable approach, yielding a drastically reduced wall-clock time without “overcommitting” the machine (e.g. by running all jobs in parallel). Keep in mind this was with only 11 IUKs; that number is going to grow over time…
- I’ve prepared those changes and tested them for 4.5~rc1 and 4.5, by amending the release process documentation to allow running a modified version of the otherwise cloned-from-upstream puppet-tails.git repository; that seems to work fine. I’ve just rebased them on top of …
- I’m a little concerned about two things:
  - The lock file was naively created as a temporary file at the root of TMPDIR, while we have a dedicated temporary directory created for the whole job; it could probably move there.
  - Contrary to, say, check_po, my proof of concept doesn’t catch errors in the multiprocess-based parallelization, and the caller needs to figure out whether all IUKs have been built. It’s arguably something that would blow up in the RM’s face soon enough, but a little extra check wouldn’t hurt (see the sketch after this list). I’m not sure this should block merging those speed-ups, though.
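
For the last point, the extra check could look like this sketch, building on the hypothetical build_all() from the pool sketch above:

```python
# Fail loudly if any IUK build failed, instead of leaving the caller to
# notice missing artifacts. Expects the (version, exit code) tuples
# returned by the hypothetical build_all() above.
def check_results(results):
    failed = [version for version, returncode in results if returncode != 0]
    if failed:
        raise RuntimeError("IUK build failed for: " + ", ".join(failed))
```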
anonym, intrigeri: reviews/comments/suggestions/ACKs/NACKs welcome.
Feature Branch: feature/parallel-iuk-builds
- Related to #17435 (closed)