Allow building IUKs in parallel (locally)

Originally created by @CyrilBrulebois on #17657 (Redmine)

I’ve instrumented the IUK builds to verify the “mksquashfs is likely the bottleneck” guesstimate.

It turns out that there’s another lengthy step! That’s the rsync gathering the differences between the old and new contents.

Of course, one might point at the SSD and figure things would be better on NVMe, but switching to a tmpfs changes almost nothing… Since I don’t suppose we can do much about the kernel’s loopback performance, I’ve investigated whether running several IUK builds in parallel would make it possible to run several rsync processes in parallel: each of them is slow on its own, but the overall wallclock time gets smaller.

I’ve collected data in the attached files, so that one gets an idea of what it looks like. All tests were performed on an HP Z220 CMT Workstation, equipped with an Intel® Xeon® CPU E3-1245 V2 (3.40GHz), on top of an SSD (~/tails-release) for the main part, and on top of a tmpfs (/scratch) on the right side. After a number of unclocked hours, I was a little lazy and didn’t re-run the serial case to gather actual data for the tmpfs case; but I can do that if that feels needed.

On the software side: see the feature/parallel-iuk-builds branches in tails.git (main repository) and in puppet-tails.git (https://mraw.org/git/?p=puppet-tails.git). The idea is basically to use the multiprocess module to start a number of jobs in parallel (one by default, sticking to the status quo until otherwise requested) on the puppet-tails.git side, and to add a little instrumentation and locking on the tails.git side.
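To make the "N jobs in parallel, one by default" idea concrete, here is a minimal sketch; it is not the code from the branch. `build_iuk`, `build_all` and the version strings are hypothetical, and it uses `ThreadPool` from Python’s `multiprocessing.pool` (an assumption made to keep the example self-contained, since each real job would mostly wait on external commands); the branch may well use process-based workers instead.

```python
from multiprocessing.pool import ThreadPool


def build_iuk(source_version):
    """Hypothetical stand-in for building one IUK from source_version."""
    # A real job would invoke the IUK build tool here.
    return f"built IUK from {source_version}"


def build_all(source_versions, jobs=1):
    """Build one IUK per source version, running at most `jobs` at a time.

    jobs=1 keeps the serial behaviour (the status quo).
    """
    with ThreadPool(processes=jobs) as pool:
        # map() preserves input order and blocks until every job is done.
        return pool.map(build_iuk, source_versions)


if __name__ == "__main__":
    print(build_all(["4.2", "4.3", "4.4"], jobs=2))
```

Defaulting `jobs` to 1 means nothing changes for callers who don’t ask for parallelism.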

As one would expect, the more rsync processes we run in parallel, the longer each of them takes, because they compete with each other. But overall, that leads to a smaller total runtime: for 11 IUKs, it goes from 78m in the serial case, to 57m with 2 in parallel, to 44m with 4 in parallel. Further increasing the number of parallel jobs doesn’t really help, as that makes each rsync take much longer. And given I’m using locking to ensure only one mksquashfs runs at any given time (having several of them compete didn’t look like a good plan), the combined mksquashfs runtime is a floor that we cannot do anything about.
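The locking scheme described above (rsync steps overlap, mksquashfs steps are serialized) could look roughly like the sketch below. It assumes an advisory lock taken with `fcntl.flock` (Unix only) on a lock file shared by all jobs; the actual locking mechanism in the branch may differ, and `exclusive_lock`, `build_one` and the log entries are hypothetical names used for illustration.

```python
import contextlib
import fcntl
import os
import tempfile


@contextlib.contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the with-block."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def build_one(name, lock_path, log):
    """Hypothetical IUK job: overlapping diff step, serialized squashfs step."""
    log.append(("rsync", name))  # may run concurrently with other jobs
    with exclusive_lock(lock_path):
        # Only one job at a time gets here; a real job would run
        # mksquashfs between these two log entries.
        log.append(("mksquashfs-start", name))
        log.append(("mksquashfs-end", name))


if __name__ == "__main__":
    job_dir = tempfile.mkdtemp()  # dedicated directory for the whole job
    events = []
    build_one("example", os.path.join(job_dir, "mksquashfs.lock"), events)
    print(events)
```

Since every job opens the lock file itself, the lock also works across separate processes, as long as they all agree on the same path.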

My conclusions so far:

It seems that on this particular machine, using 4 jobs (50% of the logical cores) is a reasonable approach, meaning a drastically reduced wallclock time without having the machine “overcommitted” (e.g. by running all jobs in parallel). Keep in mind that this was with only 11 IUKs, and that number is going to grow over time…
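For reference, the wallclock figures reported above (78m serial, 57m with 2 jobs, 44m with 4 jobs) translate into speed-ups as follows. This is just arithmetic on those three numbers; `speedup` and `efficiency` are helper names introduced here.

```python
# Wallclock minutes for 11 IUKs, keyed by number of parallel jobs
# (figures taken from the measurements reported in this issue).
WALLCLOCK_MINUTES = {1: 78, 2: 57, 4: 44}


def speedup(serial_minutes, parallel_minutes):
    """How many times faster the parallel run is than the serial one."""
    return serial_minutes / parallel_minutes


def efficiency(serial_minutes, parallel_minutes, jobs):
    """Speed-up per job; 1.0 would mean perfect scaling."""
    return speedup(serial_minutes, parallel_minutes) / jobs


if __name__ == "__main__":
    serial = WALLCLOCK_MINUTES[1]
    for jobs, minutes in sorted(WALLCLOCK_MINUTES.items()):
        print(
            f"{jobs} job(s): {minutes}m, "
            f"speed-up x{speedup(serial, minutes):.2f}, "
            f"efficiency {efficiency(serial, minutes, jobs):.2f}"
        )
```

The efficiency dropping as jobs are added matches the observation that each rsync gets slower when several run at once.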

I’ve prepared those changes and tested them for 4.5~rc1 and 4.5 by amending the release process documentation to allow running a modified version of the otherwise cloned-from-upstream puppet-tails.git repository; and that seems to work fine. I’ve just rebased them on top of stable (tails.git) and master (puppet-tails.git).

I’m a little concerned regarding two things:
- The lock file was naively created as a temporary file at the root of TMPDIR, while we have a dedicated temporary directory created for the whole job; it could probably move there.
- Contrary to, say, check_po, my proof of concept doesn’t catch errors in the multiprocess-based parallelization, and the caller needs to figure out whether all IUKs have been built; it’s arguably something that would blow up in the RM’s face soon enough, but a little extra check wouldn’t hurt. I’m not sure this should block merging those speed-ups though.
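The “little extra check” from the second point could be as simple as verifying that every expected IUK file exists once all jobs have returned. A minimal sketch, assuming the caller knows the expected file names up front; `missing_iuks`, `check_all_built` and the example file names are hypothetical, and nothing like this exists in the branch yet.

```python
import os
import tempfile


def missing_iuks(output_dir, expected_names):
    """Return the expected IUK file names that are absent from output_dir."""
    return [
        name
        for name in expected_names
        if not os.path.isfile(os.path.join(output_dir, name))
    ]


def check_all_built(output_dir, expected_names):
    """Raise if any expected IUK is missing, so failures are not silent."""
    missing = missing_iuks(output_dir, expected_names)
    if missing:
        raise RuntimeError("IUKs not built: " + ", ".join(missing))


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    # Example file names only; the real naming scheme is an assumption here.
    expected = ["Tails_amd64_4.3_to_4.5.iuk", "Tails_amd64_4.4_to_4.5.iuk"]
    open(os.path.join(out, expected[0]), "w").close()  # pretend one was built
    print(missing_iuks(out, expected))
```

Calling `check_all_built` right after the parallel jobs finish would make a failed job abort the run instead of surfacing later in the release process.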

Dear anonym, intrigeri, reviews/comments/suggestions/ACKs/NACKs welcome.

Feature Branch: feature/parallel-iuk-builds

Attachments

Related issues

Related to #17435 (closed)

Edited May 21, 2020 by Cyril 'kibi' Brulebois
