Harden Tails kernel with security-related kernel parameters
Originally created by @cypherpunks on #11143 (Redmine)
There are a few kernel parameters that can safely be added to the Tails boot command line to increase security at little to no cost, and some of them improve security noticeably. Below I present several kernel parameters that can harden Tails against kernel exploits, along with their rationale and their rough cost in terms of performance, compatibility, or memory footprint. I have been adding these manually at every Tails boot for around a year on various machines and have never had a problem with any of them. I hope you’ll consider using them to harden Tails against kernel exploits. If more information is needed on any of the options, I will be happy to research them further and provide relevant kernel code snippets.
slab_nomerge disables the merging of slabs of similar sizes. Often, some obscure slab cache is used in a vulnerable way that lets an attacker manipulate it more or less arbitrarily. Most slabs are not useful to an attacker even when exploited, so on its own this isn’t a big deal. Unfortunately, the kernel merges similar caches to save a little memory, and if a vulnerable but useless slab is merged with a safe but useful one, an attacker can leverage that aliasing to do far more harm than they otherwise could. Disabling merging isolates slab caches from each other and so reduces the kernel’s attack surface. The trade-off is a very slight increase in kernel memory usage; “slabinfo -a” shows which caches are currently merged (aliased) on a given system, which helps estimate how large that increase would be.
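To see which caches the kernel has actually merged, and therefore what slab_nomerge would split apart, the following Python sketch groups the entries under /sys/kernel/slab. It assumes a SLUB kernel where each merged cache appears in sysfs as a symlink to a shared directory with a generated name such as “:t-0000192”; the cache names in the comments are illustrative and the helper names are hypothetical.

```python
import os
from collections import defaultdict


def group_aliases(entries):
    """Group slab cache names by the cache they are merged into.

    `entries` is an iterable of (name, link_target) pairs, where link_target
    is None for a cache that has its own directory. Only groups with two or
    more aliases are returned, i.e. caches that were actually merged.
    """
    groups = defaultdict(list)
    for name, target in entries:
        if target is not None:
            # e.g. "kmalloc-192" -> ":t-0000192" (illustrative names)
            groups[os.path.basename(target)].append(name)
    return {t: sorted(names) for t, names in groups.items() if len(names) >= 2}


def read_slab_entries(slab_root="/sys/kernel/slab"):
    """Yield (name, link_target) for each entry in the SLUB sysfs directory.

    Assumption: merged caches appear as symlinks to a shared cache directory.
    """
    for name in sorted(os.listdir(slab_root)):
        path = os.path.join(slab_root, name)
        target = os.readlink(path) if os.path.islink(path) else None
        yield name, target
```

On a running system, group_aliases(read_slab_entries()) returns the merged groups; after booting with slab_nomerge the result should be empty, since every cache is then expected to get its own directory.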
slub_debug=FZ enables sanity checks (F) and redzoning (Z). Sanity checks are self-explanatory and come with a modest performance cost, which is unlikely to be noticeable on an average Tails system. The checks are basic but still useful, both for security and for debugging. Redzoning adds extra areas around slab objects that detect when an object is written past its real size, which helps catch overflows; its performance impact is negligible. I did consider adding the P option, which enables poisoning. Poisoning fills freed objects with a known value, so that many modifications of, or references to, an object after it has been freed or before it has been initialized are caught instead of going unnoticed. This defends against many types of use-after-free vulnerabilities at little performance cost. Unfortunately, the default poison value points into userland and might make exploitation easier on systems without SMAP (i.e. most systems), so I excluded P. I’ll look into whether the trade-off (increased exposure to dereferences into userland memory in exchange for increased resistance to UAFs) is worth it, but until then I left it out to be safe. An additional note: whenever slub_debug= is on the kernel command line, slab_nomerge is implied. Still, declaring slab_nomerge explicitly helps prevent a regression where the debugging features are later disabled and slab merging is silently re-enabled along with them.
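The slub_debug= value is a string of single-letter options, optionally followed by a comma-separated list of cache names that debugging is restricted to (for example “FZ,kmalloc-64”). The Python sketch below decodes such a value using the option letters listed in the kernel’s SLUB documentation (F, Z, P, U, T, A, O, and “-”); it only handles that single options-plus-caches form and ignores any other syntax the kernel may accept, and the function names are hypothetical.

```python
# Option letters as listed in the kernel's SLUB documentation.
SLUB_DEBUG_FLAGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning",
    "U": "user tracking",
    "T": "trace",
    "A": "failslab filter",
    "O": "skip caches that would need higher slab orders",
    "-": "all debugging off",
}


def parse_slub_debug(value):
    """Split a value like "FZ" or "fz,kmalloc-64" into (flag set, cache names)."""
    options, _, caches = value.partition(",")
    options = options.upper()  # letters are treated case-insensitively here
    unknown = [c for c in options if c not in SLUB_DEBUG_FLAGS]
    if unknown:
        raise ValueError(f"unknown slub_debug option(s): {''.join(unknown)}")
    names = [c for c in caches.split(",") if c]
    return set(options), names


def describe(value):
    """Return (list of enabled features, scope) for a slub_debug= value."""
    flags, caches = parse_slub_debug(value)
    features = [SLUB_DEBUG_FLAGS[f] for f in "FZPUTAO-" if f in flags]
    scope = ", ".join(caches) if caches else "all caches"
    return features, scope
```

For the value proposed here, describe("FZ") yields sanity checks and red zoning for all caches; adding P would additionally report poisoning.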
vsyscall=none disables virtual syscalls, the obsolete predecessor of vDSO calls. Unfortunately, both vsyscall=native and vsyscall=emulate (the default) have a negative security impact, the latter somewhat less so. Namely, they map code at a fixed address that ASLR does not randomize, which provides a target for any attacker who controls the return instruction pointer; this is increasingly common now that attackers must resort to ROP and similar attacks that hijack a process’s control flow. The cost is reduced compatibility; however, only legacy statically compiled binaries and old versions of glibc used vsyscalls, and all software on modern Tails uses the vDSO instead. If a program does try to use a vsyscall, that process will crash with a memory access violation, without bringing the whole system down.
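The vsyscall page lives at the fixed address 0xffffffffff600000 on x86_64, and when it is mapped it shows up as a “[vsyscall]” entry in /proc/self/maps; with vsyscall=none that entry is expected to be absent. The Python sketch below checks for it, assuming the standard maps line format (address range first, pathname or pseudo-name last); the function name is hypothetical.

```python
# Fixed address of the legacy vsyscall page on x86_64 (not randomized by ASLR).
VSYSCALL_ADDR = 0xFFFFFFFFFF600000


def find_vsyscall(maps_text):
    """Return (start, end) of the [vsyscall] mapping in /proc/<pid>/maps text,
    or None if the page is not mapped (expected with vsyscall=none)."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[vsyscall]":
            # First field is "start-end" in hexadecimal.
            start, end = (int(x, 16) for x in fields[0].split("-"))
            return start, end
    return None
```

On a live system, passing the contents of /proc/self/maps to find_vsyscall should return None after booting with vsyscall=none.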
mce=0 is mostly useful on systems with ECC memory: it makes the kernel panic on any uncorrectable error detected by the machine check exception (MCE) system, while corrected errors are just logged. The default, mce=1, instead sends SIGBUS for many uncorrected errors. Unfortunately, this means a malicious process trying to exploit hardware bugs (such as rowhammer) can try over and over, suffering only a SIGBUS on each failure. Setting mce=0 should have no impact on normal operation: hardware that regularly triggers memory-related MCEs is unlikely to even boot, and the more lenient default mainly benefits long-lived servers.
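The “one shot versus many” argument can be made concrete with a little arithmetic: if each attempt succeeds independently with probability p, the chance of at least one success in n attempts is 1 - (1 - p)^n. The Python sketch below pairs that formula with the mce= tolerance levels as paraphrased from the kernel’s x86_64 boot-options documentation. Treating mce=0 as a single attempt and every other level as retries up to a budget is a deliberate simplification for a user-space attacker, not a model of real rowhammer behavior; the function names are hypothetical.

```python
# mce= tolerance levels, paraphrased from the x86_64 boot-options docs.
MCE_TOLERANCE = {
    0: "always panic on uncorrected errors, log corrected errors",
    1: "panic or SIGBUS on uncorrected errors, log corrected errors (default)",
    2: "SIGBUS or log uncorrected errors, log corrected errors",
    3: "never panic or SIGBUS, log all errors (testing only)",
}


def attacker_attempts(tolerance, budget):
    """Simplified model: at mce=0 the first uncorrected error panics the
    machine, so the attacker gets one try; at other levels a failed try only
    costs the process, so it can retry up to `budget` times."""
    if tolerance not in MCE_TOLERANCE:
        raise ValueError(f"unknown mce tolerance level: {tolerance}")
    return 1 if tolerance == 0 else budget


def success_probability(p_per_attempt, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    if not 0.0 <= p_per_attempt <= 1.0:
        raise ValueError("p_per_attempt must be in [0, 1]")
    return 1.0 - (1.0 - p_per_attempt) ** attempts
```

With p = 1% per attempt, a single attempt succeeds 1% of the time, while 100 retries succeed about 63% of the time.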
oops=panic sets the kernel to fail fast, which is highly desirable from a security perspective (see https://en.wikipedia.org/wiki/Fail-fast for a succinct explanation of the reasoning). Many kernel exploits hit the kernel hard and fail many times before finally hitting the sweet spot and gaining full control over kernel space. In many of those failed attempts the result is a kernel oops rather than a kernel panic, so the system keeps running and the attacker can try again. Setting oops=panic turns every oops into a full panic instead. This may be problematic on machines with very buggy drivers that cause harmless oopses: those systems will simply crash, though I think this is very unlikely on a Tails system. The same behavior can also be set with the kernel.panic_on_oops sysctl, which may be preferable because related panic_on_* switches can be enabled through the same interface, such as kernel.panic_on_warn, kernel.unknown_nmi_panic, and kernel.panic_on_io_nmi. There’s also vm.panic_on_oom, which might be useful to keep the system from locking up (and ignoring a yanked-out USB stick) when memory pressure is high, but that’s another discussion…
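If the sysctl route is preferred, the settings can be written either to /proc/sys directly or to a sysctl.d(5) configuration file. The Python sketch below assumes the sysctl key names from the kernel’s sysctl documentation (note that the unknown-NMI switch is spelled kernel.unknown_nmi_panic); the helper names are hypothetical, vm.panic_on_oom is left out as a separate discussion, and actually applying the values requires root.

```python
# Fail-fast related sysctls; key names as documented for /proc/sys/kernel.
FAIL_FAST_SYSCTLS = {
    "kernel.panic_on_oops": 1,
    "kernel.panic_on_warn": 1,
    "kernel.unknown_nmi_panic": 1,
    "kernel.panic_on_io_nmi": 1,
}


def proc_sys_path(key):
    """Map a dotted sysctl key such as kernel.panic_on_oops to its /proc/sys file."""
    return "/proc/sys/" + key.replace(".", "/")


def sysctl_conf(settings):
    """Render settings as "key = value" lines for a sysctl.d-style file."""
    return "".join(f"{key} = {value}\n" for key, value in sorted(settings.items()))


def apply_sysctls(settings):
    """Write each value straight to /proc/sys (takes effect immediately; needs root)."""
    for key, value in settings.items():
        with open(proc_sys_path(key), "w") as f:
            f.write(f"{value}\n")
```

Writing sysctl_conf(FAIL_FAST_SYSCTLS) to a file under /etc/sysctl.d/ would persist the settings, whereas apply_sysctls only affects the running system.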
Summary: slab_nomerge slightly increases memory footprint, but this shouldn’t matter for Tails because it’s not an embedded system. slub_debug=FZ slightly increases memory footprint and has a moderate performance impact in benchmarks, but is unlikely to have any noticeable impact in real-world use; removing the “F” removes most of that performance impact. vsyscall=none breaks very old applications, but Tails uses none of them anyway. mce=0 gives malicious programs trying to exploit hardware bugs only one shot at it. oops=panic makes the system fail fast, which is desirable from a security perspective; systems with very buggy drivers may crash with this option set.
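Since these parameters currently have to be typed at every boot, a small check of /proc/cmdline can confirm they took effect. The Python sketch below compares the running command line against the five parameters summarized above; it assumes simple whitespace-separated name=value tokens (no quoting), keeps the last occurrence of a repeated parameter, and compares values as exact strings. The function names are hypothetical.

```python
# The five parameters proposed above; None marks a flag without a value.
HARDENING_PARAMS = {
    "slab_nomerge": None,
    "slub_debug": "FZ",
    "vsyscall": "none",
    "mce": "0",
    "oops": "panic",
}


def parse_cmdline(cmdline):
    """Parse a kernel command line into {name: value or None}."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None  # last occurrence wins here
    return params


def missing_hardening(cmdline, wanted=HARDENING_PARAMS):
    """Return wanted parameters that are absent or set to a different value,
    formatted as they would be typed at the boot prompt."""
    current = parse_cmdline(cmdline)
    missing = []
    for name, value in wanted.items():
        if name not in current or current[name] != value:
            missing.append(name if value is None else f"{name}={value}")
    return missing
```

On a booted system, passing the contents of /proc/cmdline to missing_hardening lists whatever is still missing, and joining that list with spaces gives the text to append at the boot prompt.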
Additional options I am looking into are reboot=cold (may make certain types of cold-boot attacks harder if memory is not removed from the system), acpi=copy_dsdt (may harden the system slightly against buggy BIOSes), and elevator=deadline (might reduce kernel attack surface, with the nice side effect of improving USB and SSD performance). I may post the rationale for them as well if they turn out to be useful security-wise.
Feature Branch: feature/11143-harden-kernel