Blocklist insecure PI futexes to harden kernel
Originally created by @cypherpunks on #11524 (Redmine)
The futex system call is used for locking resources, letting threads cooperate safely on shared data. PI (priority-inheritance) futexes are a class of futex operations designed to prevent priority inversion, where a high-priority thread is stuck waiting on a lock held by a lower-priority one. Unfortunately, VUPEN (now Zerodium) is selling a PI-based 0day that uses these futex operations to escalate privileges to kernel mode. It is a race condition similar to the “Towelroot” vulnerability from 2014. I can’t get this fixed upstream, because I don’t have a proper fix that doesn’t involve sacrificing PI futexes entirely (futex.c is exceptionally complicated). Luckily for us, Tails does not use these futexes (few systems do, and they tend not to be necessary anyway).
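For reference, "PI futexes" means a specific set of futex_op commands. The sketch below is a hypothetical helper, is_pi_futex_op(), showing how a filter could classify a call. The numeric values are the ones defined in the kernel's uapi <linux/futex.h>, repeated here so the example builds without kernel headers. The modifier bits are masked off first so that private variants such as FUTEX_LOCK_PI_PRIVATE are caught too.

```c
#include <stdbool.h>

/* Same values as the kernel's uapi <linux/futex.h>. */
#define FUTEX_LOCK_PI          6
#define FUTEX_UNLOCK_PI        7
#define FUTEX_TRYLOCK_PI       8
#define FUTEX_WAIT_REQUEUE_PI 11
#define FUTEX_CMP_REQUEUE_PI  12
#define FUTEX_PRIVATE_FLAG   128
#define FUTEX_CLOCK_REALTIME 256

/* Hypothetical helper: true if futex_op requests a priority-inheritance op. */
bool is_pi_futex_op(int futex_op)
{
    /* Drop the modifier bits, so FUTEX_LOCK_PI | FUTEX_PRIVATE_FLAG
     * (FUTEX_LOCK_PI_PRIVATE) is classified the same as FUTEX_LOCK_PI. */
    int cmd = futex_op & ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME);

    switch (cmd) {
    case FUTEX_LOCK_PI:
    case FUTEX_UNLOCK_PI:
    case FUTEX_TRYLOCK_PI:
    case FUTEX_WAIT_REQUEUE_PI:
    case FUTEX_CMP_REQUEUE_PI:
        return true;
    default:
        return false;   /* FUTEX_WAIT, FUTEX_WAKE, FUTEX_REQUEUE, ... */
    }
}
```

Masking with the flag bits rather than comparing raw values is what keeps a filter like this from being bypassed simply by setting FUTEX_PRIVATE_FLAG.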
About a year ago, I first heard a rumor of this vulnerability and patched my own systems by whitelisting only the futex calls I needed. Recently, I was able to confirm indirectly that it exists, through a contact who actually works at VUPEN. Because of this, I wrote an LKM that mitigates it by hooking the futex system call: if one of the banned futex operations is called, the hook returns ENOSYS and logs the details to the syslog.
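The hook's decision can be modeled in user space. The function below, model_futex_hook(), is a hypothetical sketch, not the attached module's code: it returns -ENOSYS for PI commands (which user space sees as errno ENOSYS, "Function not implemented") and formats a message in the same shape as the dmesg line in the transcript. In the real module the message would go to the kernel log via printk and allowed calls would be forwarded to the original sys_futex; here a return value of 0 stands for "pass through".

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical model of the hook: comm/pid identify the caller, msg receives
 * the log line (empty string if the call is allowed). */
long model_futex_hook(const char *comm, int pid, int futex_op,
                      unsigned int val, char *msg, size_t msg_len)
{
    /* Drop FUTEX_PRIVATE_FLAG (128) and FUTEX_CLOCK_REALTIME (256). */
    int cmd = futex_op & ~(128 | 256);

    /* 6-8: LOCK_PI, UNLOCK_PI, TRYLOCK_PI; 11-12: WAIT_REQUEUE_PI, CMP_REQUEUE_PI */
    if ((cmd >= 6 && cmd <= 8) || cmd == 11 || cmd == 12) {
        snprintf(msg, msg_len,
                 "from %s[%d], attempted to call banned pi futex with futex_op %d and val %u",
                 comm, pid, futex_op, val);
        return -ENOSYS;
    }
    if (msg_len > 0)
        msg[0] = '\0';
    return 0; /* caller would forward to the original futex implementation */
}
```

Returning ENOSYS rather than, say, EPERM makes the blocked operations look as if the kernel simply lacks PI support, which is a condition well-behaved callers may already be prepared to handle.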
Unfortunately, Tails uses a kernel that is too old to support livepatch, so this is the only method that seems practical, and it has been used extensively in practice (though ironically, the methods used are more typical of rootkits; not many people hook syscalls for benevolent reasons!). A possible alternative is to use kprobes and modify the registers holding val or futex_op to force the call into an invalid state (which would use only official kernel APIs), but that seems a bit sloppy. Hooking the syscall table looks like the most stable solution.
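The mechanics of syscall-table hooking can be sketched as a user-space analogy: a table of function pointers, an entry that is saved and replaced on load, and restored on unload. Everything here is hypothetical (fake_syscall_table, stock_sys_futex, dispatch), the real futex syscall takes six arguments rather than three, and the stand-in "stock" handler simply fails with EFAULT the way the test program's calls do in the transcript.

```c
#include <errno.h>

/* Simplified slot type: the real futex syscall takes six arguments. */
typedef long (*syscall_fn)(long a0, long a1, long a2);

static long stock_sys_futex(long uaddr, long futex_op, long val)
{
    (void)uaddr; (void)futex_op; (void)val;
    return -EFAULT; /* stand-in: behaves as if every call hit a bad address */
}

#define FAKE_NR_FUTEX 0
static syscall_fn fake_syscall_table[1] = { stock_sys_futex };
static syscall_fn orig_futex; /* saved entry: used to forward calls and to unload */

static long hooked_futex(long uaddr, long futex_op, long val)
{
    int cmd = (int)futex_op & ~(128 | 256);               /* drop modifier bits */
    if ((cmd >= 6 && cmd <= 8) || cmd == 11 || cmd == 12) /* PI commands */
        return -ENOSYS;
    return orig_futex(uaddr, futex_op, val);              /* everything else passes through */
}

void install_hook(void) /* like insmod: save the original, swap in the hook */
{
    orig_futex = fake_syscall_table[FAKE_NR_FUTEX];
    fake_syscall_table[FAKE_NR_FUTEX] = hooked_futex;
}

void remove_hook(void) /* like rmmod: put the original entry back */
{
    fake_syscall_table[FAKE_NR_FUTEX] = orig_futex;
}

long dispatch(int nr, long a0, long a1, long a2) /* the "syscall entry" path */
{
    return fake_syscall_table[nr](a0, a1, a2);
}
```

Because only one pointer changes and the original is kept, non-PI calls still reach the unmodified implementation, which is part of why this approach is more predictable than rewriting argument registers from a kprobe.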
I’ve attached the kernel module source file, the Makefile, and a very simple program that tests it by intentionally calling a blacklisted futex operation. The module works on both 64-bit and 32-bit kernels. I ran the module by pipacs (one of the developers of grsecurity, and the primary developer of PaX), and his only concern was that I did not flush the TLB after changing the r/w status of the syscall tables. I have since added local_flush_tlb() to the end of the relevant functions.
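The "open a write window, patch, close it" pattern has a user-space analogue with mprotect(2), sketched below with a hypothetical patch_readonly_slot(). The difference is who handles the TLB: when user space calls mprotect, the kernel invalidates stale translations itself. Assuming the module changes the page-table entry for the syscall table directly (its source is not reproduced here), nothing does that on its behalf, so the CPU could keep using a cached translation with the old permissions; hence the explicit flush.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* User-space analogy only: a read-only page stands in for the syscall table.
 * Returns 0 and stores the patched value in *out_readback, or -1 on failure. */
int patch_readonly_slot(long new_value, long *out_readback)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    long *table = mmap(NULL, page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (table == MAP_FAILED)
        return -1;

    table[0] = 42;                                 /* initial "entry" */
    if (mprotect(table, page, PROT_READ) != 0)     /* table is now read-only */
        goto fail;

    /* Open a short write window, patch the entry, close it again. The kernel
     * flushes stale TLB entries as part of each mprotect call. */
    if (mprotect(table, page, PROT_READ | PROT_WRITE) != 0)
        goto fail;
    table[0] = new_value;
    if (mprotect(table, page, PROT_READ) != 0)
        goto fail;

    *out_readback = table[0];
    munmap(table, page);
    return 0;

fail:
    munmap(table, page);
    return -1;
}
```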
There are a few other unrelated issues with the Linux kernel which I have been made privately aware of, and I am still thinking of ways to deal with them.
Example usage:
root@amnesia:~/test# ls
Makefile test_futex.c vupensux.c
root@amnesia:~/test# apt-get -qq update && apt-get -qqy install build-essential linux-headers-$(uname -r)
root@amnesia:~/test# make
make -C /lib/modules/3.16.0-4-amd64/build M=/root/test modules
make[1]: Entering directory '/usr/src/linux-headers-3.16.0-4-amd64'
Makefile:10: *** mixed implicit and normal rules: deprecated syntax
make[1]: Entering directory `/usr/src/linux-headers-3.16.0-4-amd64'
CC [M] /root/test/vupensux.o
Building modules, stage 2.
MODPOST 1 modules
CC /root/test/vupensux.mod.o
LD [M] /root/test/vupensux.ko
make[1]: Leaving directory '/usr/src/linux-headers-3.16.0-4-amd64'
root@amnesia:~/test# ls
Makefile modules.order Module.symvers test_futex.c vupensux.c vupensux.ko vupensux.mod.c vupensux.mod.o vupensux.o
root@amnesia:~/test# gcc -o test_futex test_futex.c
root@amnesia:~/test# ./test_futex
FUTEX_REQUEUE: Bad address
FUTEX_LOCK_PI: Bad address
root@amnesia:~/test# insmod ./vupensux.ko
root@amnesia:~/test# ./test_futex
FUTEX_REQUEUE: Bad address
FUTEX_LOCK_PI: Function not implemented
root@amnesia:~/test# rmmod vupensux
root@amnesia:~/test# ./test_futex
FUTEX_REQUEUE: Bad address
FUTEX_LOCK_PI: Bad address
root@amnesia:~/test# sudo dmesg | tail -n 3
[26848.859975] loaded vupensux module, pi futexes are disabled
[26851.106445] from test_futex[6927], attempted to call banned pi futex with futex_op 6 and val 0
[26855.793791] unloaded vupensux module, pi futexes are enabled
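The source of test_futex.c is attached to the ticket rather than reproduced here. The following is a hypothetical reconstruction consistent with the transcript: it calls futex(2) directly via syscall(2) with a NULL address and reports failures with perror(), which prints lines of the form "NAME: error message". On a stock kernel both operations should fail with EFAULT ("Bad address"); with the module loaded, the PI operation should fail with ENOSYS ("Function not implemented") instead.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Calls futex(2) on a NULL address; on failure prints "<name>: <error>"
 * via perror() and returns errno, or returns 0 if the call succeeded. */
int try_futex(const char *name, int futex_op)
{
    if (syscall(SYS_futex, NULL, futex_op, 0, NULL, NULL, 0) == -1) {
        int err = errno;   /* save before perror() could touch it */
        perror(name);
        return err;
    }
    return 0;
}
```

Calling try_futex("FUTEX_REQUEUE", FUTEX_REQUEUE) and then try_futex("FUTEX_LOCK_PI", FUTEX_LOCK_PI) from main() gives the two lines seen in each run above. The NULL address is deliberate: both calls fail immediately instead of operating on real memory, so the only thing that changes between runs is whether the hook intercepts the PI call.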