[Publicity] Technical / puzzle blog post on killing processes
Below is a write-up of an investigation we did as a result of the QEMU depriv work. Web searches turned up several people asking how this could be done, but nobody with any good answers.

I was thinking of sending this to LWN; it's the sort of quirky technical puzzle that their readers seem to enjoy. Otherwise, I think we should post it to the Xen blog for the next person who wants to do something like this. I have proof-of-concept code for most of this; I could also make a project on GitHub (or GitLab) and link to it.

This is written in pandoc markdown; the proper conversion rune is `pandoc -s -o blog.html [filename]`.

Let me know if you have any feedback.

 -George

% Killing processes that don't want to be killed

Suppose you have a program running on your system that you don't quite trust. Maybe it's a program submitted by a student to an automated grading system. Or maybe it's a QEMU device model running in a Xen "domain 0", and you want to make sure that even if an attacker from a rogue VM manages to take over the QEMU process, she can't do any further harm. There are many things you want to do to restrict its ability to do mischief. But one thing in particular you probably want is to be able to reliably kill the process once you think it should be done.

This turns out to be quite a bit trickier than you'd think.

# Avoiding kill with fork

So here's our puzzle. Suppose we have a process that we've run with its own individual user id (`target_uid`), which we want to kill. But the code in the process is currently controlled by an attacker who doesn't want it killed.

We obviously know the pid of the initial process we forked, so we could just use the `kill` system call:

~~~
kill(target_pid, 9);
~~~

So how can an attacker avoid this? It turns out to be pretty simple:

~~~
while(1) {
    if(fork()) _exit(0);   /* parent exits; the child loops on with a new pid */
}
~~~

This simple snippet of code will repeatedly call `fork`.
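To watch a program hop from pid to pid without leaving an unbounded fork loop running, here is a bounded sketch of the same idea, small enough to try without a VM. Each generation forks, the parent `_exit`s, and the child carries on and reports its new pid over a file descriptor. `hop_pids` is a hypothetical helper, not part of the original proof-of-concept code, and unlike a real attacker's loop it stops after `generations` hops.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, bounded version of the rogue loop. Each generation
 * forks; the parent exits and the child carries on, so the program
 * keeps moving to a fresh pid. After every hop the surviving process
 * writes its new pid to fd. Only the final generation returns (0),
 * unless fork or write fails first (-1). */
int hop_pids(int generations, int fd)
{
    for (int i = 0; i < generations; i++) {
        pid_t child = fork();
        if (child < 0)
            return -1;              /* out of processes: stop hopping */
        if (child > 0)
            _exit(0);               /* the old pid disappears here */
        pid_t now = getpid();       /* the program's new identity */
        if (write(fd, &now, sizeof now) != (ssize_t)sizeof now)
            return -1;
    }
    return 0;
}
```

To try it, create a pipe, fork a child that calls `hop_pids(3, write_end)` and then `_exit`s, and read three `pid_t` values from the read end: each one differs from the last, even though it is "the same" program throughout.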
As you probably know, `fork` returns twice: once in the existing parent process, and once in a newly-created child process. The result is effectively that the process races through the process ID space as fast as the kernel will let it.

I encourage you to run the above code snippet (preferably in a VM) and see what it looks like. It's not even very noticeable. Running `top` shows a system load of about 50% (in my VM anyway), but no particular process obviously contributes to that load; everything is still very responsive and functional. If you didn't know about it, you might never notice it was there.

Now try killing it. You can run `killall` to try to kill the process by name, but it will frequently fail with "no process killed"; and even when it succeeds, it often turns out that you've killed the _parent_ process after the `fork` but before the `exit`, so the rogue forker is still going strong. Even determining whether you've managed to kill the process or not is a challenge.

The basic problem here is a race condition. What `killall` does is:

1. Read the list of processes
2. Call `kill(pid, sig)` on each one

Between step 1 and each instance of step 2, the kernel's tasklist lock is released (since the kernel has to return from one system call before `killall` can make the next), giving the rogue process a chance to fork. Indeed, it has many chances: since step 2 takes a non-negligible amount of time, by the time `killall` gets to the pid it found in step 1, the rogue process has likely already forked, and that pid may even have exited.

It's true that if we ran `killall` 1000 times, the rogue process would very likely end up dead; and if we ran `ps` 1000 times and found no trace of the process, we might be pretty sure that it was gone. On the other hand, that assumes that the "race" is fair, and that the attacker hasn't discovered some way of making sure that the race ends up going her way. It would be best if we didn't rely on these sorts of probabilistic calculations to clean things up.

# A better mousetrap: Preventing forks?
One option, of course, would be to try to prevent the process from executing `fork` in the first place. This could be done on Linux using `seccomp` in filter mode (mode 2); but that's Linux-specific. (Xen, in particular, wants to be able to support NetBSD and FreeBSD dom0s, so we can't rely on this for correctness.)

Another would be to use the `setrlimit` system call to set `RLIMIT_NPROC` to `0`. This should, in theory, prevent this process from calling `fork` (since by definition there is already at least one process running with its UID). But even `RLIMIT_NPROC` has had [issues in the past](https://lwn.net/Articles/451985/).

Surely there must be a way to kill a process in a way that it can't evade, without relying on being able to take away `fork`?

# A better mousetrap: Process groups?

Looking more closely at the `kill` man page, it turns out that the `pid` argument can be interpreted in four possible ways:

* `pid` > 0: the `pid` of a single process to kill
* `pid` < -1: `-pid` is the `pgid` of a _process group_ to kill
* `pid` == 0: kill every process in my current process group
* `pid` == -1: kill every process that I'm allowed to kill

At first glance it seems like killing by `pgid` might do what we want. To run our untrusted process, we'd set its PGID and its UID; to kill it, we'd call `kill(-target_pgid, 9)`.

Unfortunately, unlike its user id, an unprivileged process is explicitly allowed to change its `pgid`. So our attacker could simply run something like the following to avoid being killed in the same way:

~~~
while(1) {
    if(fork()) _exit(0);   /* parent exits, child continues */
    setpgid(0, 0);         /* child moves into a new process group */
}
~~~

In this case, the child process changes its PGID to match its PID as soon as it forks, making `kill(-target_pgid)` as racy as `kill(target_pid)`.

# A better mousetrap: kill -1

OK, what about the last one: "kill every process I'm allowed to kill"?
We obviously don't want to run that as root unless we want to nuke the entire system; we want to limit "all processes I'm allowed to kill" to the particular uid we've given to the rogue process. In general, processes are allowed to kill other processes with their own uid; so what about something like the following?

~~~
setuid(target_uid);
kill(-1, 9);
~~~

(NB: for simplicity's sake I will omit error handling in these examples; but when playing with `kill` you should certainly make sure that you did switch your `uid`!)

On Linux, the `kill` system call, when called with `-1`, loops over the entire task list, attempting to send the signal to each process except `init` and the one making the system call. The tasklist lock is held for the entire loop, so the rogue process cannot complete a `fork`; and since the `uid`s match, it will be killed.

Done, right?

Not quite. If we simply call `setuid`, then not only can we kill the rogue process, but the rogue process can also kill us:

~~~
while(1) {
    if(fork()) _exit(0);   /* hop to a new pid, as before */
    kill(-1, 9);           /* kill everything this uid may kill */
    setpgid(0, 0);
}
~~~

If the rogue process manages to get its own `kill(-1)` in after we've called `setuid` but before we've called `kill` ourselves, _we_ will be the ones to disappear. So to successfully kill the rogue process, we still need to win a race, which is something we'd rather not rely on.

# A better mousetrap: Exploiting asymmetry

If we want to _reliably_ kill the other process without putting ourselves at risk of being killed, we must find an asymmetry that allows the "reaper" process to kill the rogue process, but not the other way around. Looking carefully at the `kill` man page:

> For a process to have permission to send a signal, it must either be privileged (under Linux: have the CAP_KILL capability in the user namespace of the target process), or the real or effective user ID of the sending process must equal the real or saved set-user-ID of the target process.

So there is an asymmetry. Each process has an effective UID (`euid`), real UID (`ruid`), and saved UID (`suid`).
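The man page rule quoted above can be written down as a small predicate over these three ids. This is a model for reasoning, not a kernel interface: `struct creds` and `can_signal` are hypothetical names, the fields follow `setresuid`'s (ruid, euid, suid) argument order rather than the `<euid, ruid, suid>` order used in the prose, and privilege (root or `CAP_KILL`) is deliberately left out.

```c
#include <stdbool.h>
#include <sys/types.h>

/* The three user ids kill(2) cares about, in setresuid's order. */
struct creds {
    uid_t ruid;   /* real user id */
    uid_t euid;   /* effective user id */
    uid_t suid;   /* saved set-user-ID */
};

/* Unprivileged permission check from the kill(2) man page: the sender's
 * real or effective uid must equal the target's real or saved uid.
 * Privilege (root / CAP_KILL) is deliberately not modelled. */
bool can_signal(const struct creds *sender, const struct creds *target)
{
    bool by_real = sender->ruid == target->ruid || sender->ruid == target->suid;
    bool by_eff  = sender->euid == target->ruid || sender->euid == target->suid;
    return by_real || by_eff;
}
```

For example, with a hypothetical `target_uid` of 1001, a reaper that has done a plain `setuid(1001)` and the rogue process both hold `{1001, 1001, 1001}`, so `can_signal` is true in both directions: that symmetry is exactly the race described in the `kill -1` section.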
For process A to kill process B, A's `ruid` or `euid` must match one of B's `ruid` or `suid`. Can we construct a `<euid, ruid, suid>` tuple for our "reaper" process that lets it kill the rogue process without itself being killable by the rogue process?

It turns out we can. If we create a new `reaper_uid`, and set the reaper's `<euid, ruid, suid>` to `<target_uid, reaper_uid, X>` (where X can be anything as long as it's not `target_uid`), then:

* The reaper process can kill the target process, since its effective UID is equal to the target process's real UID.
* But the target process can't kill the reaper, since its real and effective UIDs (both `target_uid`) differ from the real and saved UIDs of the reaper process (`reaper_uid` and X).

So the following code will safely kill all processes of `target_uid` in a race-free way:

~~~
/* setresuid takes (ruid, euid, suid) */
setresuid(reaper_uid, target_uid, reaper_uid);
kill(-1, 9);
~~~

Note that this `reaper_uid` must have _no other running processes_ when we call `kill`: the reaper's real UID matches theirs, so they will be killed as well. In practice this means either setting aside a single `reaper_uid` (and using a lock to make sure only one process calls `setresuid` at a time), or having a separate `reaper_uid` per `target_uid`.

# No POSIX-compliant mousetraps?

Although `setresuid` is implemented by both Linux and FreeBSD, it is not in the [current POSIX specification](http://pubs.opengroup.org/onlinepubs/9699919799/). Looking at the official list of POSIX system interfaces, it's not clear how to get a process to have the required tuple using only POSIX interfaces (namely `setuid` and `setreuid`, without recourse to `setresuid` or Linux's `CAP_SETUID`); the assumption seems to be that `euid` must always be set to either `ruid` or `suid`. Given that `RLIMIT_NPROC` is also not in the POSIX spec, there would seem to be, at the moment, no way within that spec to safely prevent a potentially rogue process from using `fork` to evade `kill`.
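Outside of POSIX, on Linux (glibc) or FreeBSD, the `setresuid` recipe can be fleshed out with the error handling the snippets above omit. This is a sketch under stated assumptions, not the proof-of-concept code mentioned earlier: `reap_uid` and `reaper_args_ok` are hypothetical names, the caller is assumed to be running as root, and `reaper_uid` is assumed to be dedicated to this job with no other running processes (serialising concurrent reapers is left to the caller). The credential switch happens in a short-lived child so the caller keeps its own privileges, and the child refuses to call `kill(-1)` unless `getresuid` confirms that the switch really happened.

```c
#define _GNU_SOURCE            /* setresuid/getresuid on glibc */
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Refuse uid combinations that would make kill(-1) hit the wrong things. */
bool reaper_args_ok(uid_t target_uid, uid_t reaper_uid)
{
    return target_uid != 0 && reaper_uid != 0 && target_uid != reaper_uid;
}

/* Hypothetical helper; assumes the caller is root. A child takes
 * <ruid, euid, suid> = <reaper_uid, target_uid, reaper_uid>, verifies it,
 * then sends SIGKILL to every process it is allowed to signal.
 * Returns 0 if the child got as far as sending the signal, else -1. */
int reap_uid(uid_t target_uid, uid_t reaper_uid)
{
    if (!reaper_args_ok(target_uid, reaper_uid)) {
        errno = EINVAL;
        return -1;
    }

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        uid_t r, e, s;
        if (setresuid(reaper_uid, target_uid, reaper_uid) != 0)
            _exit(1);
        /* Never call kill(-1) unless the ids are exactly what we asked for. */
        if (getresuid(&r, &e, &s) != 0 ||
            r != reaper_uid || e != target_uid || s != reaper_uid)
            _exit(2);
        /* ESRCH: there was no process to signal at all. */
        if (kill(-1, SIGKILL) == 0 || errno == ESRCH)
            _exit(0);
        _exit(3);
    }

    int status;
    while (waitpid(child, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A return of 0 only means the signal was sent; as with any `SIGKILL`, the target processes may take a moment to actually disappear.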
# Acknowledgements

Thanks to Ian Jackson for [doing the analysis](http://marc.info/?i=<23226.18455.602635.161530@xxxxxxxxxxxxxxxxxxxxxxxx>) to discover the appropriate `<euid, ruid, suid>` tuple.

_______________________________________________
Publicity mailing list
Publicity@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/publicity