
[Publicity] Technical / puzzle blog post on killing processes


  • To: "publicity@xxxxxxxxxxxxxxxxxxxx" <publicity@xxxxxxxxxxxxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Paul Durrant <Paul.Durrant@xxxxxxxxxx>
  • From: George Dunlap <george.dunlap@xxxxxxxxxx>
  • Date: Thu, 19 Apr 2018 18:27:19 +0100
  • Delivery-date: Thu, 19 Apr 2018 17:27:28 +0000
  • List-id: "List for Xen Publicity, PR and events" <publicity.lists.xenproject.org>
  • Openpgp: preference=signencrypt

Below is a write-up of an investigation we undertook as a result of the
QEMU depriv work.  A web search turned up several people asking how
this could be done, but nobody with a good answer.

I was thinking of sending this to LWN; it's the sort of quirky technical
puzzle that their readers seem to enjoy.  Otherwise, I think we should
post it to the Xen blog for the next person who wants to do something
like this.

I have proof-of-concept code for most of this; I could also make a
project on github (or gitlab) and link to it.

This is written in pandoc markdown; the proper conversion rune is
`pandoc -s -o blog.html [filename]`.

Let me know if you have any feedback.

 -George

% Killing processes that don't want to be killed

Suppose you have a program running on your system that you don't quite
trust.  Maybe it's a program submitted by a student to an automated
grading system.  Or maybe it's a QEMU device model running in a Xen
"domain 0", and you want to make sure that even if an attacker from a
rogue VM manages to take over the QEMU process, she can't do any
further harm.

There are many things you will want to do to restrict its ability to
do mischief.  But one thing in particular you probably want is to be
able to reliably kill the process once you think it should be done.
This turns out to be quite a bit trickier than you'd think.

# Avoiding kill with fork

So here's our puzzle.  Suppose we have a process that we've run with
its own individual user id (`target_uid`), which we want to kill.  But
the code in the process is currently controlled by an attacker who
doesn't want it killed.

We obviously know the pid of the initial process we forked, so we
could just use the `kill` system call:

~~~
    kill(target_pid, 9);
~~~

So how can an attacker avoid this?  It turns out to be pretty simple:

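~~~
    while(1) {
        /* Parent exits; the child carries on under a new pid. */
        if(fork() > 0)
            _exit(0);
    }
~~~

This simple snippet of code will repeatedly call `fork`.  As you
probably know, `fork` returns twice: once in the existing parent
process (returning the child's pid), and once in a newly-created
child process (returning 0).  Here the parent exits immediately and
the child goes around the loop again, so the result is effectively
that the process races through the process ID space as fast as the
kernel will let it.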

I encourage you to run the above code snippet (preferably in a VM),
and see what it looks like.  It's not even very noticeable.  Running
`top` shows a system load of about 50% (in my VM anyway), but there's
not obviously any particular process contributing to that load;
everything is still very responsive and functional.  If you didn't
know about it, you might never notice it was there.

Now try killing it.  You can run `killall` to try to kill the process
by name, but it will frequently fail with "no process found"; and
even when it succeeds, it often turns out that you've killed the
_parent_ process after the `fork` but before the `_exit`, so the rogue
forker is still going strong.  Even determining whether you've managed
to kill the process or not is a challenge.

The basic problem here is a race condition.  What `killall` does is:

1. Read the list of processes
2. Call `kill(pid, sig)` on each one

Between step 1 and each instance of step 2, the kernel tasklist lock
is released (since each operation has to return from its system call),
giving the rogue process a chance to fork.  Indeed, it has many
chances: since each step takes a non-negligible amount of time, by the
time you manage to find the rogue process, it's likely already forked,
and perhaps even exited.
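
To make the two steps concrete, here is a minimal sketch of a `killall`-style scan.  It is not `killall`'s actual source: it assumes a Linux-style `/proc` with a `comm` file per process, and the function name `kill_by_name` is made up for illustration.

```c
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send `sig` to every process whose
 * /proc/<pid>/comm equals `name`.  Returns the number of processes
 * signalled, or -1 if /proc cannot be opened.
 */
int kill_by_name(const char *name, int sig)
{
    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;

    int signalled = 0;
    struct dirent *ent;
    while ((ent = readdir(proc)) != NULL) {
        /* Step 1: walk the process list (numeric directory names). */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;
        pid_t pid = (pid_t)atol(ent->d_name);
        if (pid <= 0)
            continue;

        char path[512], comm[64];
        snprintf(path, sizeof(path), "/proc/%s/comm", ent->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;               /* process already gone */
        char *ok = fgets(comm, sizeof(comm), f);
        fclose(f);
        if (!ok)
            continue;
        comm[strcspn(comm, "\n")] = '\0';
        if (strcmp(comm, name) != 0)
            continue;

        /* Step 2: a separate system call per pid.  The target may have
         * forked and exited between the read above and this kill(). */
        if (kill(pid, sig) == 0)
            signalled++;
    }
    closedir(proc);
    return signalled;
}
```

Every `readdir`, `fopen` and `kill` here is its own trip into the kernel, and the forker is free to run between any two of them.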

It's true, if we ran `killall` 1000 times, the rogue process would
very likely end up dead; and if we ran `ps` 1000 times, and found no
trace of the process, we might be pretty sure that it was gone.  On
the other hand,
that assumes that the "race" is fair, and that the attacker hasn't
discovered some way of making sure that the race ends up going her
way.  It would be best if we didn't rely on these sorts of
probabilistic calculations to clean things up.

# A better mousetrap: Preventing forks?

One thing to do, of course, would be to try to prevent the process
from executing `fork` in the first place.  This could be done on Linux
using `seccomp` filtering (seccomp mode 2); but it's Linux-specific.
(Xen, in particular, wants to be able to support NetBSD and FreeBSD
dom0's, so we can't rely on this for correctness.)  Another would be
to use the `setrlimit` system call to set `RLIMIT_NPROC` to `0`.  This
should, in theory, prevent this process from calling `fork` (since by
definition there would already be one process with its UID running).
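
As a rough sketch of the `RLIMIT_NPROC` approach, something like the following could be called in the child before switching to `target_uid` and calling `exec`.  The helper name `set_nproc_limit` is hypothetical, and the sketch assumes a system (such as Linux or the BSDs) that provides `RLIMIT_NPROC` at all.

```c
#include <sys/resource.h>
#include <sys/time.h>

/*
 * Hypothetical helper: cap the number of processes the caller's real
 * UID may have.  Both the soft and the hard limit are set, so an
 * unprivileged process cannot raise the limit again afterwards.
 * Returns 0 on success, -1 on error (errno set by setrlimit).
 */
int set_nproc_limit(rlim_t max_procs)
{
    struct rlimit lim = { .rlim_cur = max_procs, .rlim_max = max_procs };

    return setrlimit(RLIMIT_NPROC, &lim);
}
```

The launcher would call `set_nproc_limit(0)` and check the return value.  Note that on Linux the limit is not enforced for sufficiently privileged processes, so it only starts to bite once the process has actually dropped to `target_uid`.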

But even `RLIMIT_NPROC` has had [issues in the
past](https://lwn.net/Articles/451985/).  Surely there must be a way
to kill a process in a way that it can't evade, without relying on
being able to take away `fork`?

# A better mousetrap: Process groups?

Looking more closely at the `kill` man page, it turns out that the
`pid` argument can be interpreted in four possible ways:

* `pid` > 0: `pid` of a single process to kill
* `pid` < -1: Kill the _process group_ whose `pgid` is `-pid`
* `pid` == 0: Kill every process in my current process group
* `pid` == -1: Kill every process that I'm allowed to kill

At first glance it seems like killing by `pgid` might do what we want.
To run our untrusted process, set the PGID and the UID, and to kill
it, we call `kill(-target_pgid, 9)`.
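
A minimal sketch of that launch-and-kill pattern might look like this.  The helper names `spawn_in_new_group` and `kill_group` are invented for the example, and the switch to `target_uid` is only marked with a comment.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start argv[0] as the leader of a brand-new
 * process group.  Returns the child's pid (which is also its pgid),
 * or -1 on error.
 */
pid_t spawn_in_new_group(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);              /* child: become group leader */
        /* A real launcher would drop to target_uid here. */
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed */
    }
    /* The parent sets it too, so the group exists whichever side runs
     * first; this fails harmlessly if the child has already exec'd. */
    (void)setpgid(pid, pid);
    return pid;
}

/*
 * Hypothetical helper: SIGKILL every current member of `pgid`, then
 * collect the group leader's exit status.  Returns 0 on success.
 */
int kill_group(pid_t pgid, int *status)
{
    if (pgid <= 1) {                /* 0 and -1 mean something else to kill() */
        errno = EINVAL;
        return -1;
    }
    if (kill(-pgid, SIGKILL) < 0)
        return -1;
    return waitpid(pgid, status, 0) == pgid ? 0 : -1;
}
```

The guard in `kill_group` matters: passing `1` would turn the call into `kill(-1, ...)`, which means something very different.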

Unfortunately, unlike the user id, unprivileged processes are
explicitly allowed to change their `pgid`.  So our attacker could
simply run something like the following to avoid being killed in the
same way:

~~~
    while(1) {
        /* Parent exits; the child carries on under a new pid. */
        if(fork() > 0)
            _exit(0);
        setpgid(0, 0);
    }
~~~

In this case, the child process changes its PGID to match its PID as
soon as it forks, making `kill(-target_pgid)` as racy as
`kill(target_pid)`.

# A better mousetrap: kill -1

OK, what about the last one -- "kill every process I'm allowed to
kill"?  Well, we obviously don't want to run that as root unless we
want to nuke the entire system; we want to limit "all processes I'm
allowed to kill" to the particular uid we've given to the rogue
process.

In general, processes are allowed to kill other processes with
their own uid; so what about something like the following?

~~~
    setuid(target_uid);
    kill(-1, 9);
~~~

(NB that for simplicity's sake I will omit error handling in these
examples; but when playing with `kill` you should certainly make sure
that you did switch your `uid`!)

The `kill` system call, when called with `-1`, will loop over the
entire task list, attempting to send the signal to each process except
the one making the system call.  The kernel's `tasklist_lock` is held
for the entire loop, so the rogue process cannot complete a `fork`;
and since the `uid`s match, it will be killed.

Done, right?  Not quite.  If we simply call `setuid`, then not only
can we kill the rogue process, but the rogue process can also kill us:

~~~
    while(1) {
        /* Parent exits; the child carries on under a new pid. */
        if(fork() > 0)
            _exit(0);
        kill(-1, 9);
        setpgid(0, 0);
    }
~~~

If the rogue process manages to get its own `kill(-1)` in after we've
called `setuid` but before we've called `kill` ourselves, _we_ will be
the ones to disappear.  So to successfully kill the rogue process, we
still need to win a race -- something we'd rather not rely on.

# A better mousetrap: Exploiting asymmetry

If we want to _reliably_ kill the other process without putting
ourselves at risk of being killed, we must find an asymmetry that
allows the 'reaper' process to kill the rogue process without being
killable by it.  Looking carefully at the `kill` man page:

> For a process to have permission to send a signal, it must either be
privileged (under Linux: have the CAP_KILL capability in the user
namespace of the target process), or the real or effective user ID of
the sending process must equal the real or saved set-user-ID of the
target process.

So there is an asymmetry.  Each process has an effective UID (`euid`),
real UID (`ruid`), and saved UID (`suid`).  For process A to kill
process B, A's `ruid` or `euid` must match one of B's `ruid` or
`suid`.  Can we construct a `<euid, ruid, suid>` tuple for our
"reaper" process to use which will allow it to kill the rogue process
but not be killed by the rogue process?

It turns out we can.  If we create a new `reaper_uid`, and set its `<euid,
ruid, suid>` to `<target_uid, reaper_uid, X>` (where X can be anything
as long as it's not `target_uid`), then:

 * The reaper process can kill the target process, since its effective
   UID is equal to the target process's real UID
 * But the target process can't kill the reaper, since its real and
   effective UIDs are different than the real and saved UIDs of the
   reaper process.
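
The two bullet points can be checked mechanically with a small model of the unprivileged rule quoted from the man page.  This is only a sketch: it ignores the privileged (`CAP_KILL`) case, and the names `struct creds` and `may_signal` are invented for illustration.

```c
#include <stdbool.h>
#include <sys/types.h>

/* The three user IDs every process carries. */
struct creds {
    uid_t ruid;   /* real UID */
    uid_t euid;   /* effective UID */
    uid_t suid;   /* saved set-user-ID */
};

/*
 * Model of the unprivileged permission check for kill(): the sender's
 * real or effective UID must equal the target's real or saved UID.
 * Note that the target's effective UID plays no part.
 */
bool may_signal(const struct creds *sender, const struct creds *target)
{
    return sender->ruid == target->ruid || sender->ruid == target->suid ||
           sender->euid == target->ruid || sender->euid == target->suid;
}
```

With a rogue process whose three UIDs are all `target_uid`, and a reaper with `<euid, ruid, suid>` = `<target_uid, reaper_uid, reaper_uid>`, `may_signal(&reaper, &rogue)` is true while `may_signal(&rogue, &reaper)` is false.  If the reaper had simply called `setuid(target_uid)`, both directions would be true.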

So the following code will safely kill all processes of `target_uid`
in a race-free way:

~~~
    setresuid(reaper_uid, target_uid, reaper_uid);
    kill(-1, 9);
~~~

Note that this `reaper_uid` must have _no other running processes_
when we call `kill`, or they will be killed as well.  In practice this
means either setting aside a single `reaper_uid` (and using a lock to
make sure only one process calls `setresuid` at a time), or having a
separate `reaper_uid` per `target_uid`.
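
Putting the error handling back in, a fuller version of the reaper might look something like the sketch below.  It is an illustration rather than the code we actually use: the function name `reap_uid` is made up, it assumes the caller has the privilege to set arbitrary UIDs, it uses `reaper_uid` as the saved UID, and it does the work in a short-lived child so that the caller keeps its own credentials.

```c
#define _GNU_SOURCE               /* glibc: expose setresuid()/getresuid() */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: kill every process running as target_uid,
 * using a child with <euid, ruid, suid> =
 * <target_uid, reaper_uid, reaper_uid>.
 * Returns 0 on success, -1 on error.
 */
int reap_uid(uid_t reaper_uid, uid_t target_uid)
{
    /* Refuse combinations that would hit root or lose the asymmetry. */
    if (reaper_uid == 0 || target_uid == 0 || reaper_uid == target_uid) {
        errno = EINVAL;
        return -1;
    }

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        uid_t r, e, s;

        /* Argument order is (ruid, euid, suid). */
        if (setresuid(reaper_uid, target_uid, reaper_uid) < 0)
            _exit(1);
        /* Never call kill(-1) unless the switch really happened. */
        if (getresuid(&r, &e, &s) < 0 ||
            r != reaper_uid || e != target_uid || s != reaper_uid)
            _exit(1);
        /* ESRCH just means there was nothing left to kill. */
        if (kill(-1, SIGKILL) < 0 && errno != ESRCH)
            _exit(1);
        _exit(0);
    }

    int status;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The `getresuid` check is the important part: if the UID switch silently failed and the child were still root, `kill(-1, SIGKILL)` would take down the whole system.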

# No POSIX-compliant mousetraps?

Although `setresuid` is implemented by both Linux and FreeBSD, it is
not in the [current POSIX
specification](http://pubs.opengroup.org/onlinepubs/9699919799/).
Looking at the official list of POSIX system interfaces, it's not
clear how to get a process to have the required tuple using only POSIX
interfaces (namely `setuid` and `setreuid`, without recourse to
`setresuid` or Linux's `CAP_SETUID`); the assumption seems to be that
`euid` must always be set to either `ruid` or `suid`.

Given that `RLIMIT_NPROC` is also not in the POSIX spec, there would
seem at the moment to be no way within that spec to safely prevent a
potentially rogue process from using `fork` to evade `kill`.

# Acknowledgements

Thanks to Ian Jackson for [doing the
analysis](http://marc.info/?i=<23226.18455.602635.161530@xxxxxxxxxxxxxxxxxxxxxxxx>)
to discover the appropriate `<euid, ruid, suid>` tuple.
