
Re: [Publicity] Technical / puzzle blog post on killing processes



George Dunlap writes ("Technical / puzzle blog post on killing processes"):
> Below is a write-up of an investigation we went into as a result of the
> QEMU depriv work.  Web searching actually found several people asking
> how this could be done, but nobody having any good answers.
> 
> I was thinking of sending this to LWN; it's the sort of quirky technical
> puzzle that their readers seem to enjoy.  Otherwise, I think we should
> post it to the Xen blog for the next person who wants to do something
> like this.
> 
> I have proof-of-concept code for most of this; I could also make a
> project on github (or gitlab) and link to it.
> 
> This is written in pandoc markdown; the proper conversion rune is
> `pandoc -s -o blog.html [filename]`.
> 
> Let me know if you have any feedback.
> 
>  -George
> 
> % Killing processes that don't want to be killed
> 
> Suppose you have a program running on your system that you don't quite
> trust.  Maybe it's a program submitted by a student to an automated
> grading system.  Or maybe it's a QEMU device model running in a Xen
> "domain 0", and you want to make sure that even if an attacker from a
> rogue VM manages to take over the QEMU process, she can't do any
> further harm.
> 
> There are many things you want to do as far as restricting its ability
> to do mischief.  But one thing in particular you probably want to do
> is to be able to reliably kill the process once you think it should be
> done.  This turns out to be quite a bit more tricky than you'd think.
> 
> # Avoiding kill with fork
> 
> So here's our puzzle.  Suppose we have a process that we've run with
> its own individual user id (`target_uid`), which we want to kill.  But
> the code in the process is currently controlled by an attacker who
> doesn't want it killed.
> 
> We obviously know the pid of the initial process we forked, so we
> could just use the `kill` system call:
> 
> ~~~
>     kill(target_pid, 9);
> ~~~
> 
> So how can an attacker avoid this?  It turns out to be pretty simple:
> 
> ~~~
>     while(1) {
>         if(!fork())
>             _exit(0);
>     }
> ~~~
> 
> This simple snippet of code will repeatedly call `fork`.  As you
> probably know, `fork` returns twice: once in the existing parent
> process, and once in a newly-created child process.  The result is
> effectively that the process races through the process ID space as
> fast as the kernel will let it.
> 
> I encourage you to run the above code snippet (preferably in a VM),
> and see what it looks like.  It's not even very noticeable.  Running
> `top` shows a system load of about 50% (in my VM anyway), but there's
> not obviously any particular process contributing to that load;
> everything is still very responsive and functional.  If you didn't
> know about it, you might never notice it was there.
> 
> Now try killing it.  You can run `killall` to try to kill the process
> by name, but it will frequently fail with "no process killed"; and
> even when it succeeds, it often turns out that you've killed the
> _parent_ process after the `fork` but before the `exit`, so the rogue
> forker is still going strong.  Even determining whether you've managed
> to kill the process or not is a challenge.
> 
> The basic problem here is a race condition.  What `killall` does is:
> 
> 1. Read the list of processes
> 2. Call `kill(pid, sig)` on each one
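
The two steps above can be sketched in C roughly as follows. This is a simplified illustration, not what `killall` actually does: it selects processes by the owner of their `/proc/<pid>` directory rather than by name, and the function names `snapshot_pids` and `kill_snapshot` are made up for this example. The point is only that the list is read first and the signals are sent afterwards.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Step 1: snapshot the PIDs whose /proc/<pid> directory is owned by `uid`.
 * Returns the number of PIDs stored, or -1 if /proc cannot be opened. */
static int snapshot_pids(uid_t uid, pid_t *pids, int max)
{
    DIR *proc;
    struct dirent *ent;
    int n = 0;

    assert(pids != NULL && max > 0);
    proc = opendir("/proc");
    if (proc == NULL)
        return -1;
    while (n < max && (ent = readdir(proc)) != NULL) {
        char path[272];
        struct stat st;

        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;                   /* not a process directory */
        snprintf(path, sizeof path, "/proc/%s", ent->d_name);
        if (stat(path, &st) == 0 && st.st_uid == uid)
            pids[n++] = (pid_t)atol(ent->d_name);
    }
    closedir(proc);
    return n;
}

/* Step 2: signal every PID in the snapshot except ourselves.
 * Nothing stops the target from forking between step 1 and this loop,
 * so a PID in the list may already be gone by the time we reach it.
 * Returns the number of kill() calls that succeeded. */
static int kill_snapshot(const pid_t *pids, int n, int sig)
{
    pid_t self = getpid();
    int sent = 0;

    for (int i = 0; i < n; i++) {
        if (pids[i] == self)
            continue;
        if (kill(pids[i], sig) == 0)
            sent++;
    }
    return sent;
}
```

A caller would run `snapshot_pids(target_uid, ...)` and then `kill_snapshot(..., SIGKILL)`; against the forking loop, the second call will often find that the PIDs it was given no longer exist.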
> 
> In between 1 and each instance of 2, the kernel tasklist lock is
> released (since it has to return from the hypercall), giving the rogue
                                            syscall
Your background is showing :-)

> ~~~
>     setuid(target_uid);
>       kill(-1, 9);
> ~~~

Wrong indentation.

> 
> (NB that for simplicity's sake I will omit error handling in these
> examples; but when playing with `kill` you should certainly make sure
> that you did switch your `uid`!)
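
For reference, here is one way the omitted checks might look. It is a minimal sketch: the helper names `switch_uid_checked` and `kill_all_of_uid` are invented for this example, and the refusal to run with uid 0 is an extra guard I have added, not something the draft asks for. It only adds checks around the two-line snippet and does not otherwise change what it does.

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch to `uid` and confirm that the switch took effect.
 * Returns 0 on success, -1 otherwise. */
static int switch_uid_checked(uid_t uid)
{
    if (setuid(uid) != 0)
        return -1;
    /* Re-read the IDs rather than trusting the return value alone.
     * An unprivileged setuid() changes only the effective UID, so
     * checking the real UID as well catches that case. */
    if (getuid() != uid || geteuid() != uid)
        return -1;
    return 0;
}

/* Hypothetical wrapper: send SIGKILL with kill(-1) only once we know
 * we are really running as target_uid. */
static int kill_all_of_uid(uid_t target_uid)
{
    if (target_uid == 0)
        return -1;      /* kill(-1) as root would hit nearly every process */
    if (switch_uid_checked(target_uid) != 0)
        return -1;
    return kill(-1, SIGKILL);
}
```

Do not call `kill_all_of_uid` with your own login uid unless you want to lose your session; try it only with a throwaway uid, preferably in a VM.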
> 
> The `kill` system call, when called with `-1`, will loop over the
> entire task list, attempting to send the signal to each process except
> the one making the system call.  The `task_list` lock is held for the
> entire loop, so the rogue process cannot complete a `fork`; and since
> the `uid`s match, it will be killed.
> 
> Done, right?  Not quite.  If we simply call `setuid`, then not only
> can we kill the rogue process, but the rogue process can also kill us:
> 
> ~~~
>     while(1) {
>         if(!fork())
>             _exit(0);
>         kill(-1, 9);
>         setpgid(0, 0);
>     }
> ~~~~
> 
> If the rogue process manages to get its own `kill(-1)` in after we've
> called `setuid` but before we've called `kill` ourselves, _we_ will be
> the ones to disappear.  So to successfully kill the rogue process, we
> still need to win a race -- something we'd rather not rely on.
> 
> # A better mousetrap: Exploting assymetry

                                  asymmetry

> 
> If we want to _reliably_ kill the other process without putting
> ourselves at risk of being killed, we must find an asymmetry that
> allows the 'reaper' process to kill the rogue process without being
> killed itself.  Looking carefully at the `kill` man page:
> 
> > For a process to have permission to send a signal, it must either be
> privileged (under Linux: have the CAP_KILL capability in the user
> namespace of the target process), or the real or effective user ID of
> the sending process must equal the real or saved set-user-ID of the
> target process.
> 
> So there is an asymmetry.  Each process has an effective UID (`euid`),
> real UID (`ruid`), and saved UID (`suid`).  For process A to kill
> process B, A's `ruid` or `euid` must match one of B's `ruid` or
> `suid`.  Can we construct a `<euid, ruid, suid>` tuple for our
> "reaper" process to use which will allow it to kill the rogue process
> but not be killed by the rogue process?
> 
> It turns out we can.  If we create a new `reaper_uid`, and set its `<euid,
> ruid, suid>` to `<target_uid, reaper_uid, X>` (where X can be anything
> as long as it's not `target_uid`), then:
> 
>  * The reaper process can kill the target process, since its effective
>    UID is equal to the target process's real UID
>  * But the target process can't kill the reaper, since its real and
>    effective UIDs are different than the real and saved UIDs of the
>    reaper process.
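
The rule and the two bullet points can be checked with a small model. This is only a sketch of the unprivileged case quoted from the man page (it ignores `CAP_KILL`), and `struct ids`, `may_signal` and `reaper_ids` are names made up for the illustration; nothing here calls the real `kill`.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* The three user IDs that the kill(2) permission rule looks at. */
struct ids {
    uid_t ruid;     /* real UID */
    uid_t euid;     /* effective UID */
    uid_t suid;     /* saved set-user-ID */
};

/* Unprivileged rule from the man page: the sender's real or effective
 * UID must equal the target's real or saved UID. */
static bool may_signal(struct ids sender, struct ids target)
{
    return sender.ruid == target.ruid || sender.ruid == target.suid ||
           sender.euid == target.ruid || sender.euid == target.suid;
}

/* The tuple proposed for the reaper: euid = target_uid, and both the
 * real and saved UIDs set to reaper_uid. */
static struct ids reaper_ids(uid_t reaper_uid, uid_t target_uid)
{
    assert(reaper_uid != target_uid);
    return (struct ids){ .ruid = reaper_uid,
                         .euid = target_uid,
                         .suid = reaper_uid };
}
```

With a rogue process whose three IDs are all `target_uid`, `may_signal(reaper, rogue)` is true and `may_signal(rogue, reaper)` is false, which is the asymmetry being exploited. Two processes that both have all three IDs equal to `target_uid` can each signal the other.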
> 
> So the following code will safely kill all processes of `target_uid`
> in a race-free way:
> 
> ~~~
>     setresuid(reaper_uid, target_uid, reaper_uid);
>     kill(-1, 9);
> ~~~
> 
> Note that this `reaper_uid` must have _no other running processes_
> when we call `kill`, or they will be killed as well.  In practice this
> means either setting aside a single `reaper_uid` (and using a lock to
> make sure only one process calls `setresuid` at a time), or having a

                     reaper process runs at a time.

> # No POSIX-compliant mousetraps?
> 
> Although `setresuid` is implemented by both Linux and FreeBSD, it is
> not in the [current POSIX
> specification](http://pubs.opengroup.org/onlinepubs/9699919799/).
> Looking at the official list of POSIX system interfaces, it's not
> clear how to get a process to have the required tuple using only POSIX
> interfaces (namely `setuid` and `setreuid`, without recourse to
> `setresuid` or Linux's `CAP_SETUID`); the assumption seems to be that
> `euid` must always be set to either `ruid` or `suid`.

Proof that this can't be simulated by proper use of setuid, seteuid
and setreuid:

                                        ruid    euid    suid

The desired state is:                   reaper  target  reaper

If the final call is seteuid:

   seteuid(target);                     reaper  target  reaper

For this to be permitted, and nontrivial, euid was 0:
   
Penultimate status                      reaper  0       reaper

This state cannot be generated by setuid: either euid==0 previously, and
setuid would have set all of the ids; or the old euid was not 0, in
which case setuid() would have set only the euid, and required that
one of the other ids was 0, which we can see it can't have been.

This penultimate state cannot be generated by seteuid from any
different state.

So it must have been generated by setreuid.  We must avoid setreuid
setting the suid to the same as the new euid (0), which means that our
setreuid call did not change the ruid either.  That form of setreuid
is just like seteuid for our purposes, and not useful.

So the desired state could not be made by seteuid.

Let's consider setreuid.  Well, either setreuid sets the suid to the
same as the new euid, or it only changes the euid.  I.e., it would only
do something we could have done with seteuid and the argument above
applies.

What about setuid?  Well, either setuid sets all three uids to the
same thing, or it, again, sets only the euid.

Ian.
