Xen project Mailing List

Re: [Xen-devel] [PATCH] docs/qemu-deprivilege: Revise and update with status and future plans

George Dunlap writes ("Re: [PATCH] docs/qemu-deprivilege: Revise and update with status and future plans"): > Actually I think most of the user-facing stuff already in xl.cfg is > inappropriate for that man page. It might make sense to have a separate > man page for it. I wouldn't object to that. > On 03/26/2018 05:43 PM, Ian Jackson wrote: > > No. Firstly, in each case, all relevant descriptors are restricted. > > This is the purpose of the xentoolcore__restrict_* stuff. Secondly, > > xenstore *is* covered - but the xs fd is squashed so as to be totally > > unuseable: xs.c uses xentoolcore__restrict_by_dup2_null. > > Ross already gave me some corrections on this; here is what I have: > > 8<--- > '''Description''': Close and restrict Xen-related file descriptors. > Specifically: > * Close all xenstore-related file descriptors > * Make sure that extraneous `privcmd` and `evtchn` instances are > closed > * Make sure that all open instances of `privcmd` and `evtchn` file > descriptors have had IOCTL_PRIVCMD_RESTRICT and > IOCTL_EVTCHN_RESTRICT_DOMID ioctls called on them, respectively. > --->8 > > It sounds like the last may be inaccurate for libxl? I don't think anything closes any extraneous fds. My approach in the libxc layer was to register all fds and have the restrict call iterate over all of them. So, I guess, drop your 2nd bullet. All of this is done by qemu calling xentoolcore_restrict_all, not by libxl. Maybe the "extraneous privcmd and evtchn fds" Ross means are ones inherited by qemu from the toolstack. Hrm. Now I want to try to make an argument that no such fds are leaked into qemu subprocesses by libxl but I don't think it's trivial to do so. It may not be true even in the usual case but I also worry about races in higher layers that call libxl in multiple threads with multiple ctx's concurrently and which might therefore end up forking for a qemu while another thread is opening privcmd. Sorting this out needs to go on your todo list :-/. > >> +### Namespaces for unused functionality ... > >> + unshare(CLONE_NEWNS | CLONE_NEWIPC) > > > > This would mean we would have to pass qemu fds for both the network > > tap devices and any vnc consoles. That makes life considerably more > > complicated. I think we should perhaps revisit this upstream. > > You haven't read this very carefully (nor do you seem to have read the > discussion with Ross before responding). I split the "Namespaces" > restriction Ross suggested internally to us into two parts: > - Namespace restrictions for unused functionality > - Network namespacing Oh, sorry. Yes. > In fact, I was wondering if it might make sense to do *all* the > deprivileging in a separate process before starting qemu. That would > make it simple, for example, to write a test program which would try to > break out of the "jail" we'd put it in. I think this would be a lot of trouble because we'd have to pass pre-restricted privcmd fds into qemu and stuff them into libxc et al. It would mean fixing a lot of redundant handle opening which is currently harmless. > >> +Other things that would take some cleverness / changes to QEMU to > >> +utilize due to ordering constrants: > >> + - RLIMIT_NPROC (after uid changes to a unique uid) > >> + - RLIMIT_NOFILES (after all necessary files are opened) > > > > I think there is little difficulty with RLIMIT_NPROC since our qemu > > does not fork. I think we can set it to a value which is currently > > violated for the current uid ? > > Well AFAICT classic POSIX allows you to set rlimits on yourself, but not > on another process. Since this is on the *user id* rather than the > *process*, I didn't think "setrlimit [as root] -> exec -> setuid" would > work correctly; I assumed you'd have to have "exec -> setuid -> > setrlimit", which would require further changes to QEMU. rlimits are a process property. I'm quite alarmed by the patch by Neil Brown in your first link that setuid should fail if it would cause RLIMIT_NPROC to be violated for this process in the context of the new uid ! The difficulties discussed on your 2nd link were immediately obvious to me on reading the first one. AFAICT from the article Linux now does not check RLIMIT_NPROC during setuid, nor during execve. I think that is correct behaviour. > I now realize that it might be that the limit will follow the current > uid of the process, in which case "setrlimit -> setuid" might have the > expected behavior. But a quick Google search shows that the interaction > of RLIMIT_NPROC and setuid() is tricky[1][2], and may vary from > implementation to implementation; relying on the interaction to be > correct (and stay correct) seems somewhat risky (unless POSIX has > explicitly documented what should happen in that case, which again a > quick Google search hasn't turned up). RLIMIT_NPROC is not in SuS: http://pubs.opengroup.org/onlinepubs/9699919799/functions/setrlimit.html FreeBSD does not have any of this madness: https://www.unix.com/man-page/freebsd/2/setuid/ Anyway IMO we should set RLIMIT_NPROC after fork and before execve. If this causes setuid to fail in qemu, qemu will crash. But this could only be the case if the new uid already has other processes, which it shouldn't do. (If it does, the new uid is probably contaminated by a previous domain with the same domid.) If the kernel is a version of Linux with the execve RLIMIT_NPROC check, then execve will fail because there will be many more processes than root's RLIMIT_NPROC. I propose to treat that as a bug in Linux. > Linux does seem to have a "set rlimit on another process" system call > (prlimit). But that would still require at least a little bit of care, > as then we'd need to set the limit after the setuid but before the guest > started running. And in any case I couldn't (again in a quick search) > discover that FreeBSD has such a system call (and working correctly on > FreeBSD seems to be a design goal). Yes. > >> +### libxl UID cleanup > > ... > >> +kill(-1,sig) sends a signal to "every process to which the calling > >> +process has permission to send a signal". So in theory: > >> + setuid(X) > >> + kill(-1,KILL) > >> +should do the trick. > > > > We need to check whether a malicious qemu process could kill this > > one. > > Hmm, in theory it probably could. I could find nothing in SuS explaining when process A may send signals to process B. So I resorted to the BSD manpages: https://www.unix.com/man-page/freebsd/2/kill/ For a process to have permission to send a signal to a process designated by pid, the user must be the super-user, or the real or saved user ID of the receiving process must match the real or effective user ID of the sending process. Also kill(-1,) does not signal the sender. I did some web searches hoping to find someone had already solved this. The best discussion is here https://www.unix.com/unix-for-advanced-and-expert-users/168364-kill-all-process-uid.html but the proposed answer doesn't work because the kill would kill lots of root processes. My analysis: euid ruid suid compared for receiving process ==== ==== compared for sending process ==== ==== possible rogue processes qid qid qid unrelated rootish processes 0 any any or any 0 any or any any 0 killer (our pet issuing kill) so rogues can't kill killer !=qid !=qid so it doesn't kill randomly !=0 !=0 therefore: qid pet 0 We will have to reserve one uid `pet' specially for this nonsense. We call setresuid(2) to switch uids, and then kill(-1,9) and _exit(0). The kill() call will kill other processes with uid `pet'. So we take a global lock while we do this. `pet' could be a uid associated with a reserved domid, eg dom0. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.