
Re: [Xen-devel] [PATCH] docs/qemu-deprivilege: Revise and update with status and future plans



Thanks for this update!

George Dunlap writes ("[PATCH] docs/qemu-deprivilege: Revise and update with 
status and future plans"):
...
> +# Technical details
> +
> +## Restrictions done

This makes this doc into a mixture of a design doc and a user doc, I
think.

It might be worth stating the design intent, which I think is this:

 * Even if there is a bug (for example in qemu) which permits a domain
   to compromise the device model, the compromised device model
   process is prevented from violating the system's overall security
   properties.  Ie, a guest cannot "escape" from the virtualisation by
   using a qemu bug.

This design intent is not yet achieved.  Right now an attacker is
impeded and their attack is complicated; in some circumstances they
will be limited to denial of service.

I'm not sure the individual restrictions need to be in a user-facing
doc.

Maybe the user-facing wording from your patch should be moved to
xl.cfg.doc.5 ?

> +'''Description''': Close and restrict Xen-related file descriptors.
> +Specifically, make sure that only one `privcmd` instance is open, and
> +that the IOCTL_EVTCHN_RESTRICT_DOMID ioctl has been called.
> +
> +XXX Also, make sure that only one `xenstore` fd remains open, and that
> +it's restricted.

No.  Firstly, in each case, all relevant descriptors are restricted.
This is the purpose of the xentoolcore__restrict_* stuff.  Secondly,
xenstore *is* covered - but the xs fd is squashed so as to be totally
unusable: xs.c uses xentoolcore__restrict_by_dup2_null.
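
For readers unfamiliar with the "dup2 onto /dev/null" idea, here is a
minimal sketch of the general technique.  It is not the xentoolcore
code; the function name squash_fd_to_null is made up for illustration.
The descriptor number stays valid, so callers holding it do not crash,
but it no longer refers to the original object.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of xentoolcore): make `fd` refer to
 * /dev/null instead of whatever it referred to before.
 * Returns 0 on success, -1 on error (errno set by open/dup2).
 */
static int squash_fd_to_null(int fd)
{
    int nullfd = open("/dev/null", O_RDWR);
    if (nullfd < 0)
        return -1;

    /* Only possible if `fd` was not open; it is /dev/null now anyway. */
    if (nullfd == fd)
        return 0;

    /* dup2 atomically closes the old object behind `fd`. */
    int rc = dup2(nullfd, fd);
    close(nullfd);
    return rc < 0 ? -1 : 0;
}
```

After this, writes to the descriptor are discarded and reads return
end-of-file, which is what "totally unusable" amounts to in practice.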

> +### Namespaces for unused functionality
> +
> +'''Description''': Enter QEMU into its own mount & IPC namespaces.
> +This means that even if other restrictions fail, the process won't be
> +able to even name system mount points or existing non-file-based IPC
> +descriptors to attempt to attack them.
> +
> +'''Implementation''':
> +
> +In theory this could be done in QEMU (similar to -sandbox, -runas,
> +-chroot, and so on), but a patch doing this in QEMU was NAKed
> +upstream. They preferred that this was done as a setup step by
> +whatever executes QEMU; i.e., have the process which exec's QEMU first
> +call:
> +
> +    unshare(CLONE_NEWNS | CLONE_NEWIPC)

This would mean we would have to pass qemu fds for both the network
tap devices and any vnc consoles.  That makes life considerably more
complicated.  I think we should perhaps revisit this upstream.

> +'''Implementation''': Enable from the command-line:
> +
> +    -sandbox on,obsolete=deny,elevateprivileges=allow,spawn=deny,resourcecontrol=deny
> +
> +`elevateprivileges` is currently required to allow `-runas` to work.
> +Removing this requirement would mean making sure that the uid change
> +happened before the seccomp2 call, perhaps by changing the uid before
> +executing QEMU.  (But this would then require other changes to create
> +the QMP socket, VNC socket, and so on).

See what I say above.

> +### Further RLIMITs
> +
> +RLIMIT_AS limits the total amount of memory; but this includes the
> +virtual memory which QEMU uses as a mapcache.  xen-mapcache.c already
> +fiddles with this; it would be straightforward to make it *set* the
> +rlimit to what it thinks a sensible limit is.
> +
> +Other things that would take some cleverness / changes to QEMU to
> +utilize due to ordering constraints:
> + - RLIMIT_NPROC (after uid changes to a unique uid)
> + - RLIMIT_NOFILE (after all necessary files are opened)

I think there is little difficulty with RLIMIT_NPROC since our qemu
does not fork.  I think we can set it to a value which is currently
violated for the current uid ?
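
To make the suggestion concrete, a sketch of what I mean, assuming
Linux and assuming zero is an acceptable value (any value at or below
the uid's current count would behave the same).  The function name
forbid_new_processes is made up.  Two caveats: on Linux this limit also
counts threads, so it would have to be applied after qemu has created
the threads it needs; and it is not enforced for a process holding
CAP_SYS_ADMIN or CAP_SYS_RESOURCE, so it only bites after the uid
change.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: set both the soft and hard RLIMIT_NPROC to 0 so
 * that the calling process's real uid cannot create further processes
 * (or, on Linux, threads).  Lowering the hard limit is irreversible
 * for an unprivileged process.
 * Returns 0 on success, -1 on error (errno set by setrlimit).
 */
static int forbid_new_processes(void)
{
    struct rlimit lim = { .rlim_cur = 0, .rlim_max = 0 };
    return setrlimit(RLIMIT_NPROC, &lim);
}
```

Existing processes are not affected by the limit being "violated"; only
subsequent fork/clone calls fail with EAGAIN.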

> +### libxl UID cleanup
...
> +kill(-1,sig) sends a signal to "every process to which the calling
> +process has permission to send a signal".  So in theory:
> +  setuid(X)
> +  kill(-1,KILL)
> +should do the trick.

We need to check whether a malicious qemu process could kill this
one.

> +### Disks
> +
> +The chroot (and seccomp?) happens late enough such that QEMU can
> +initialize itself and open its disks. If you want to add a disk at run
> +time via or insert a CD, you can't pass a path because QEMU is
> +chrooted. Instead use the add-fd QMP command and use
> +/dev/fdset/<fdset-id> as the path.

I don't think we (Xen) really support hotplug of emulated disks right
now.  So it's just cd insert that's a problem.
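
For the cd insert case, the fd has to travel over the QMP UNIX socket
as SCM_RIGHTS ancillary data alongside the add-fd command.  A minimal
sketch of that mechanism follows.  The helper names send_with_fd and
recv_with_fd are made up, and the receive side is only there to show
what the peer sees; it is not qemu's code.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/*
 * Hypothetical helper: send `len` bytes of `msg` on the connected UNIX
 * socket `sock`, attaching `fd` as SCM_RIGHTS ancillary data.
 * Returns 0 if the whole message was sent, -1 otherwise.
 */
static int send_with_fd(int sock, const char *msg, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)msg, .iov_len = len };
    union fd_ctrl ctrl;
    struct msghdr mh;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&mh, 0, sizeof(mh));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctrl.buf;
    mh.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cm = CMSG_FIRSTHDR(&mh);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &mh, 0) == (ssize_t)len ? 0 : -1;
}

/*
 * Hypothetical helper: receive up to `len` bytes into `buf` and, if a
 * single descriptor was attached, store it in *fd_out (else -1).
 * Returns the number of bytes received, or -1 on error.
 */
static int recv_with_fd(int sock, char *buf, size_t len, int *fd_out)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    union fd_ctrl ctrl;
    struct msghdr mh;

    memset(&mh, 0, sizeof(mh));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctrl.buf;
    mh.msg_controllen = sizeof(ctrl.buf);

    ssize_t n = recvmsg(sock, &mh, 0);
    if (n < 0)
        return -1;

    *fd_out = -1;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&mh);
    if (cm && cm->cmsg_level == SOL_SOCKET &&
        cm->cmsg_type == SCM_RIGHTS &&
        cm->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(fd_out, CMSG_DATA(cm), sizeof(int));
    return (int)n;
}
```

The toolstack would open the image outside the chroot, send it this way
with the add-fd command, and then name /dev/fdset/<fdset-id> as the
medium path.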

> +### Network
>  
> +If QEMU runs in its own network namespace, it can't open the tap
> +device itself because the interface won't be visible outside of its
> +own namespace. So instead, have the toolstack open the device and pass
> +it as an fd on the command-line:

I think this could be solved by doing these things in a different
order.

Thanks,
Ian.
