
[Xen-devel] Device model operation hypercall (DMOP, re qemu depriv)



Introducing HVMCTL, Jan wrote:
> A long while back separating out all control kind operations (intended
> for use by only the control domain or device model) from the current
> hvmop hypercall has been discussed. This series aims at finally making
> this reality (at once allowing to streamline the associated XSM checking).

I think we need to introduce a new hypercall (which I will call DMOP
for now) which may augment or replace some of HVMCTL.  Let me explain:


We would like to be able to deprivilege qemu-in-dom0.  This is
because qemu has a large attack surface and has a history of security
bugs.  If we get this right we can easily reduce the impact of `guest
can take over qemu' bugs to DoS; and perhaps with a bit of effort we
can eliminate the DoS too.  (qemu stubdoms are another way to do this
but they have their own difficulties.)

A part of this plan has to be a way for qemu to make hypercalls
related to the guest it is servicing.  But qemu needs to be _unable_
to make _other_ hypercalls.

I see four possible approaches.  In increasing order of desirability
(IMO):

1. We could simply patch the dom0 privcmd driver to know exactly which
   hypercalls are permitted.  This is obviously never going to work
   because there would have to be a massive table in the kernel, kept
   in step with Xen.  We could have a kind of pattern matching engine
   instead, and load the tables from userspace, but that's a daft
   edifice to be building (even if we reuse BPF or something) and a
   total pain to maintain.

2. We could have some kind of privileged proxy or helper process,
   which makes the hypercalls on instruction from qemu.  This would be
   quite complicated and involve a lot of back-and-forth parameter
   passing.  Like option 1, this arrangement would end up embedding
   detailed knowledge about which hypercalls are appropriate, and have
   to understand all of their parameters.

3. We could have the dom0 privcmd driver wrap each of qemu's
   hypercalls in a special "wrap up with different XSM tag" hypercall.
   Then, we could specify the set of allowable hypercalls with XSM.
   If we want qemu deprivileged by default, this depends on turning
   XSM on by default.  But we want qemu depriv ASAP and there are
   difficulties with XSM by default.  This approach also involves
   writing a large and hard-to-verify hypercall permission table, in
   the form of an XSM policy.

4. We could invent a new hypercall `DMOP' for hypercalls which device
   models should be able to use, which always has the target domain in
   a fixed location in the arguments.  We have the dom0 privcmd driver
   know about this one hypercall number and the location of the target
   domid.

Option 4 has the following advantages:

* The specification of which hypercalls are authorised to qemu is
  integrated with the specification of the hypercalls themselves:
  There is no need to maintain a separate table which can get out of
  step (or contain security bugs).

* The changes required to the rest of the system are fairly small.
  In particular, we need only one small, non-varying patch to the
  dom0 kernel.


Let me flesh out option 4 in more detail:


We define a new hypercall DMOP.

Its first argument is always a target domid.  The DMOP hypercall
number and position of the target domid in the arguments are fixed.

A DMOP is defined to never put at risk the stability or security of
the whole system, nor of the domain which calls DMOP.  However, a DMOP
may have arbitrary effects on the target domid.

In the privcmd driver, we provide a new restriction ioctl, which takes
a domid parameter.  After that restriction ioctl is called, the
privcmd driver will permit only DMOP hypercalls, and only with the
specified target domid.

Since the hypercall number and the target domid are stable, this is a
simple check which will not need to be updated as new DMOPs are
defined (and old ones retired).
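
The check described above could be sketched roughly as follows.  This
is only an illustration: the hypercall number, the struct layouts and
every ex_* name are hypothetical, and the one-way behaviour of the
restriction call is an assumption rather than something specified in
this proposal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hypercall number, for illustration only. */
#define EX_HYPERCALL_DMOP 41u

/* Per-open-file state that a privcmd-like driver might keep. */
struct ex_privcmd_state {
    bool     restricted;    /* set once the restriction ioctl is called */
    uint16_t target_domid;  /* only meaningful when restricted is true */
};

/* A hypercall request as handed in from userspace. */
struct ex_hypercall {
    uint64_t op;      /* hypercall number */
    uint64_t arg[5];  /* for a DMOP, arg[0] is the target domid */
};

/*
 * Restriction ioctl.  Assumed to be one-way in this sketch: returns 0
 * on success, -1 if the file is already restricted to another domain.
 */
int ex_privcmd_restrict(struct ex_privcmd_state *st, uint16_t domid)
{
    if (st->restricted && st->target_domid != domid)
        return -1;
    st->restricted = true;
    st->target_domid = domid;
    return 0;
}

/*
 * The filter itself.  It looks only at the hypercall number and the
 * first argument, so it does not change when DMOPs are added or retired.
 */
bool ex_privcmd_permits(const struct ex_privcmd_state *st,
                        const struct ex_hypercall *hc)
{
    if (!st->restricted)
        return true;
    /* Compare at full width so a value like 0x10005 cannot alias domid 5. */
    return hc->op == EX_HYPERCALL_DMOP &&
           hc->arg[0] == (uint64_t)st->target_domid;
}
```

Note that the filter never inspects the DMOP sub-opcode or its other
arguments; deciding what a given DMOP may do to the target domain is
left to Xen.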

DMOPs are not available to guests (other than stub device model
domains) and do not form part of the guest-stable ABI.  Where the set
of operations provided through DMOPs overlaps with guest-stable
hypercalls, identical functionality must be provided through both
parts of the hypercall namespace.

Privileged toolstack software is permitted to use DMOPs as well as
other hypercalls, of course.  So there is no need to duplicate
functionality between DMOPs and non-stable privileged toolstack
hypercalls.


On ABI/API stability:

For this scheme to work, it is not essential that the DMOPs themselves
should have a stable ABI.

However, we do want to be able to decouple qemu versions from Xen
versions.  This could be done by having the relevant bit of libxc (let
us suppose libdevicemodel) be capable of driving multiple versions of
Xen.  Or by having different libdevicemodel versions, one for each
version of Xen, and some kind of ad-hoc select-the-right-library
arrangement to cope with dual booting.

Alternatively, old DMOP interfaces (ie, old DMOPs) could simply be
retained for a few Xen releases and then retired, providing a
semi-stable ABI to device model software.

In any case, probably the DMOP opcode needs to be a wide field so that
when new DMOPs, or new versions of old DMOPs, arise, we can assign
them new numbers.  (Alternatively we could have a version field in
every DMOP which is checked for equality, but that makes some
compatibility strategies more painful.)
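
To illustrate the "new numbers" approach, here is a minimal sketch of
a dispatcher in which an incompatible revision of an operation gets a
fresh opcode while the old one is retained for a while.  All of the
names and numbers are hypothetical, and returning -ENOSYS for retired
or unknown opcodes is an assumption made for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical opcode assignments; none of these exist in Xen. */
enum ex_dmop_op {
    EX_DMOP_OLD_THING = 1,  /* retired: number is never reused */
    EX_DMOP_FROB_V1   = 2,  /* old interface, retained for a few releases */
    EX_DMOP_FROB_V2   = 3,  /* incompatible revision, given a new number */
};

/*
 * Returns the version of the handler that would run for this opcode,
 * or -ENOSYS if the opcode is retired or unknown.  A real dispatcher
 * would call the handler instead of returning a number.
 */
int ex_dmop_dispatch(uint32_t op)
{
    switch (op) {
    case EX_DMOP_FROB_V1:
        return 1;
    case EX_DMOP_FROB_V2:
        return 2;
    case EX_DMOP_OLD_THING:  /* retired */
    default:
        return -ENOSYS;
    }
}
```

With this layout a caller built against the old interface keeps
working until EX_DMOP_FROB_V1 is retired, and a caller can probe for
the newest opcode it knows and fall back on -ENOSYS.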


What do people think ?

Thanks,
Ian.
