Xen project Mailing List

[Xen-devel] Device model operation hypercall (DMOP, re qemu depriv)

Introducing HVMCTL, Jan wrote:
> A long while back separating out all control kind operations (intended
> for use by only the control domain or device model) from the current
> hvmop hypercall has been discussed. This series aims at finally making
> this reality (at once allowing to streamline the associated XSM checking).

I think we need to introduce a new hypercall (which I will call DMOP
for now) which may augment or replace some of HVMCTL. Let me explain:

We would like to be able to deprivilege qemu-in-dom0. This is
because qemu has a large attack surface and has a history of security
bugs. If we get this right we can easily reduce the impact of `guest
can take over qemu' bugs to DoS; and perhaps with a bit of effort we
can eliminate the DoS too. (qemu stubdoms are another way to do this
but they have their own difficulties.)

A part of this plan has to be a way for qemu to make hypercalls
related to the guest it is servicing. But qemu needs to be _unable_
to make _other_ hypercalls.

I see four possible approaches. In (IMO) increasing order of
desirability:

1. We could simply patch the dom0 privcmd driver to know exactly
   which hypercalls are permitted. This is obviously never going to
   work because there would have to be a massive table in the kernel,
   kept in step with Xen. We could have a kind of pattern matching
   engine instead, and load the tables from userspace, but that's a
   daft edifice to be building (even if we reuse BPF or something) and
   a total pain to maintain.

2. We could have some kind of privileged proxy or helper process,
   which makes the hypercalls on instruction from qemu. This would be
   quite complicated and involve a lot of back-and-forth parameter
   passing. Like option 1, this arrangement would end up embedding
   detailed knowledge about which hypercalls are appropriate, and
   would have to understand all of their parameters.

3. We could have the dom0 privcmd driver wrap each of qemu's
   hypercalls in a special "wrap up with different XSM tag" hypercall.
   Then, we could specify the set of allowable hypercalls with XSM.
   If we want qemu deprivileged by default, this depends on turning
   XSM on by default. But we want qemu depriv ASAP and there are
   difficulties with XSM by default. This approach also involves
   writing a large and hard-to-verify hypercall permission table, in
   the form of an XSM policy.

4. We could invent a new hypercall `DMOP' for hypercalls which device
   models should be able to use, which always has the target domain in
   a fixed location in the arguments. We have the dom0 privcmd driver
   know about this one hypercall number and the location of the target
   domid.

Option 4 has the following advantages:

* The specification of which hypercalls are authorised to qemu is
  integrated with the specification of the hypercalls themselves:
  There is no need to maintain a separate table which can get out of
  step (or contain security bugs).

* The changes required to the rest of the system are fairly small.
  In particular:

  * We need only one small, non-varying, patch to the dom0 kernel.

Let me flesh out option 4 in more detail:

We define a new hypercall DMOP.

Its first argument is always a target domid. The DMOP hypercall
number and position of the target domid in the arguments are fixed.

A DMOP is defined to never put at risk the stability or security of
the whole system, nor of the domain which calls DMOP. However, a DMOP
may have arbitrary effects on the target domid.

In the privcmd driver, we provide a new restriction ioctl, which takes
a domid parameter. After that restriction ioctl is called, the
privcmd driver will permit only DMOP hypercalls, and only with the
specified target domid.

Since the hypercall number and the target domid are stable, this is a
simple check which will not need to be updated as new DMOPs are
defined (and old ones retired).

DMOPs are not available to guests (other than stub device model
domains) and do not form part of the guest-stable ABI.

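
To make the privcmd check concrete, here is a minimal userspace sketch
of the logic described above. It is not driver code. The hypercall
number, the struct layouts and the function names are all hypothetical,
invented for illustration; the proposal only says that the DMOP number
and the position of the target domid are fixed. Refusing a second
restriction to a different domid is likewise an assumption, not
something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number: the proposal fixes the DMOP hypercall number
 * but does not assign one. */
#define HYPOTHETICAL_HYPERCALL_DMOP 41ul

typedef uint16_t domid_t;

/* Illustrative shape of a hypercall request as the driver sees it. */
struct hypercall_req {
    unsigned long op;      /* hypercall number */
    unsigned long arg[5];  /* for a DMOP, arg[0] is the target domid */
};

/* Illustrative per-open-file state, set by the restriction ioctl. */
struct privcmd_state {
    bool restricted;
    domid_t domid;
};

/* Models the restriction ioctl. Assumption: once restricted, the
 * handle cannot be re-pointed at a different domain. */
static int privcmd_restrict(struct privcmd_state *st, domid_t domid)
{
    assert(st != NULL);
    if (st->restricted && st->domid != domid)
        return -1;
    st->restricted = true;
    st->domid = domid;
    return 0;
}

/* The whole policy: an unrestricted handle may do anything; a
 * restricted one may only issue DMOP against its own target domid.
 * Nothing here depends on which DMOP sub-operations exist. */
static bool privcmd_hypercall_allowed(const struct privcmd_state *st,
                                      const struct hypercall_req *req)
{
    if (!st->restricted)
        return true;
    if (req->op != HYPOTHETICAL_HYPERCALL_DMOP)
        return false;
    return req->arg[0] == st->domid;
}
```

With a handle restricted to domid 7, a DMOP request whose arg[0] is 7
passes, while a DMOP aimed at domid 8, or any other hypercall number,
is rejected. The check never inspects the DMOP sub-operation, which is
why it would not need updating as DMOPs are added or retired.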
Where the set of operations provided through DMOPs overlaps with
guest-stable hypercalls, identical functionality must be provided
through both parts of the hypercall namespace.

Privileged toolstack software is permitted to use DMOPs as well as
other hypercalls, of course. So there is no need to duplicate
functionality between DMOPs and non-stable privileged toolstack
hypercalls.

On ABI/API stability:

For this scheme to work, it is not essential that the DMOPs themselves
should have a stable ABI. However, we do want to be able to decouple
qemu versions from Xen versions.

This could be done by having the relevant bit of libxc (let us suppose
libdevicemodel) be capable of driving multiple versions of Xen. Or by
having different libdevicemodel versions, one for each version of Xen,
and some kind of ad-hoc select-the-right-library arrangement to cope
with dual booting.

Alternatively, old DMOP interfaces (i.e., old DMOPs) could simply be
retained for a few Xen releases and then retired, providing a
semi-stable ABI to device model software.

In any case, the DMOP opcode probably needs to be a wide field so that
when new DMOPs, or new versions of old DMOPs, arise, we can assign
them new numbers. (Alternatively we could have a version field in
every DMOP which is checked for equality, but that makes some
compatibility strategies more painful.)

What do people think?

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
