Re: [Xen-devel] [RFC] Dom0 PV IOMMU control design (draft A)



On 14/04/14 17:00, Jan Beulich wrote:
>>>> On 14.04.14 at 17:48, <malcolm.crossley@xxxxxxxxxx> wrote:
>> On 14/04/14 14:47, Jan Beulich wrote:
>>>>>> On 11.04.14 at 19:28, <malcolm.crossley@xxxxxxxxxx> wrote:
>>>> Purpose
>>>> =======
>>>>
>>>> Allow Xen Domain 0 PV guests to create/modify/destroy IOMMU mappings for
>>>> hardware devices that Domain 0 has access to. This enables Domain 0 to 
>>>> program
>>>> a bus address space mapping which matches its GPFN mapping. Once a 1-1
>>>> mapping of GPFN to bus address space is created then a bounce buffer
>>>> region is not required for the IO devices connected to the IOMMU.
>>> So the idea then is for the domain to
>>> - create this 1:1 mapping once at boot, and update it as pages get
>>>   ballooned in/out, or
>>> - zap the entire IOMMU mapping at boot, and establish individual
>>>   mappings as needed based on the driver's use of the DMA map ops?
>>>
>>
>> Yes both of those are the general idea, I will document this better in
>> draft B as per David's suggestion.
>> The first strategy has better performance (less IOMMU updates) but lower
>> security from bad devices, the second strategy has better security but
>> higher IOMMU mapping overhead.
>>
>> The current idea for maximum performance is to create a 1:1 mapping for
>> dom0 GPFNs and an offset (offset by max dom0 GPFN) 1:1 mapping for the
>> whole of the MFN address space which will be used for grant mapped
>> pages. This should result in IOMMU updates only for ballooned pages.
> 
> That's not fully scalable: You can't cope with systems making use of
> the full physical address space (e.g. by sparsely allocating where
> each node's memory goes). HP has been having such systems for a
> couple of years.
> 
Good point; I was not aware that there were systems which use the top of
the physical address space.

I propose that Domain 0 can detect the sparse holes via the E820 address
map and implement a simple form of address space compression. As long as
the amount of physical memory in the system is no more than half of the
physical address space, both the GPFN mapping and the "grant map 1:MFN"
BFN mapping should be able to fit.
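
A rough sketch of the BFN layout I have in mind follows (the e820entry
layout and all helper names here are only illustrative placeholders, not
the real Linux/Xen definitions; the region table is just one way of
squeezing the E820 holes out of the grant window):

    /*
     * Sketch only: compress sparse E820 RAM regions into a contiguous
     * BFN window for grant-mapped pages, while keeping a plain 1:1
     * BFN == GPFN mapping for dom0's own memory.
     */
    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SHIFT   12
    #define E820_RAM     1
    #define MAX_REGIONS  128

    struct e820entry {
        uint64_t addr;
        uint64_t size;
        uint32_t type;
    };

    /* One entry per E820 RAM region, with the holes squeezed out. */
    struct bfn_region {
        uint64_t mfn_start;   /* first frame of the RAM region */
        uint64_t mfn_end;     /* frame after the last frame of the region */
        uint64_t bfn_base;    /* base of the region inside the grant window */
    };

    static struct bfn_region regions[MAX_REGIONS];
    static size_t nr_regions;
    static uint64_t grant_window_start;  /* BFN where the grant window begins */

    /* Build the compression table; the grant window sits above dom0's GPFNs. */
    static void build_bfn_layout(const struct e820entry *map, size_t n,
                                 uint64_t max_dom0_gpfn)
    {
        uint64_t next_base = 0;
        size_t i;

        grant_window_start = max_dom0_gpfn + 1;

        for (i = 0; i < n && nr_regions < MAX_REGIONS; i++) {
            if (map[i].type != E820_RAM)
                continue;
            regions[nr_regions].mfn_start = map[i].addr >> PAGE_SHIFT;
            regions[nr_regions].mfn_end   =
                (map[i].addr + map[i].size) >> PAGE_SHIFT;
            regions[nr_regions].bfn_base  = next_base;
            next_base += regions[nr_regions].mfn_end -
                         regions[nr_regions].mfn_start;
            nr_regions++;
        }
    }

    /* Dom0's own memory: identity mapped, BFN == GPFN. */
    static uint64_t gpfn_to_bfn(uint64_t gpfn)
    {
        return gpfn;
    }

    /* Grant-mapped foreign page: compressed 1:1 mapping of the MFN space. */
    static uint64_t grant_mfn_to_bfn(uint64_t mfn)
    {
        size_t i;

        for (i = 0; i < nr_regions; i++)
            if (mfn >= regions[i].mfn_start && mfn < regions[i].mfn_end)
                return grant_window_start + regions[i].bfn_base +
                       (mfn - regions[i].mfn_start);
        return ~0ULL;   /* MFN not covered by any RAM region */
    }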

>>>> IOMMUOP_unmap_page
>>>> ----------------
>>>> First argument, pointer to array of `struct iommu_map_op`
>>>> Second argument, integer count of `struct iommu_map_op` elements in array
>>>>
>>>> This subop will attempt to unmap each element in the `struct 
>>>> iommu_map_op` array
>>>> and record the mapping status back into the array itself. If an 
>>> unmapping?
>>
>> Agreed
>>>
>>>> unmapping fault
>>> mapping?
>>
>> This is an unmap hypercall so I believe it would be an unmapping fault
>> reported.
> 
> No - an individual unmap operation failing should result in the
> status to be set to -EFAULT (or whatever). The only time you'd
> fail the entire hypercall with -EFAULT would be when you can't
> read an interface structure array element. Hence if anything this
> would be a mapping fault (albeit I didn't like the term in the
> IOMMU_map_page description either).
> 

OK, I agree. Would it be a cleaner hypercall interface if the hypercall
returns 0 when all ops succeeded and otherwise a positive count of the
failed ops? The caller would then need to scan the status entries of the
array to find the failed ops. Failed ops should hopefully be rare, so the
overhead of scanning the array should be acceptable.
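
For illustration, a rough sketch of what the dom0 caller side could look
like under that convention (HYPERVISOR_iommu_op, its wrapper and the
subop number are placeholders I have made up for the example; only the
struct layout comes from the draft):

    /*
     * Return convention sketched above: 0 means every op succeeded, a
     * negative value means the hypercall itself failed (e.g. -EFAULT
     * reading the array), and a positive value is the count of ops
     * whose status fields need to be inspected.
     */
    #include <stdint.h>
    #include <stdio.h>

    struct iommu_map_op {
        uint64_t bfn;
        uint64_t mfn;
        uint32_t flags;
        int32_t  status;
    };

    /* Hypothetical wrapper around the proposed 3-argument hypercall. */
    extern int HYPERVISOR_iommu_op(unsigned int subop,
                                   struct iommu_map_op *ops,
                                   unsigned int count);
    #define IOMMUOP_unmap_page 2   /* illustrative subop number only */

    /* Returns 0 on success, <0 on hypercall failure, >0 = failed op count. */
    static int iommu_unmap_batch(struct iommu_map_op *ops, unsigned int count)
    {
        int ret = HYPERVISOR_iommu_op(IOMMUOP_unmap_page, ops, count);
        unsigned int i;

        if (ret <= 0)
            return ret;     /* all ops succeeded, or the hypercall failed */

        /* Slow path, only taken when some ops failed: scan the per-op
         * status fields to find out which unmaps did not take effect. */
        for (i = 0; i < count; i++)
            if (ops[i].status != 0)
                fprintf(stderr, "unmap of bfn 0x%llx failed: %d\n",
                        (unsigned long long)ops[i].bfn, ops[i].status);

        return ret;
    }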

>>>> The IOMMU TLB will only be flushed when the hypercall completes or a 
>>>> hypercall
>>>> continuation is created.
>>>>
>>>>      struct iommu_map_op {
>>>>          uint64_t bfn;
>>>>          uint64_t mfn;
>>>>          uint32_t flags;
>>>>          int32_t status;
>>>>      };
>>>>
>>>> --------------------------------------------------------------------
>>>> Field          Purpose
>>>> -----          -----------------------------------------------------
>>>> `bfn`          [in] Bus address frame number to be unmapped
>>>>
>>>> `mfn`          [in] This field is ignored for unmap subop
>>>>
>>>> `flags`        [in] This field is ignored for unmap subop
>>>>
>>>> `status`       [out] Mapping status of this unmap operation, 0 indicates 
>> success
>>> Hmm, bfn and flags ignored would call for an abbreviated interface
>>> structure. Or how about IOMMU_MAP_OP_{readable,writeable}
>>> implying unmap, in which case for now you'd need only one op. And
>>> with that I wonder whether this shouldn't simply be a new
>>> PHYSDEVOP_iommu_map.
>>
>> I initially tried to add it to the PHYSDEV hypercall as a subop but when
>> adding batch capability it seemed cleaner to use a 3 argument hypercall
>> to allow for the count to be passed in as argument rather than as a
>> field in a struct.
> 
> Yes, I see - batching really becomes easier that way.
> 
>>>> Security Implications of allowing Domain 0 IOMMU control
>>>> ========================================================
>>>>
>>>> Xen currently allows IO devices attached to Domain 0 to have direct 
>>>> access to
>>>> all of the MFN address space (except Xen hypervisor memory regions),
>>>> provided the Xen IOMMU option dom0-strict is not enabled.
>>>>
>>>> The PV IOMMU feature provides the same level of access to MFN address space
>>>> and the feature is not enabled when the Xen IOMMU option dom0-strict is
>>>> enabled. Therefore security is not affected by the PV IOMMU feature.
>>> While I was about to reply with the same request to extend this to
>>> driver domains, this last section pretty much excludes such an
>>> extension. I guess that would benefit from revisiting.
>>
>> I agree supporting driver domains would not be straightforward with
>> regards to security so I would suggest that driver domain support could
>> be added later as a separate design.
> 
> That's odd - you design something anew and want to immediately
> postpone one of the obvious potential uses of it? As just said in
> another reply, I think you will want to support dom0-strict mode
> anyway, and with that I would expect PV driver domain support to
> fall out almost naturally.
> 

I was trying to restrict the scope so that the feature could be
implemented in smaller chunks. Adding PV driver domain support does not
look like a significant amount of extra work, so I have proposed a way to
support PV driver domains in another reply; I would be very interested in
thoughts on the technique I've proposed.

> Jan
> 

