Re: [Xen-devel] Xen PV IOMMU interface draft D
Hi, Malcolm,

Not sure whether I missed your reply or not, but I failed to find it in my
archive. Could you help re-post if you already did so? Sorry that my comments
might be a bit late and didn't catch the previous draft discussions, but some
of the questions below are really important to help us understand how this
new interface works with XenGT...

Thanks
Kevin

> From: Tian, Kevin
> Sent: Thursday, February 18, 2016 4:21 PM
>
> > From: Malcolm Crossley [mailto:malcolm.crossley@xxxxxxxxxx]
> > Sent: Wednesday, February 10, 2016 6:09 PM
>
> As Konrad commented, it's better to add this doc as the 1st patch in your
> series; then it's easier to review it together with the other patches. Also
> it's always good to include such a design doc in the repo.
>
> Other comments embedded.
>
> [...]
> >
> > Clarification of GFN and BFN fields for different guest types
> > -------------------------------------------------------------
> >
> [...]
> > Bus Frame Numbers (BFN) refer to the address presented on the physical bus
> > before being translated by the IOMMU.
> >
> > Diagram below details memory accesses originating from physical device.
> >
> >     Physical Device
> >            |
> >          (BFN)
> >            |
> >        IOMMU-PT
> >            |
> >          (MFN)
> >            |
> >           RAM
>
> Curious what IOMMU-'PT' means here?
>
> [...]
> > General principles for PV IOMMU interface
> > =========================================
> >
> > There are two different usage models for the BFN address space of a calling
> > guest based upon the two purposes specified in the section above.
> >
> > A calling guest may use their BFN address space for only one of the purposes
> > detailed above and so the PV IOMMU interface has a subop per usage model.
> > Furthermore, the IOMMU mapping of foreign domains' memory is more complex
> > than IOMMU mapping of local domain memory, and separating the subops allows
> > for the complexity to be split in the implementation.
> >
> > The PV IOMMU design allows the calling domain to control its BFN memory map.
> > Thus the design also assigns the responsibility of ensuring that a BFN
> > address mapped for local domain memory mappings is not reused for foreign
> > domain memory mappings without an explicit unmap of the BFN address first.
> > This simplifies the usage of the API and the extra overhead for the calling
> > domains should be minimal as they should already be tracking the BFN
> > address space usage.
>
> It might be clearer if you can add a separate section for BFN itself, i.e.
> how it is managed/allocated in different scenarios. I know most info is
> already provided in this text, but not centralized so far. :-)
>
> > Emulator usage of PV IOMMU interface
> > ====================================
>
> I'd suggest moving this and later sections to behind the basic API
> introduction. Otherwise there is insufficient background on so many API
> references at this point.
>
> > Emulators which require bus address mapping of guest RAM must first
> > determine if it's possible for the domain to control the bus addresses
> > themselves.
> >
> > An IOMMUOP_query_caps subop will return the IOMMU_QUERY_map_cap flag. If
> > this flag is set then the emulator may specify the BFN address it wishes
> > guest RAM to be mapped to via the IOMMUOP_map_foreign_page subop. If the
> > flag is not set then the emulator must use BFN addresses supplied by Xen
> > via the IOMMUOP_lookup_foreign_page subop.
>
> IOMMU_QUERY_map_cap is a bit confusing here. The above paragraph is about
> whether the emulator is allowed to allocate/specify BFN itself. However this
> capability name reads more as whether the calling domain can map foreign
> pages, which is actually true regardless of how BFN is allocated.
>
> > Operating systems which use the IOMMUOP_map_page subop are expected to
> > provide a common interface for emulators to use.
> > Otherwise emulators will not be aware of existing BFN mappings created by
> > the operating system and will get failed subops due to conflicts in the
> > BFN address space for the domain.
>
> Do you mean that the emulator needs to detect whether the OS is using
> IOMMUOP_map_page? If yes, then the emulator calls a common interface
> provided by the OS. If not, then the emulator just directly invokes the raw
> IOMMUOP itself. I'm not certain whether there is a common mechanism to
> detect this so far. Could you elaborate on your thoughts here?
>
> > Emulators should unmap unused GFN mappings as often as possible using
> > IOMMUOP_unmap_foreign_page subops so that guest domains can balloon pages
> > quickly and efficiently.
>
> Following the earlier analysis, this only applies when the OS doesn't use
> IOMMUOP. Otherwise the emulator needs to call an 'OS common interface',
> right?
>
> > Emulators should conform to the ballooning behaviour described in section
> > "IOMMUOP_*_foreign_page interactions with guest domain ballooning" so that
> > guest domains are able to effectively balloon out and in memory.
> >
> > Emulators must unmap any active BFN mappings when they shut down.
> >
> > IOMMUOP_*_foreign_page interactions with guest domain ballooning
> > ================================================================
> >
> > Guest domains can balloon out a set of GFN mappings at any time and render
> > the BFN to GFN mapping invalid.
> >
> > When a BFN to GFN mapping becomes invalid, Xen will issue a buffered I/O
> > request of type IOREQ_TYPE_INVALIDATE to the affected IOREQ servers with
> > the now invalid BFN address in the data field. If the buffered I/O request
> > ring is full then a standard (synchronous) I/O request of type
> > IOREQ_TYPE_INVALIDATE will be issued to the affected IOREQ server with the
> > just invalidated BFN address in the data field.
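To check that I read the notification path correctly, here is a minimal
sketch of it in C. The ring type, its size and the function names are all made
up for illustration (they are not Xen's real buffered ioreq structures); only
the ordering, buffered first and synchronous as the fallback when the ring is
full, comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring size; the real buffered ioreq ring differs. */
#define SKETCH_RING_SLOTS 4u

enum sketch_delivery { SKETCH_BUFFERED, SKETCH_SYNCHRONOUS };

/* Hypothetical stand-in for one IOREQ server's buffered request ring. */
struct sketch_ring {
    uint64_t bfn[SKETCH_RING_SLOTS]; /* invalidated BFNs (the "data" field) */
    unsigned int prod;               /* free-running producer index */
    unsigned int cons;               /* free-running consumer index */
};

static bool sketch_ring_full(const struct sketch_ring *r)
{
    return r->prod - r->cons == SKETCH_RING_SLOTS;
}

/*
 * Queue an invalidate for `bfn`. Returns how the request would be
 * delivered: buffered if a slot is free, otherwise synchronous.
 */
static enum sketch_delivery sketch_send_invalidate(struct sketch_ring *r,
                                                   uint64_t bfn)
{
    assert(r != NULL);

    if (sketch_ring_full(r))
        return SKETCH_SYNCHRONOUS; /* caller would wait for the emulator */

    r->bfn[r->prod % SKETCH_RING_SLOTS] = bfn;
    r->prod++;
    return SKETCH_BUFFERED;
}
```

In this reading the synchronous path is only a back-pressure fallback, so an
emulator that drains its ring promptly should rarely see it.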
> >
> > The BFN mappings cannot simply be unmapped at the point of the balloon
> > hypercall, otherwise a malicious guest could specifically balloon out a GFN
> > address in use by an emulator and trigger IOMMU faults for the domains with
> > BFN mappings.
>
> Is it a real problem? Today for PCI passthru, what will happen if the guest
> programs an assigned device with a bad GPA which is not mapped in the IOMMU?
> I think an IOMMU fault should be fine, and we can just leverage existing
> IOMMU fault handling after the fault is triggered.
>
> > For hosts with no IOMMU support: The affected emulator(s) must specifically
> > issue an IOMMUOP_unmap_foreign_page subop for the now invalid BFN address
> > so that the references to the underlying MFN are removed and the MFN can be
> > freed back to the Xen memory allocator.
> >
> > For hosts with IOMMU support:
> > If the BFN was mapped without the IOMMUOP_swap_mfn flag set in the
> > IOMMUOP_map_foreign_page subop then the affected emulator(s) must
> > specifically issue an IOMMUOP_unmap_foreign_page subop for the now invalid
> > BFN address so that the references to the underlying MFN are removed.
> >
> > If the BFN was mapped with the IOMMUOP_swap_mfn flag set in the
> > IOMMUOP_map_foreign_page subop for all emulators with mappings of that GFN
> > then the BFN mapping will be swapped to point at a scratch MFN page and all
> > BFN references to the invalid MFN will be removed by Xen after the BFN
> > mapping has been updated to point at the scratch MFN page.
>
> I don't understand why for the 'swap' case you don't need the emulator to do
> an explicit unmap. You can think of 'noswap' (page-A to invalid) as a special
> example of 'swap' (page-A to scratch page), since they both move
> away from the page-A reference. If there is a reason that the emulator needs
> to do some cleanup internally before dropping the reference, does
> 'swap_mfn' break that situation then?
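Put another way, my reading of the three cases above is the small decision
function below. It is only a sketch: the enum and function names are invented,
and the "all emulators mapped with swap_mfn" condition is passed in as two
counters rather than derived from real hypervisor state.

```c
#include <assert.h>
#include <stdbool.h>

/* What happens to a BFN mapping when its GFN is ballooned out. */
enum sketch_balloon_action {
    SKETCH_EMULATOR_MUST_UNMAP,  /* emulator issues IOMMUOP_unmap_foreign_page */
    SKETCH_XEN_SWAPS_TO_SCRATCH  /* Xen repoints the BFN at a scratch MFN */
};

/*
 * nr_mappings:       emulator mappings of the ballooned GFN
 * nr_with_swap_mfn:  how many of those were mapped with IOMMUOP_swap_mfn
 */
static enum sketch_balloon_action
sketch_balloon_out(bool host_has_iommu,
                   unsigned int nr_mappings, unsigned int nr_with_swap_mfn)
{
    assert(nr_with_swap_mfn <= nr_mappings);

    /* No IOMMU: nothing to swap, the emulator has to drop the reference. */
    if (!host_has_iommu)
        return SKETCH_EMULATOR_MUST_UNMAP;

    /* Swap only applies if every mapping of the GFN asked for it. */
    if (nr_mappings > 0 && nr_with_swap_mfn == nr_mappings)
        return SKETCH_XEN_SWAPS_TO_SCRATCH;

    return SKETCH_EMULATOR_MUST_UNMAP;
}
```

Written this way, a single emulator mapping without the flag forces the
explicit unmap path for everyone, which is part of why I ask about the
asymmetry between the two cases.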
>
> > The rationale for swapping the BFN mapping to point at scratch pages is to
> > enable guest domains to balloon quickly without requiring hypercall(s) from
> > emulators.
> >
> > Not all BFN mappings can be swapped without potentially causing problems
> > for the hardware itself (command rings etc.) so the IOMMUOP_swap_mfn flag
> > is used to allow per BFN control of Xen ballooning behaviour.
>
> Who will judge whether a BFN mapping can be swapped then?
>
> [...]
> > Xen PV IOMMU hypercall interface
> > --------------------------------
> > A two argument hypercall interface (do_iommu_op).
> >
> >     ret_t do_iommu_op(XEN_GUEST_HANDLE_PARAM(void) arg, unsigned int count)
> >
> > First argument, guest handle pointer to array of `struct pv_iommu_op`
> >
> > Second argument, unsigned integer count of `struct pv_iommu_op` elements in
> > array.
> >
> > Definition of `struct pv_iommu_op`:
> >
> >     struct pv_iommu_op {
> >
> >         uint16_t subop_id;
> >         uint16_t flags;
> >         int32_t status;
> >
> >         union {
> >             struct {
> >                 uint64_t bfn;
> >                 uint64_t gfn;
> >             } map_page;
> >
> >             struct {
> >                 uint64_t bfn;
> >             } unmap_page;
> >
> >             struct {
> >                 uint64_t bfn;
> >                 uint64_t gfn;
> >                 uint16_t domid;
> >                 ioservid_t ioserver;
> >             } map_foreign_page;
> >
> >             struct {
> >                 uint64_t bfn;
> >                 uint64_t gfn;
> >                 uint16_t domid;
> >                 ioservid_t ioserver;
> >             } lookup_foreign_page;
> >
> >             struct {
> >                 uint64_t bfn;
> >                 ioservid_t ioserver;
> >             } unmap_foreign_page;
> >         } u;
> >     };
>
> Do we really need such an ioserver ID here? Could it be as simple
> as looping over all ioreq servers with INVALIDATE notifications?
>
> [...]
> >
> > IOMMUOP_map_page
> > ----------------
> > This subop uses `struct map_page` part of the `struct pv_iommu_op`.
> >
> > If IOMMU dom0-strict mode is NOT enabled then the hardware domain will be
> > allowed to map all GFNs except for Xen owned MFNs else the hardware
> > domain will only be allowed to map GFNs which it owns.
>
> "map all GFNs" -> "map all MFNs" since you use "except for Xen owned MFNs"
> later. Since you have a capability called IOMMU_QUERY_map_all_mfns, should
> you add such a condition in the above description?
>
> > If IOMMU dom0-strict mode is NOT enabled then the hardware domain will be
> > allowed to map all GFNs without taking a reference to the MFN backing the
> > GFN by setting the IOMMU_MAP_OP_no_ref_cnt flag.
>
> Could you elaborate when no_ref_cnt is required?
>
> [...]
> >
> > IOMMUOP_unmap_page
> > ------------------
> > This subop uses `struct unmap_page` part of the `struct pv_iommu_op`.
> >
> > The subop usage of the `struct pv_iommu_op` and `struct unmap_page` fields
> > are detailed below:
> >
> > --------------------------------------------------------------------
> >  Field          Purpose
> >  -----          -----------------------------------------------------
> >  `bfn`          [in] Bus address frame number to be unmapped in DOMID_SELF
> >
> >  `flags`        [in] Flags for signalling page order of unmap operation
> >
> >  `status`       [out] Mapping status of this unmap operation, 0 indicates
> >                 success
> > --------------------------------------------------------------------
> >
> > Defined bits for flags field:
> >
> >  Name                        Bit    Definition
> >  ----                        -----  ----------------------------------
> >  IOMMU_UNMAP_OP_remove_m2b   0      Wildcard M2B mapping removed for
> >                                     lookup_foreign_page use
>
> Is it explicitly required? Should it be implicit as long as a valid M2B entry
> exists?
>
> [...]
> > IOMMUOP_map_foreign_page
> > ------------------------
> > This subop uses `struct map_foreign_page` part of the `struct pv_iommu_op`.
> >
> > It is not valid to use a domid representing the calling domain.
>
> Then what's being used here to represent the calling domain?
>
> > The hypercall will only succeed if calling domain has sufficient privilege
> > over the specified domid.
>
> How is this privilege check being done? Is there an existing mechanism, or
> something new to add?
>
> > The M2B mechanism is an MFN to (BFN,domid,ioserver) tuple.
> >
> > Each successful subop will add to the M2B if there was not an existing
> > identical M2B entry.
> >
> > Every new M2B entry will take a reference to the MFN backing the GFN.
> >
> > All the following conditions are required to be true for PV IOMMU
> > map_foreign subop to succeed:
> >
> > 1. IOMMU detected and supported by Xen
> > 2. The domain has IOMMU controlled hardware allocated to it
> > 3. The domain is the hardware_domain and the following Xen IOMMU options are
> >    NOT enabled: dom0-passthrough
> > 4. the domain has sufficient privilege over the specified domid;
>
> [...]
> >
> > IOMMU_lookup_foreign_page
> > -------------------------
> > This subop uses `struct lookup_foreign_page` part of the `struct
> > pv_iommu_op`.
> >
> > This subop looks up a BFN mapping for an ioserver + gfn + target domid
> > combination.
> >
> > The hypercall will only succeed if calling domain has sufficient privilege
> > over the specified domid.
> >
> > If a 1:1 mapping exists of BFN to MFN then an M2B entry is added and a
> > reference is taken to the underlying MFN. If an existing mapping is present
>
> Then when will this very reference be dropped?
>
> > then the BFN is returned and no additional references will be taken to the
> > underlying MFN.
> >
> > A 1:1 mapping will exist if there is no IOMMU support or if the PV hardware
> > domain was booted in dom0-relaxed mode or in dom0-passthrough mode.
>
> What about the hardware domain using IOMMUOPS in the meantime? In that
> case, from your earlier description it's the hardware domain that manages
> the BFN addr space, while here 1:1 mapping is a hard assumption in the
> hypervisor, so the two things together may conflict. There needs to be a
> mechanism so that once Xen sees any explicit BFN passed from the hardware
> domain, such a 1:1 mapping scheme is disabled.
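For reference, this is how I understand the M2B bookkeeping quoted above: a
map from MFN to (BFN, domid, ioserver) tuples where only a new, non-identical
entry takes a page reference. The fixed-size table, the field widths and all
names below are assumptions made for the sketch, not the actual
implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_M2B_MAX 8u /* hypothetical capacity */

struct sketch_m2b_entry {
    uint64_t mfn;
    uint64_t bfn;
    uint16_t domid;
    uint16_t ioserver; /* stand-in for ioservid_t */
};

struct sketch_m2b {
    struct sketch_m2b_entry e[SKETCH_M2B_MAX];
    size_t nr;
    unsigned int mfn_refs; /* total references taken via M2B entries */
};

/*
 * Add an entry unless an identical one exists. Returns true if a new
 * entry was added (and a reference taken), false otherwise.
 */
static bool sketch_m2b_add(struct sketch_m2b *t, struct sketch_m2b_entry n)
{
    assert(t != NULL);

    for (size_t i = 0; i < t->nr; i++) {
        const struct sketch_m2b_entry *c = &t->e[i];

        if (c->mfn == n.mfn && c->bfn == n.bfn &&
            c->domid == n.domid && c->ioserver == n.ioserver)
            return false; /* identical entry: no extra reference */
    }

    if (t->nr == SKETCH_M2B_MAX)
        return false; /* table full; real code would report an error */

    t->e[t->nr++] = n;
    t->mfn_refs++;
    return true;
}
```

With this model every reference has exactly one owning entry, so the drop
point would be whichever operation removes that entry. That is what I would
like to see spelled out for the lookup_foreign_page case.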
>
> > If there is no IOMMU support then the MFN is returned in the BFN field
> > (that is the only valid bus address for the GFN + domid combination).
> >
> > [...]
> >
> > Linux kernel architecture
> > =========================
> >
> > The Linux kernel will use the PV-IOMMU hypercalls to map its PFN address
> > space into the IOMMU. It will map the PFNs to the IOMMU address space using
> > a 1:1 mapping; it does this by programming a BFN to GFN mapping which
> > matches the PFN to GFN mapping.
> >
> > The native SWIOTLB will be used to handle devices which cannot DMA to all of
> > the kernel's PFN address space.
> >
> > An interface shall be provided for emulator usage of IOMMUOP_*_foreign_page
> > subops which will allow the Linux kernel to centrally manage that domain's
> > BFN resource and ensure there are no unexpected conflicts.
>
> One open here. When IOMMU is enabled, there is supposed to be an
> IOVA space created in the Linux kernel. How does this BFN space play
> with that one?
>
> Thanks
> Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel