Xen project Mailing List

Re: [Xen-devel] [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges

To: "Roger Pau Monne" <roger.pau@xxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Thu, 09 Aug 2018 01:36:57 -0600

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Thu, 09 Aug 2018 07:37:05 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

>>> On 08.08.18 at 18:18, <roger.pau@xxxxxxxxxx> wrote: > On Wed, Aug 08, 2018 at 07:15:54AM -0600, Jan Beulich wrote: >> >>> On 08.08.18 at 12:07, <roger.pau@xxxxxxxxxx> wrote: >> > Several people have reported hardware issues (malfunctioning USB >> > controllers) due to iommu page faults on Intel hardware. Those faults >> > are caused by missing RMRR (VTd) entries in the ACPI tables. Those can >> > be worked around on VTd hardware by manually adding RMRR entries on >> > the command line, this is however limited to Intel hardware and quite >> > cumbersome to do. >> > >> > In order to solve those issues add a new dom0-iommu=reserved option >> > that identity maps all regions marked as reserved in the memory map. >> >> Considering that report we've had (yesterday? earlier today?), don't >> we need to go further and use the union of RMRRs and reserved >> regions? Iirc they had a case where an RMRR was not in a reserved >> region ... > > AFAICT (and I could be reading the code wrong) RMRR regions not on > reserved regions are still correctly mapped to the guest. Oh, yes, you're right - the logged message is really just that, with no other enforced effects. >> > --- a/docs/misc/xen-command-line.markdown >> > +++ b/docs/misc/xen-command-line.markdown >> > @@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon >> > accesses >> > to that port. >> > >> Enable IOMMU debugging code (implies `verbose`). >> > >> > ### dom0-iommu >> > -> `= List of [ none | strict | relaxed | inclusive ]` >> > +> `= List of [ none | strict | relaxed | inclusive | reserved ]` >> > >> > * `none`: disables DMA remapping for Dom0. >> > >> > @@ -1233,6 +1233,15 @@ meaning: >> > option is only applicable to a PV Dom0 and is enabled by default on >> > Intel >> > hardware. >> > >> > +* `reserved`: sets up DMA remapping for all the reserved regions in the >> > memory >> > + map for Dom0. Use this to work around firmware issues providing >> > incorrect >> > + RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU accesses >> > + for Dom0, all memory regions marked as reserved in the memory map that >> > don't >> > + overlap with any MMIO region from emulated devices will be identity >> > mapped. >> > + This option maps a subset of the memory that would be mapped when using >> > the >> > + `inclusive` option. This option is available to a PVH Dom0 and is >> > enabled by >> > + default on Intel hardware. >> >> The sub-options so far were quite clear in their meanings, but >> "dom0-iommu=reserved" might mean all sorts of things to me, but quite >> certainly not "map all reserved regions". "map-reserved" perhaps? > > Then we should also have 'map-inclusive' for symmetry IMO. I don't mind at all such symmetry to be established. >> > @@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d) >> > { >> > } >> > >> > +static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned >> > long pfn, >> > + unsigned long max_pfn) >> > +{ >> > + unsigned int i; >> > + >> > + /* >> > + * Ignore any address below 1MB, that's already identity mapped by the >> > + * domain builder for HVM. >> > + */ >> > + if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) || >> >> Careful again here with the distinction between Dom0 and hwdom: >> The domain builder you refer to is - aiui - the in-Xen one, i.e. the >> one _only_ dealing with Dom0. > > So this should instead be: > > if ( (is_control_domain(d) && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) || From an abstract perspective (long term plans) is_control_domain() can be true for multiple domains, none of which may be Dom0 or hwdom. So no, I don't think the addition would help in any way. With the reference to the in-Xen domain builder, I think you really mean Dom0 here. >> > + /* >> > + * If dom0-strict mode is enabled or the guest type is PVH/HVM then >> > exclude >> > + * conventional RAM and let the common code map dom0's pages. >> > + */ >> > + if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) && >> > + (iommu_dom0_strict || is_hvm_domain(d)) ) >> > + return false; >> > + if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) && >> > + !iommu_dom0_reserved && !iommu_dom0_inclusive ) >> > + return false; >> > + if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) && >> > + !page_is_ram_type(pfn, RAM_TYPE_RESERVED) && >> > + !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) && >> > + (!iommu_dom0_inclusive || pfn > max_pfn) ) >> > + return false; >> >> As page_is_ram_type() is, especially on systems with many E820 >> entries, not really cheap, I think at least a minimum amount of >> optimization is on order here - after all you do this for every >> single page of the system. Hence minimally the result of the first >> CONVENTIONAL and RESERVED queries should be latched and >> re-used (or "else" be used suitably). Ideally an approach would >> be used which involved just a single iteration through the E820 >> map, but I realize this may be more than is feasible here. > > This would be quite better if I could just fetch the type, then I > could add a switch (type) { ... and it would be better IMO. Except that, as said, there's no "the" type for an entire page. Only a single byte can have an exact type. >> Furthermore I'm unconvinced the !page_is_ram_type() uses >> are really what you want: The function returning false means >> "don't know", not "no". Perhaps the function needs to be made >> return a tristate (yes, no, or part of it). > > Right, that's why I have three different !page_is_ram_type checks in > the last branch of the if, so that I can make sure it's not one of the > previous types and also account for holes. I'm afraid I don't understand. Take the example of a single page being split in an unusable and a reserved part. Both respective function invocations will return false. Yet you want to exclude both unusable and reserved types when !iommu_dom0_inclusive, and hence your goal would be to exclude that page here. As to unusable - don't you break original behavior here anyway? Shouldn't the function return false when page_is_ram_type(pfn, RAM_TYPE_UNUSABLE), effectively independent of any command line options? >> > + /* Check that it doesn't overlap with the LAPIC */ >> > + if ( has_vlapic(d) ) >> > + { >> > + const struct vcpu *v; >> > + >> > + for_each_vcpu(d, v) >> > + if ( pfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) ) >> > + return false; >> > + } >> >> I don't, btw, recall any code adjusting the IOMMU mappings in >> case the domain relocates its LAPIC. If you do the check above, >> wouldn't that other side too need taking care of? > > Yes. I can add something later but this is already an issue if the > guest for example relocates the LAPIC over a RAM region. Well, I'm fine leaving out parts that are currently broken anyway, but please leave at least a TODO note just like you do for MCFG. Or (for both) don't do anything here until the situation gets taken care of uniformly (but presumably that's the worse option). Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.