[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges



>>> On 08.08.18 at 18:18, <roger.pau@xxxxxxxxxx> wrote:
> On Wed, Aug 08, 2018 at 07:15:54AM -0600, Jan Beulich wrote:
>> >>> On 08.08.18 at 12:07, <roger.pau@xxxxxxxxxx> wrote:
>> > Several people have reported hardware issues (malfunctioning USB
>> > controllers) due to iommu page faults on Intel hardware. Those faults
>> > are caused by missing RMRR (VTd) entries in the ACPI tables. Those can
>> > be worked around on VTd hardware by manually adding RMRR entries on
>> > the command line, this is however limited to Intel hardware and quite
>> > cumbersome to do.
>> > 
>> > In order to solve those issues add a new dom0-iommu=reserved option
>> > that identity maps all regions marked as reserved in the memory map.
>> 
>> Considering that report we've had (yesterday? earlier today?), don't
>> we need to go further and use the union of RMRRs and reserved
>> regions? Iirc they had a case where an RMRR was not in a reserved
>> region ...
> 
> AFAICT (and I could be reading the code wrong) RMRR regions not on
> reserved regions are still correctly mapped to the guest.

Oh, yes, you're right - the logged message is really just that,
with no other enforced effects.

>> > --- a/docs/misc/xen-command-line.markdown
>> > +++ b/docs/misc/xen-command-line.markdown
>> > @@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon 
>> > accesses 
>> > to that port.
>> >  >> Enable IOMMU debugging code (implies `verbose`).
>> >  
>> >  ### dom0-iommu
>> > -> `= List of [ none | strict | relaxed | inclusive ]`
>> > +> `= List of [ none | strict | relaxed | inclusive | reserved ]`
>> >  
>> >  * `none`: disables DMA remapping for Dom0.
>> >  
>> > @@ -1233,6 +1233,15 @@ meaning:
>> >    option is only applicable to a PV Dom0 and is enabled by default on 
>> > Intel
>> >    hardware.
>> >  
>> > +* `reserved`: sets up DMA remapping for all the reserved regions in the  
>> > memory
>> > +  map for Dom0. Use this to work around firmware issues providing 
>> > incorrect
>> > +  RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU accesses
>> > +  for Dom0, all memory regions marked as reserved in the memory map that 
>> > don't
>> > +  overlap with any MMIO region from emulated devices will be identity 
>> > mapped.
>> > +  This option maps a subset of the memory that would be mapped when using 
>> > the
>> > +  `inclusive` option. This option is available to a PVH Dom0 and is 
>> > enabled by
>> > +  default on Intel hardware.
>> 
>> The sub-options so far were quite clear in their meanings, but
>> "dom0-iommu=reserved" might mean all sorts of things to me, but quite
>> certainly not "map all reserved regions". "map-reserved" perhaps?
> 
> Then we should also have 'map-inclusive' for symmetry IMO.

I don't mind at all such symmetry to be established.

>> > @@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d)
>> >  {
>> >  }
>> >  
>> > +static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned 
>> > long pfn,
>> > +                                         unsigned long max_pfn)
>> > +{
>> > +    unsigned int i;
>> > +
>> > +    /*
>> > +     * Ignore any address below 1MB, that's already identity mapped by the
>> > +     * domain builder for HVM.
>> > +     */
>> > +    if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||
>> 
>> Careful again here with the distinction between Dom0 and hwdom:
>> The domain builder you refer to is - aiui - the in-Xen one, i.e. the
>> one _only_ dealing with Dom0.
> 
> So this should instead be:
> 
> if ( (is_control_domain(d) && is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||

From an abstract perspective (long term plans) is_control_domain()
can be true for multiple domains, none of which may be Dom0 or
hwdom. So no, I don't think the addition would help in any way.
With the reference to the in-Xen domain builder, I think you really
mean Dom0 here.

>> > +    /*
>> > +     * If dom0-strict mode is enabled or the guest type is PVH/HVM then 
>> > exclude
>> > +     * conventional RAM and let the common code map dom0's pages.
>> > +     */
>> > +    if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
>> > +         (iommu_dom0_strict || is_hvm_domain(d)) )
>> > +        return false;
>> > +    if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
>> > +         !iommu_dom0_reserved && !iommu_dom0_inclusive )
>> > +        return false;
>> > +    if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) &&
>> > +         !page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
>> > +         !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
>> > +         (!iommu_dom0_inclusive || pfn > max_pfn) )
>> > +        return false;
>> 
>> As page_is_ram_type() is, especially on systems with many E820
>> entries, not really cheap, I think at least a minimum amount of
>> optimization is on order here - after all you do this for every
>> single page of the system. Hence minimally the result of the first
>> CONVENTIONAL and RESERVED queries should be latched and
>> re-used (or "else" be used suitably). Ideally an approach would
>> be used which involved just a single iteration through the E820
>> map, but I realize this may be more than is feasible here.
> 
> This would be quite better if I could just fetch the type, then I
> could add a switch (type) { ... and it would be better IMO.

Except that, as said, there's no "the" type for an entire page.
Only a single byte can have an exact type.

>> Furthermore I'm unconvinced the !page_is_ram_type() uses
>> are really what you want: The function returning false means
>> "don't know", not "no". Perhaps the function needs to be made
>> return a tristate (yes, no, or part of it).
> 
> Right, that's why I have three different !page_is_ram_type checks in
> the last branch of the if, so that I can make sure it's not one of the
> previous types and also account for holes.

I'm afraid I don't understand. Take the example of a single
page being split in an unusable and a reserved part. Both
respective function invocations will return false. Yet you
want to exclude both unusable and reserved types when
!iommu_dom0_inclusive, and hence your goal would be to
exclude that page here.

As to unusable - don't you break original behavior here
anyway? Shouldn't the function return false when
page_is_ram_type(pfn, RAM_TYPE_UNUSABLE), effectively
independent of any command line options?

>> > +    /* Check that it doesn't overlap with the LAPIC */
>> > +    if ( has_vlapic(d) )
>> > +    {
>> > +        const struct vcpu *v;
>> > +
>> > +        for_each_vcpu(d, v)
>> > +            if ( pfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
>> > +                return false;
>> > +    }
>> 
>> I don't, btw, recall any code adjusting the IOMMU mappings in
>> case the domain relocates its LAPIC. If you do the check above,
>> wouldn't that other side too need taking care of?
> 
> Yes. I can add something later but this is already an issue if the
> guest for example relocates the LAPIC over a RAM region.

Well, I'm fine leaving out parts that are currently broken anyway,
but please leave at least a TODO note just like you do for MCFG.
Or (for both) don't do anything here until the situation gets taken
care of uniformly (but presumably that's the worse option).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.