[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges



>>> On 08.08.18 at 12:07, <roger.pau@xxxxxxxxxx> wrote:
> Several people have reported hardware issues (malfunctioning USB
> controllers) due to iommu page faults on Intel hardware. Those faults
> are caused by missing RMRR (VTd) entries in the ACPI tables. Those can
> be worked around on VTd hardware by manually adding RMRR entries on
> the command line, this is however limited to Intel hardware and quite
> cumbersome to do.
> 
> In order to solve those issues add a new dom0-iommu=reserved option
> that identity maps all regions marked as reserved in the memory map.

Considering that report we've had (yesterday? earlier today?), don't
we need to go further and use the union of RMRRs and reserved
regions? Iirc they had a case where an RMRR was not in a reserved
region ...

> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon accesses 
> to that port.
>  >> Enable IOMMU debugging code (implies `verbose`).
>  
>  ### dom0-iommu
> -> `= List of [ none | strict | relaxed | inclusive ]`
> +> `= List of [ none | strict | relaxed | inclusive | reserved ]`
>  
>  * `none`: disables DMA remapping for Dom0.
>  
> @@ -1233,6 +1233,15 @@ meaning:
>    option is only applicable to a PV Dom0 and is enabled by default on Intel
>    hardware.
>  
> +* `reserved`: sets up DMA remapping for all the reserved regions in the  
> memory
> +  map for Dom0. Use this to work around firmware issues providing incorrect
> +  RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU accesses
> +  for Dom0, all memory regions marked as reserved in the memory map that 
> don't
> +  overlap with any MMIO region from emulated devices will be identity mapped.
> +  This option maps a subset of the memory that would be mapped when using the
> +  `inclusive` option. This option is available to a PVH Dom0 and is enabled 
> by
> +  default on Intel hardware.

The sub-options so far were quite clear in their meanings, but
"dom0-iommu=reserved" might mean all sorts of things to me, but quite
certainly not "map all reserved regions". "map-reserved" perhaps?

> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -404,6 +404,11 @@ static const struct hvm_mmcfg *vpci_mmcfg_find(const 
> struct domain *d,
>      return NULL;
>  }
>  
> +bool vpci_mmcfg_address(const struct domain *d, paddr_t addr)
> +{
> +    return vpci_mmcfg_find(d, addr);
> +}

I think the function name should have an "is_" somewhere, to make
clear address is a noun here and not a verb.

> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -256,6 +256,9 @@ static void __hwdom_init amd_iommu_hwdom_init(struct 
> domain *d)
>      /* Inclusive IOMMU mappings are disabled by default on AMD hardware. */
>      iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? false
>                                                        : iommu_dom0_inclusive;
> +    /* Reserved IOMMU mappings are disabled by default on AMD hardware. */
> +    iommu_dom0_reserved = iommu_dom0_reserved == -1 ? false
> +                                                    : iommu_dom0_reserved;

Especially seeing these two together now, I think an if() each would
be easier to read (not the least because of being shorter). To me,
the main purpose of the conditional operator is to allow shortening
simple if/else pairs, rather than lengthening simple if()-s.

> @@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d)
>  {
>  }
>  
> +static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned 
> long pfn,
> +                                         unsigned long max_pfn)
> +{
> +    unsigned int i;
> +
> +    /*
> +     * Ignore any address below 1MB, that's already identity mapped by the
> +     * domain builder for HVM.
> +     */
> +    if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||

Careful again here with the distinction between Dom0 and hwdom:
The domain builder you refer to is - aiui - the in-Xen one, i.e. the
one _only_ dealing with Dom0.

> +    /*
> +     * If dom0-strict mode is enabled or the guest type is PVH/HVM then 
> exclude
> +     * conventional RAM and let the common code map dom0's pages.
> +     */
> +    if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> +         (iommu_dom0_strict || is_hvm_domain(d)) )
> +        return false;
> +    if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> +         !iommu_dom0_reserved && !iommu_dom0_inclusive )
> +        return false;
> +    if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) &&
> +         !page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> +         !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> +         (!iommu_dom0_inclusive || pfn > max_pfn) )
> +        return false;

As page_is_ram_type() is, especially on systems with many E820
entries, not really cheap, I think at least a minimum amount of
optimization is on order here - after all you do this for every
single page of the system. Hence minimally the result of the first
CONVENTIONAL and RESERVED queries should be latched and
re-used (or "else" be used suitably). Ideally an approach would
be used which involved just a single iteration through the E820
map, but I realize this may be more than is feasible here.

Furthermore I'm unconvinced the !page_is_ram_type() uses
are really what you want: The function returning false means
"don't know", not "no". Perhaps the function needs to be made
return a tristate (yes, no, or part of it).

> +    /* Check that it doesn't overlap with the LAPIC */
> +    if ( has_vlapic(d) )
> +    {
> +        const struct vcpu *v;
> +
> +        for_each_vcpu(d, v)
> +            if ( pfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
> +                return false;
> +    }

I don't, btw, recall any code adjusting the IOMMU mappings in
case the domain relocates its LAPIC. If you do the check above,
wouldn't that other side too need taking care of?

> +    /* ... or the IO-APIC */
> +    for ( i = 0; has_vioapic(d) && i < d->arch.hvm_domain.nr_vioapics; i++ )
> +        if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
> +            return false;
> +    /*
> +     * ... or the PCIe MCFG regions.
> +     * TODO: runtime added MMCFG regions are not checked to make sure they
> +     * don't overlap with already mapped regions, thus preventing trapping.
> +     */
> +    if ( has_vpci(d) && vpci_mmcfg_address(d, pfn << PAGE_SHIFT) )

ITYM pfn_to_paddr() here.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.