
Re: [Xen-devel] [PATCH v3 4/4] x86/iommu: add reserved dom0-iommu option to map reserved memory ranges



> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf
> Of Roger Pau Monne
> Sent: 07 August 2018 15:03
> To: xen-devel@xxxxxxxxxxxxxxxxxxxx
> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>; Wei Liu
> <wei.liu2@xxxxxxxxxx>; George Dunlap <George.Dunlap@xxxxxxxxxx>;
> Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>; Ian Jackson
> <Ian.Jackson@xxxxxxxxxx>; Tim (Xen.org) <tim@xxxxxxx>; Julien Grall
> <julien.grall@xxxxxxx>; Jan Beulich <jbeulich@xxxxxxxx>; Roger Pau
> Monne <roger.pau@xxxxxxxxxx>
> Subject: [Xen-devel] [PATCH v3 4/4] x86/iommu: add reserved dom0-iommu
> option to map reserved memory ranges
> 
> Several people have reported hardware issues (malfunctioning USB
> controllers) due to IOMMU page faults on Intel hardware. Those faults
> are caused by missing RMRR (VT-d) entries in the ACPI tables. They can
> be worked around on VT-d hardware by manually adding RMRR entries on
> the command line; this is, however, limited to Intel hardware and quite
> cumbersome to do.
> 
> In order to solve those issues add a new dom0-iommu=reserved option
> that identity maps all regions marked as reserved in the memory map.
> Note that regions used by devices emulated by Xen (LAPIC, IO-APIC or
> PCIe MCFG regions) are specifically avoided. Note that this option is
> available to a PVH Dom0 (as opposed to the inclusive option which only
> works for PV Dom0).
> 
> Signed-off-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Reviewed-by: Paul Durrant <paul.durrant@xxxxxxxxxx>
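
For readers following along, the mapping policy described in the commit message can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the code from the patch: `enum page_kind`, `struct dom0_iommu_opts` and `should_identity_map()` are hypothetical names, and the checks for Xen's own ranges, the low 1MB, unusable ranges and pages above 4GB are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the memory map page types. */
enum page_kind {
    PAGE_CONVENTIONAL, /* ordinary RAM */
    PAGE_RESERVED,     /* marked as reserved in the memory map */
    PAGE_OTHER         /* holes and anything else below 4GB */
};

/* Hypothetical bundle of the already-resolved dom0-iommu settings. */
struct dom0_iommu_opts {
    bool strict;
    bool inclusive;
    bool reserved;
};

/*
 * Decide whether a page gets an identity mapping in this simplified model.
 * emulated_mmio stands for "overlaps a LAPIC, IO-APIC or PCIe MCFG region
 * emulated by Xen"; such pages are never mapped.
 */
bool should_identity_map(enum page_kind kind, bool is_pvh, bool emulated_mmio,
                         const struct dom0_iommu_opts *o)
{
    /* Conventional RAM is left to the common code in strict mode and on PVH. */
    if ( kind == PAGE_CONVENTIONAL && (o->strict || is_pvh) )
        return false;

    /* Reserved regions need either the reserved or the inclusive option. */
    if ( kind == PAGE_RESERVED && !o->reserved && !o->inclusive )
        return false;

    /* Everything else is only mapped by the inclusive option. */
    if ( kind == PAGE_OTHER && !o->inclusive )
        return false;

    return !emulated_mmio;
}
```

In this model a PVH Dom0 booted with only `reserved` enabled gets reserved regions identity mapped, while conventional RAM and holes are left alone.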

> ---
> Changes since v2:
>  - Fix comment regarding dom0-strict.
>  - Change documentation style of xen command line.
>  - Rename iommu_map to hwdom_iommu_map.
>  - Move all the checks to hwdom_iommu_map.
> 
> Changes since v1:
>  - Introduce a new reserved option instead of abusing the inclusive
>    option.
>  - Use the same helper function for PV and PVH in order to decide if a
>    page should be added to the domain page tables.
>  - Use the data inside of the domain struct to detect overlaps with
>    emulated MMIO regions.
> ---
> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Cc: George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
> Cc: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
> Cc: Jan Beulich <jbeulich@xxxxxxxx>
> Cc: Julien Grall <julien.grall@xxxxxxx>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
> Cc: Tim Deegan <tim@xxxxxxx>
> Cc: Wei Liu <wei.liu2@xxxxxxxxxx>
> ---
>  docs/misc/xen-command-line.markdown         | 11 ++-
>  xen/arch/x86/hvm/io.c                       |  5 ++
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 +
>  xen/drivers/passthrough/iommu.c             |  3 +
>  xen/drivers/passthrough/vtd/iommu.c         |  3 +
>  xen/drivers/passthrough/x86/iommu.c         | 86 ++++++++++++++-------
>  xen/include/asm-x86/hvm/io.h                |  3 +
>  xen/include/xen/iommu.h                     |  2 +-
>  8 files changed, 85 insertions(+), 31 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
> index 90b32fe3f0..59ec2afc5d 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1205,7 +1205,7 @@ detection of systems known to misbehave upon accesses to that port.
>  >> Enable IOMMU debugging code (implies `verbose`).
> 
>  ### dom0-iommu
> -> `= List of [ none | strict | relaxed | inclusive ]`
> +> `= List of [ none | strict | relaxed | inclusive | reserved ]`
> 
>  * `none`: disables DMA remapping for Dom0.
> 
> @@ -1233,6 +1233,15 @@ meaning:
>    option is only applicable to a PV Dom0 and is enabled by default on Intel
>    hardware.
> 
> +* `reserved`: sets up DMA remapping for all the reserved regions in the memory
> +  map for Dom0. Use this to work around firmware issues providing incorrect
> +  RMRR/IVMD entries. Rather than only mapping RAM pages for IOMMU accesses
> +  for Dom0, all memory regions marked as reserved in the memory map that don't
> +  overlap with any MMIO region from emulated devices will be identity mapped.
> +  This option maps a subset of the memory that would be mapped when using the
> +  `inclusive` option. This option is available to a PVH Dom0 and is enabled by
> +  default on Intel hardware.
> +
>  ### iommu\_dev\_iotlb\_timeout
>  > `= <integer>`
> 
> diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
> index bf4d8748d3..5e01c33890 100644
> --- a/xen/arch/x86/hvm/io.c
> +++ b/xen/arch/x86/hvm/io.c
> @@ -404,6 +404,11 @@ static const struct hvm_mmcfg *vpci_mmcfg_find(const struct domain *d,
>      return NULL;
>  }
> 
> +bool vpci_mmcfg_address(const struct domain *d, paddr_t addr)
> +{
> +    return vpci_mmcfg_find(d, addr);
> +}
> +
>  static unsigned int vpci_mmcfg_decode_addr(const struct hvm_mmcfg *mmcfg,
>                                             paddr_t addr, pci_sbdf_t *sbdf)
>  {
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 0e0c99c942..2c2867d088 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -256,6 +256,9 @@ static void __hwdom_init amd_iommu_hwdom_init(struct domain *d)
>      /* Inclusive IOMMU mappings are disabled by default on AMD hardware. */
>      iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? false
>                                                        : iommu_dom0_inclusive;
> +    /* Reserved IOMMU mappings are disabled by default on AMD hardware. */
> +    iommu_dom0_reserved = iommu_dom0_reserved == -1 ? false
> +                                                    : iommu_dom0_reserved;
> 
>      if ( allocate_domain_resources(dom_iommu(d)) )
>          BUG();
> diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
> index f15c94be42..9c991bd2cf 100644
> --- a/xen/drivers/passthrough/iommu.c
> +++ b/xen/drivers/passthrough/iommu.c
> @@ -75,6 +75,7 @@ custom_param("dom0-iommu", parse_dom0_iommu_param);
>  bool __hwdom_initdata iommu_dom0_strict;
>  bool __read_mostly iommu_dom0_passthrough;
>  int8_t __hwdom_initdata iommu_dom0_inclusive = -1;
> +int8_t __hwdom_initdata iommu_dom0_reserved = -1;
> 
>  DEFINE_PER_CPU(bool_t, iommu_dont_flush_iotlb);
> 
> @@ -162,6 +163,8 @@ static int __init parse_dom0_iommu_param(const char *s)
>              iommu_dom0_strict = false;
>          else if ( !strncmp(s, "inclusive", ss - s) )
>              iommu_dom0_inclusive = val;
> +        else if ( !strncmp(s, "reserved", ss - s) )
> +            iommu_dom0_reserved = val;
>          else
>              rc = -EINVAL;
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index 7c7e15755d..77a076215b 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1307,6 +1307,9 @@ static void __hwdom_init intel_iommu_hwdom_init(struct domain *d)
>      /* Inclusive mappings are enabled by default on Intel hardware for PV. */
>      iommu_dom0_inclusive = iommu_dom0_inclusive == -1 ? is_pv_domain(d)
>                                                        : iommu_dom0_inclusive;
> +    /* Reserved IOMMU mappings are enabled by default on Intel hardware. */
> +    iommu_dom0_reserved = iommu_dom0_reserved == -1 ? true
> +                                                    : iommu_dom0_reserved;
> 
>      setup_hwdom_pci_devices(d, setup_hwdom_device);
>      setup_hwdom_rmrr(d);
> diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
> index 5a7a765e9d..6aec43ed1a 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -20,6 +20,7 @@
>  #include <xen/softirq.h>
>  #include <xsm/xsm.h>
> 
> +#include <asm/hvm/io.h>
>  #include <asm/setup.h>
> 
>  void iommu_update_ire_from_apic(
> @@ -134,13 +135,67 @@ void arch_iommu_domain_destroy(struct domain *d)
>  {
>  }
> 
> +static bool __hwdom_init hwdom_iommu_map(const struct domain *d, unsigned long pfn,
> +                                         unsigned long max_pfn)
> +{
> +    unsigned int i;
> +
> +    /*
> +     * Ignore any address below 1MB, that's already identity mapped by the
> +     * domain builder for HVM.
> +     */
> +    if ( (is_hvm_domain(d) && pfn < PFN_DOWN(MB(1))) ||
> +         /* Exclude Xen bits. */
> +         xen_in_range(pfn) || (pfn > max_pfn && !mfn_valid(_mfn(pfn))) )
> +        return false;
> +
> +    /*
> +     * If dom0-strict mode is enabled or the guest type is PVH/HVM then exclude
> +     * conventional RAM and let the common code map dom0's pages.
> +     */
> +    if ( page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> +         (iommu_dom0_strict || is_hvm_domain(d)) )
> +        return false;
> +    if ( page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> +         !iommu_dom0_reserved && !iommu_dom0_inclusive )
> +        return false;
> +    if ( !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE) &&
> +         !page_is_ram_type(pfn, RAM_TYPE_RESERVED) &&
> +         !page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) &&
> +         (!iommu_dom0_inclusive || pfn > max_pfn) )
> +        return false;
> +
> +    /* Check that it doesn't overlap with the LAPIC */
> +    if ( has_vlapic(d) )
> +    {
> +        const struct vcpu *v;
> +
> +        for_each_vcpu(d, v)
> +            if ( pfn == PFN_DOWN(vlapic_base_address(vcpu_vlapic(v))) )
> +                return false;
> +    }
> +    /* ... or the IO-APIC */
> +    for ( i = 0; has_vioapic(d) && i < d->arch.hvm_domain.nr_vioapics; i++ )
> +        if ( pfn == PFN_DOWN(domain_vioapic(d, i)->base_address) )
> +            return false;
> +    /*
> +     * ... or the PCIe MCFG regions.
> +     * TODO: runtime added MMCFG regions are not checked to make sure they
> +     * don't overlap with already mapped regions, thus preventing trapping.
> +     */
> +    if ( has_vpci(d) && vpci_mmcfg_address(d, pfn << PAGE_SHIFT) )
> +        return false;
> +
> +    return true;
> +}
> +
>  void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>  {
>      unsigned long i, top, max_pfn;
> 
>      BUG_ON(!is_hardware_domain(d));
> 
> -    if ( iommu_dom0_passthrough || !is_pv_domain(d) )
> +    if ( iommu_dom0_passthrough )
>          return;
> 
>      max_pfn = (GB(4) >> PAGE_SHIFT) - 1;
> @@ -149,36 +204,9 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>      for ( i = 0; i < top; i++ )
>      {
>          unsigned long pfn = pdx_to_pfn(i);
> -        bool map;
>          int rc;
> 
> -        /*
> -         * Set up 1:1 mapping for dom0. Default to include only
> -         * conventional RAM areas and let RMRRs include needed reserved
> -         * regions. When set, the inclusive mapping additionally maps in
> -         * every pfn up to 4GB except those that fall in unusable ranges.
> -         */
> -        if ( pfn > max_pfn && !mfn_valid(_mfn(pfn)) )
> -            continue;
> -
> -        if ( iommu_dom0_inclusive && pfn <= max_pfn )
> -            map = !page_is_ram_type(pfn, RAM_TYPE_UNUSABLE);
> -        else
> -            map = page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL);
> -
> -        if ( !map )
> -            continue;
> -
> -        /* Exclude Xen bits */
> -        if ( xen_in_range(pfn) )
> -            continue;
> -
> -        /*
> -         * If dom0-strict mode is enabled then exclude conventional RAM
> -         * and let the common code map dom0's pages.
> -         */
> -        if ( iommu_dom0_strict &&
> -             page_is_ram_type(pfn, RAM_TYPE_CONVENTIONAL) )
> +        if ( !hwdom_iommu_map(d, pfn, max_pfn) )
>              continue;
> 
>          rc = iommu_map_page(d, pfn, pfn, IOMMUF_readable|IOMMUF_writable);
> diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
> index e6b6ed0b92..8cca456b55 100644
> --- a/xen/include/asm-x86/hvm/io.h
> +++ b/xen/include/asm-x86/hvm/io.h
> @@ -180,6 +180,9 @@ int register_vpci_mmcfg_handler(struct domain *d, paddr_t addr,
>  /* Destroy tracked MMCFG areas. */
>  void destroy_vpci_mmcfg(struct domain *d);
> 
> +/* Check if an address is between a MMCFG region for a domain. */
> +bool vpci_mmcfg_address(const struct domain *d, paddr_t addr);
> +
>  #endif /* __ASM_X86_HVM_IO_H__ */
> 
> 
> diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
> index 99e5b89c0f..fed1b1ea7a 100644
> --- a/xen/include/xen/iommu.h
> +++ b/xen/include/xen/iommu.h
> @@ -37,7 +37,7 @@ extern bool_t iommu_debug;
>  extern bool_t amd_iommu_perdev_intremap;
> 
>  extern bool iommu_dom0_strict, iommu_dom0_passthrough;
> -extern int8_t iommu_dom0_inclusive;
> +extern int8_t iommu_dom0_inclusive, iommu_dom0_reserved;
> 
>  extern unsigned int iommu_dev_iotlb_timeout;
> 
> --
> 2.18.0
> 
> 
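
As a side note on the defaults: the patch keeps `iommu_dom0_reserved` at -1 until the vendor-specific hardware domain init code resolves it (enabled on Intel, disabled on AMD). A minimal sketch of that tri-state pattern, assuming a hypothetical helper `resolve_tristate()` and macro `OPT_UNSET` that do not exist in the patch, could look like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* -1 means "not set on the command line", as with iommu_dom0_reserved. */
#define OPT_UNSET (-1)

/*
 * Hypothetical helper: keep an explicit command line choice, otherwise
 * fall back to the default picked by the vendor-specific init code.
 */
bool resolve_tristate(int8_t opt, bool vendor_default)
{
    return opt == OPT_UNSET ? vendor_default : (bool)opt;
}
```

With this shape, an explicit setting on the command line always wins, and only an unset option picks up the per-vendor default.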
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxxx
> https://lists.xenproject.org/mailman/listinfo/xen-devel