[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 4/4] iommu: add rmrr Xen command line option for extra rmrrs



> From: elena.ufimtseva@xxxxxxxxxx [mailto:elena.ufimtseva@xxxxxxxxxx]
> Sent: Saturday, May 30, 2015 5:39 AM
> 
> From: Elena Ufimtseva <elena.ufimtseva@xxxxxxxxxx>
> 
> On some platforms RMRR regions may be not specified
> in ACPI and thus will not be mapped 1:1 in dom0. This
> causes IO Page Faults and prevents dom0 from booting
> in PVH mode.
> New Xen command line option rmrr allows to specify
> such devices and memory regions. These regions are added
> to the list of RMRR defined in ACPI if the device
> is present in system. As a result, additional RMRRs will
> be mapped 1:1 in dom0 with correct permissions.
> 
> Mentioned above problems were discovered during PVH work with
> ThinkCentre M and Dell 5600T. No official documentation
> was found so far in regards to what devices and why cause this.
> Experiments show that ThinkCentre M USB devices with enabled
> debug port generate DMA read transactions to the regions of
> memory marked reserved in host e820 map.
> For Dell 5600T the device and faulting addresses are not found yet.
> 
> For detailed history of the discussion please check following threads:
> http://lists.Xen.org/archives/html/xen-devel/2015-02/msg01724.html
> http://lists.Xen.org/archives/html/xen-devel/2015-01/msg02513.html
> 
> Format for rmrr Xen command line option:
> rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
> If grub2 used and multiple ranges are specified, ';' should be
> quoted/escaped, refer to grub2 manual for more information.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@xxxxxxxxxx>
> ---
>  docs/misc/xen-command-line.markdown |  13 +++
>  xen/drivers/passthrough/vtd/dmar.c  | 164
> +++++++++++++++++++++++++++++++++++-
>  2 files changed, 176 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/misc/xen-command-line.markdown
> b/docs/misc/xen-command-line.markdown
> index 4889e27..26e2a5e 100644
> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -1185,6 +1185,19 @@ Specify the host reboot method.
>  'efi' instructs Xen to reboot using the EFI reboot call (in EFI mode by
>   default it will use that method first).
> 
> +### rmrr
> +> '= 
> start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
> +
> +Define RMRRs units that are missing from ACPI table along with device
> +they belong to and use them for 1:1 mapping. End addresses can be omitted
> +and one page will be mapped. The ranges are inclusive when start and end
> +are specified.If segement of the first device is not specified, segment zero
> will be used.
> +If other segments are not specified, first device segment will be used.
> +If segments are specified for every device and not equal, an error will be
> reported.

Since you only allow devices under same segment for a given rmrr range,
would it be simpler to enforce that explicitly in the format? e.g.:

= start<-end>=[s1]bdf1[,bdf2[,...]];

Then you don't need to verify whether segment in later bdfs is specified and
different from 1st bdf.

> +Note: grub2 requires to escape or use quotations if special
> +characters are used, namely ';', refer to the grub2 documentation if multiple
> +ranges are specified.
> +
>  ### ro-hpet
>  > `= <boolean>`
> 
> diff --git a/xen/drivers/passthrough/vtd/dmar.c
> b/xen/drivers/passthrough/vtd/dmar.c
> index 89a2f79..d675940 100644
> --- a/xen/drivers/passthrough/vtd/dmar.c
> +++ b/xen/drivers/passthrough/vtd/dmar.c
> @@ -866,6 +866,106 @@ out:
>      return ret;
>  }
> 
> +#define MAX_EXTRA_RMRR_PAGES 16
> +#define MAX_EXTRA_RMRR 10
> +
> +/* RMRR units derived from command line rmrr option */
> +#define MAX_EXTRA_RMRR_DEV 20
> +struct extra_rmrr_unit {
> +    struct list_head list;
> +    unsigned long base_pfn, end_pfn;
> +    u16    dev_count;
> +    u32    sbdf[MAX_EXTRA_RMRR_DEV];
> +};
> +static __initdata unsigned int nr_rmrr;
> +static struct __initdata extra_rmrr_unit rmrru[MAX_EXTRA_RMRR];

rmrru is confusing especially when used in multiple functions. Better to
use a more descriptive name similar to acpi_rmrr_units.

> +
> +static void __init add_extra_rmrr(void)
> +{
> +    struct acpi_rmrr_unit *rmrrn;

'rmrrn' -> 'acpi_rmrr'?

> +    unsigned int dev, seg, i, j;
> +    unsigned long pfn;
> +
> +    for ( i = 0; i < nr_rmrr; i++ )
> +    {
> +        if ( rmrru[i].base_pfn > rmrru[i].end_pfn )
> +        {
> +            printk(XENLOG_ERR VTDPREFIX
> +                   "Start pfn > end pfn for RMRR range [%"PRIx64"
> - %"PRIx64"]\n",
> +                   rmrru[i].base_pfn, rmrru[i].end_pfn);
> +            return;

return or continue? other rmrr entry might be good...

> +        }
> +
> +        if ( rmrru[i].end_pfn - rmrru[i].base_pfn >=
> MAX_EXTRA_RMRR_PAGES )
> +        {
> +            printk(XENLOG_ERR VTDPREFIX
> +                   "RMRR range exceeds 16 pages [%"PRIx64"
> - %"PRIx64"]\n",
> +                   rmrru[i].base_pfn, rmrru[i].end_pfn);
> +            return;
> +        }
> +
> +        for ( j = 0; j < nr_rmrr; j++ )
> +        {
> +            if ( i != j && rmrru[i].base_pfn <= rmrru[j].end_pfn &&
> +                 rmrru[j].base_pfn <= rmrru[i].end_pfn )
> +            {
> +                printk(XENLOG_ERR VTDPREFIX
> +                      "Overlapping RMRRs [%"PRIx64",%"PRIx64"] and
> [%"PRIx64",%"PRIx64"]\n",
> +                      rmrru[i].base_pfn, rmrru[i].end_pfn,
> +                      rmrru[j].base_pfn, rmrru[j].end_pfn);
> +               return;
> +            }
> +        }
> +
> +        for ( pfn = rmrru[i].base_pfn; pfn <= rmrru[i].end_pfn; pfn++ )
> +        {
> +            if ( !mfn_valid(pfn) )
> +                if ( iommu_verbose )
> +                    printk(XENLOG_ERR VTDPREFIX
> +                           "Invalid mfn in RMRR range [%"PRIx64"
> - %"PRIx64"]\n",
> +                           rmrru[i].base_pfn, rmrru[i].end_pfn);
> +            return;

the 'return' should be indent under if (!mfn_valid(pfn)), if it's not caused by
outlook... otherwise you always return here in all cases.

> +        }
> +
> +        rmrrn = xzalloc(struct acpi_rmrr_unit);
> +        if ( !rmrrn )
> +            return;
> +
> +        rmrrn->scope.devices = xmalloc_array(u16,
> +                                             rmrru[i].dev_count);
> +        if ( !rmrrn->scope.devices )
> +        {
> +            xfree(rmrrn);
> +            return;
> +        }
> +
> +        seg = 0;
> +        for ( dev = 0; dev < rmrru->dev_count; dev++ )
> +        {
> +            rmrrn->scope.devices[dev] = rmrru->sbdf[dev];
> +            seg = seg | PCI_SEG(rmrru->sbdf[dev] >> 16);
> +        }
> +        if ( seg != PCI_SEG(rmrru->sbdf[0]) )
> +        {
> +            printk(XENLOG_ERR VTDPREFIX
> +                   "Segments are not equal for RMRR range
> [%"PRIx64" - %"PRIx64"]\n",
> +                   rmrru->base_pfn, rmrru->end_pfn);
> +            xfree(rmrrn->scope.devices);
> +            xfree(rmrrn);
> +            continue;
> +        }

This check should be moved earlier along with other checks.

> +        rmrrn->segment = seg;
> +        rmrrn->base_address = rmrru[i].base_pfn << PAGE_SHIFT;
> +        rmrrn->end_address = (rmrru[i].end_pfn + 1) << PAGE_SHIFT;
> +        rmrrn->scope.devices_cnt = rmrru[i].dev_count;
> +
> +        if ( register_one_rmrr(rmrrn) )
> +            printk(XENLOG_ERR VTDPREFIX
> +                   "Could not register RMMR range  [%"PRIx64"
> - %"PRIx64"]\n",
> +                   rmrru[i].base_pfn, rmrru[i].end_pfn);
> +    }
> +}
> +
>  #include <asm/tboot.h>
>  /* ACPI tables may not be DMA protected by tboot, so use DMAR copy */
>  /* SINIT saved in SinitMleData in TXT heap (which is DMA protected) */
> @@ -875,6 +975,7 @@ int __init acpi_dmar_init(void)
>  {
>      acpi_physical_address dmar_addr;
>      acpi_native_uint dmar_len;
> +    int ret;
> 
>      if ( ACPI_SUCCESS(acpi_get_table_phys(ACPI_SIG_DMAR, 0,
>                                            &dmar_addr,
> &dmar_len)) )
> @@ -885,7 +986,9 @@ int __init acpi_dmar_init(void)
>          dmar_table = __va(dmar_addr);
>      }
> 
> -    return parse_dmar_table(acpi_parse_dmar);
> +    ret = parse_dmar_table(acpi_parse_dmar);
> +    add_extra_rmrr();
> +    return ret;
>  }
> 
>  void acpi_dmar_reinstate(void)
> @@ -916,3 +1019,62 @@ int platform_supports_x2apic(void)
>      unsigned int mask = ACPI_DMAR_INTR_REMAP |
> ACPI_DMAR_X2APIC_OPT_OUT;
>      return cpu_has_x2apic && ((dmar_flags & mask) ==
> ACPI_DMAR_INTR_REMAP);
>  }
> +
> +/*
> + * Parse rmrr Xen command line options and add parsed
> + * device and region into apci_rmrr_unit list to mapped
> + * as RMRRs parsed from ACPI.
> + * Format
> rmrr=start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]
> + * If the segement of the first device is not specified, the segment zero 
> will be
> used.
> + * If other segments are not specified, first device segment will be used.
> + * If segments are specified for every device and not equal, an error will be
> reported.
> + */
> +
> +static void __init parse_rmrr_param(const char *str)
> +{
> +    const char *s = str, *cur, *stmp;
> +    unsigned int seg, bus, dev, func;
> +    unsigned long start, end;
> +
> +    do {
> +        bool_t default_segment = 0;
> +        start = simple_strtoul(cur = s, &s, 0);
> +        if ( cur == s )
> +            break;
> +
> +        if ( *s == '-' )
> +        {
> +            end = simple_strtoul(cur = s + 1, &s, 0);
> +            if ( cur == s )
> +                break;
> +        }
> +        else
> +            end = start;
> +
> +        rmrru[nr_rmrr].base_pfn = start;
> +        rmrru[nr_rmrr].end_pfn = end;
> +        rmrru[nr_rmrr].dev_count = 0;
> +
> +        if ( *s != '=' )
> +            continue;
> +
> +        do {
> +            if ( *s == ';' )
> +                break;
> +            stmp = __parse_pci(s + 1, &seg, &bus, &dev, &func,
> &default_segment);
> +            if ( !stmp )
> +                break;
> +            /* Not specified segment will be replaced with one from first
> device. */
> +            if ( rmrru[nr_rmrr].dev_count > 0 && default_segment )
> +                    seg = PCI_SEG(rmrru[nr_rmrr].sbdf[0]);
> +            /* Keep sbdf's even if they differ and later report error. */
> +            rmrru[nr_rmrr].sbdf[rmrru[nr_rmrr].dev_count] =
> PCI_SBDF(seg, bus, dev, func);
> +            rmrru[nr_rmrr].dev_count++;
> +            s = stmp;
> +        } while ( (*s == ',' || *s || !s) && rmrru[nr_rmrr].dev_count <
> MAX_EXTRA_RMRR_DEV );
> +
> +        if ( rmrru[nr_rmrr].dev_count )
> +            nr_rmrr++;
> +    } while ( *s++ == ';' && nr_rmrr < MAX_EXTRA_RMRR );
> +}
> +custom_param("rmrr", parse_rmrr_param);
> --
> 2.1.3


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.