[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] x86/mmcfg: add "force" option for MCFG



On Wed, Aug 28, 2019 at 04:24:22PM +0100, Igor Druzhinin wrote:
> If MCFG area is not reserved in E820 Xen by default will defer its usage
> until Dom0 registers it explicitly after ACPI parser recognizes it as
> a reserved resource in DSDT. Having it reserved in E820 is not
> mandatory according to "PCI Firmware Specification, rev 3.2" (par. 4.1.2)
> and firmware is free to keep a hole E820 in that place.
> 
> Unfortunately, keeping it disabled at this point makes Xen fail to
> recognize many of PCIe extended capabilities early enough for newly added
> devices. Namely, (1) some of VT-d quirks are not applied during Dom0
> device handoff, (2) currently MCFG reservation report comes
> too late from Dom0 after some of PCI devices being registered in Xen
> by Dom0 kernel that break initialization of a number of PCIe capabilities
> (e.g. SR-IOV VF BAR sizing).
> 
> Since introduction of ACPI parser in Xen is not possible add a "force"
> option that will allow Xen to use MCFG area even it's not properly
> reserved in E820.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@xxxxxxxxxx>

Thanks! I think this is the best way to solve the issue.

> ---
> 
> I think we need to have this option to at least have a way to workaround
> problem (1). Problem (2) could be solved in Dom0 kernel by invoking
> xen_mcfg_late() earlier but before the first PCI bus enumertaion which
> currently happens somwhere during ACPI scan I think.
> 
> The question is what the defult value for this option should be?

Have you tested this against a variety of hardware?

Have you found any box that has a wrong MMCFG area in the MCFG static
table? (ie: one that doesn't match the motherboard resource
reservation in the dynamic ACPI tables?)

> 
> ---
>  docs/misc/xen-command-line.pandoc     |  8 +++++---
>  xen/arch/x86/e820.c                   | 20 ++++++++++++++++++++
>  xen/arch/x86/x86_64/mmconfig-shared.c | 34 ++++++++++++++++++++--------------
>  xen/include/asm-x86/e820.h            |  1 +
>  4 files changed, 46 insertions(+), 17 deletions(-)
> 
> diff --git a/docs/misc/xen-command-line.pandoc 
> b/docs/misc/xen-command-line.pandoc
> index 7c72e31..f13b10c 100644
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -1424,11 +1424,13 @@ ordinary DomU, control domain, hardware domain, and - 
> when supported
>  by the platform - DomU with pass-through device assigned).
>  
>  ### mmcfg (x86)
> -> `= <boolean>[,amd-fam10]`
> +> `= List of [ <boolean>, force, amd-fam10 ]`
>  
> -> Default: `1`
> +> Default: `true,force`
>  
> -Specify if the MMConfig space should be enabled.
> +Specify if the MMConfig space should be enabled. If force option is specified
> +(default) MMConfig space will be used by Xen early in boot even if it's
> +not reserved by firmware in the E820 table.
>  
>  ### mmio-relax (x86)
>  > `= <boolean> | all`
> diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
> index 8e8a2c4..edef72a 100644
> --- a/xen/arch/x86/e820.c
> +++ b/xen/arch/x86/e820.c
> @@ -37,6 +37,26 @@ struct e820map e820;
>  struct e820map __initdata e820_raw;
>  
>  /*
> + * This function checks if any part of the range <start,end> is mapped
> + * with type.
> + */
> +int __init e820_any_mapped(u64 start, u64 end, unsigned type)

Please use uint64_t and unsigned int. Also it looks like this
function wants to return a bool instead of an int.

> +{
> +       int i;

unsigned int.

> +
> +       for (i = 0; i < e820.nr_map; i++) {

Coding style. Some parts of this file are using the Linux coding
style I think, but new additions should be using the Xen coding
style, see e820_change_range_type for example.

> +               struct e820entry *ei = &e820.map[i];

const.

> +
> +               if (type && ei->type != type)
> +                       continue;
> +               if (ei->addr >= end || ei->addr + ei->size <= start)
> +                       continue;
> +               return 1;
> +       }
> +       return 0;
> +}
> +
> +/*
>   * This function checks if the entire range <start,end> is mapped with type.
>   *
>   * Note: this function only works correct if the e820 table is sorted and
> diff --git a/xen/arch/x86/x86_64/mmconfig-shared.c 
> b/xen/arch/x86/x86_64/mmconfig-shared.c
> index cc08b52..9fc0865 100644
> --- a/xen/arch/x86/x86_64/mmconfig-shared.c
> +++ b/xen/arch/x86/x86_64/mmconfig-shared.c
> @@ -26,33 +26,34 @@
>  
>  #include "mmconfig.h"
>  
> +static bool_t __read_mostly force_mmcfg = true;
>  unsigned int pci_probe = PCI_PROBE_CONF1 | PCI_PROBE_MMCONF;
>  
>  static int __init parse_mmcfg(const char *s)
>  {
>      const char *ss;
> -    int rc = 0;
> +    int val, rc = 0;
>  
>      do {
>          ss = strchr(s, ',');
>          if ( !ss )
>              ss = strchr(s, '\0');
>  
> -        switch ( parse_bool(s, ss) )
> -        {
> -        case 0:
> -            pci_probe &= ~PCI_PROBE_MMCONF;
> -            break;
> -        case 1:
> -            break;
> -        default:
> -            if ( !cmdline_strcmp(s, "amd_fam10") ||
> -                 !cmdline_strcmp(s, "amd-fam10") )
> +        if ( (val = parse_bool(s, ss)) >= 0 ) {
> +            if ( val )
> +               pci_probe |= PCI_PROBE_MMCONF;
> +            else
> +               pci_probe &= ~PCI_PROBE_MMCONF;
> +        } else if ( (val = parse_boolean("amd_fam10", s, ss)) >= 0 ||
> +                    (val = parse_boolean("amd-fam10", s, ss)) >= 0 ) {
> +            if ( val )
>                  pci_probe |= PCI_CHECK_ENABLE_AMD_MMCONF;
>              else
> -                rc = -EINVAL;
> -            break;
> -        }
> +                pci_probe &= ~PCI_CHECK_ENABLE_AMD_MMCONF;
> +        } else if ( (val = parse_boolean("force", s, ss)) >= 0)
> +            force_mmcfg = val;

You could also consider adding a new flag to pci_probe, ie:
PCI_PROBE_FORCE_MMCFG.

> +        else
> +            rc = -EINVAL;
>  
>          s = ss + 1;
>      } while ( *ss );
> @@ -355,6 +356,11 @@ static int __init is_mmconf_reserved(
>                     (unsigned int)cfg->start_bus_number,
>                     (unsigned int)cfg->end_bus_number);
>          }
> +    } else if (!e820_any_mapped(addr, addr + old_size - 1, 0)) {

I think it should be fine to also accept a MMCFG area that's partially
reserved and partially a hole in the memory map?

AFAICT the proposed code will only accept MMCFG regions that are
fully in a memory map hole.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.