[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring



On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko wrote:
> Much like normal PCI BARs or other chipset-specific memory-mapped
> resources, MMCONFIG area needs space in MMIO hole, so we must allocate
> it manually.
> 
> The actual MMCONFIG size depends on a number of PCI buses available which
> should be covered by ECAM. Possible options are 64MB, 128MB and 256MB.
> As we are limited to the bus 0 currently, thus using lowest possible
> setting (64MB), #defined via PCI_MAX_MCFG_BUSES in hvmloader/config.h.
> When multiple PCI buses support for Xen will be implemented,
> PCI_MAX_MCFG_BUSES may be changed to calculation of the number of buses
> according to results of the PCI devices enumeration.
> 
> The way to allocate MMCONFIG range in MMIO hole is similar to how other
> PCI BARs are allocated. The patch extends 'bars' structure to make
> it universal for any arbitrary BAR type -- either IO, MMIO, ROM or
> a chipset-specific resource.

I'm not sure this is fully correct. The IOREQ interface can
differentiate PCI devices and forward config space accesses to
different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change you
will forward all MCFG accesses to QEMU, which will likely be wrong if
there are multiple PCI-device emulators for the same domain.

Ie: AFAICT Xen needs to know about the MCFG emulation and detect
accesses to it in order to forward them to the right emulators.

Adding Paul who knows more about all this.

> One important new field is addr_mask, which tells which bits of the base
> address can (should) be written. Different address types (ROM, MMIO BAR,
> PCIEXBAR) will have different addr_mask values.
> 
> For every assignable BAR range we store its size, PCI device BDF (devfn
> actually) to which it belongs, BAR type (mem/io/mem64) and corresponding
> register offset in device PCI conf space. This way we can insert MMCONFIG
> entry into bars array in the same manner like for any other BARs. In this
> case, the devfn field will point to MCH PCI device and bar_reg will
> contain PCIEXBAR register offset. It will be assigned a slot in MMIO hole
> later in a very same way like for plain PCI BARs, with respect to its size
> alignment.
> 
> Also, to reduce code complexity, all long mem/mem64 BAR flags checks are
> replaced by simple bars[i] field probing, eg.:
> -        if ( (bar_reg == PCI_ROM_ADDRESS) ||
> -             ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> -              PCI_BASE_ADDRESS_SPACE_MEMORY) )
> +        if ( bars[i].is_mem )

This should be a separate change IMO.

> 
> Signed-off-by: Alexey Gerasimenko <x1917x@xxxxxxxxx>
> ---
>  tools/firmware/hvmloader/config.h   |   4 ++
>  tools/firmware/hvmloader/pci.c      | 127 
> ++++++++++++++++++++++++++++--------
>  tools/firmware/hvmloader/pci_regs.h |   2 +
>  3 files changed, 106 insertions(+), 27 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/config.h 
> b/tools/firmware/hvmloader/config.h
> index 6fde6b7b60..5443ecd804 100644
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */
>  
>  /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
>  #define PCI_MEM_END         0xfc000000
>  
> +/* possible values are: 64, 128, 256 */
> +#define PCI_MAX_MCFG_BUSES  64

What the reasoning for this value? Do we know which devices need ECAM
areas?

> +
>  #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
>  
>  extern unsigned long pci_mem_start, pci_mem_end;
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 033bd20992..6de124bbd5 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -158,9 +158,10 @@ static void class_specific_pci_device_setup(uint16_t 
> vendor_id,
>  
>  void pci_setup(void)
>  {
> -    uint8_t is_64bar, using_64bar, bar64_relocate = 0;
> +    uint8_t is_64bar, using_64bar, bar64_relocate = 0, is_mem;
>      uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
>      uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
> +    uint64_t addr_mask;
>      uint16_t vendor_id, device_id;
>      unsigned int bar, pin, link, isa_irq;
>      int is_running_on_q35 = 0;
> @@ -172,10 +173,14 @@ void pci_setup(void)
>  
>      /* Create a list of device BARs in descending order of size. */
>      struct bars {
> -        uint32_t is_64bar;
>          uint32_t devfn;
>          uint32_t bar_reg;
>          uint64_t bar_sz;
> +        uint64_t addr_mask; /* which bits of the base address can be written 
> */
> +        uint32_t bar_data;  /* initial value - BAR flags here */
> +        uint8_t  is_64bar;
> +        uint8_t  is_mem;
> +        uint8_t  padding[2];

Why are you manually adding a padding here? Also why not make this
fields bool?

>      } *bars = (struct bars *)scratch_start;
>      unsigned int i, nr_bars = 0;
>      uint64_t mmio_hole_size = 0;
> @@ -259,13 +264,21 @@ void pci_setup(void)
>                  bar_reg = PCI_ROM_ADDRESS;
>  
>              bar_data = pci_readl(devfn, bar_reg);
> +
> +            is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> +                       PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> +                       (bar_reg == PCI_ROM_ADDRESS));
> +
>              if ( bar_reg != PCI_ROM_ADDRESS )
>              {
> -                is_64bar = !!((bar_data & (PCI_BASE_ADDRESS_SPACE |
> -                             PCI_BASE_ADDRESS_MEM_TYPE_MASK)) ==
> -                             (PCI_BASE_ADDRESS_SPACE_MEMORY |
> +                is_64bar = !!(is_mem &&
> +                             ((bar_data & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
>                               PCI_BASE_ADDRESS_MEM_TYPE_64));
> +
>                  pci_writel(devfn, bar_reg, ~0);
> +
> +                addr_mask = is_mem ? PCI_BASE_ADDRESS_MEM_MASK
> +                                   : PCI_BASE_ADDRESS_IO_MASK;
>              }
>              else
>              {
> @@ -273,28 +286,35 @@ void pci_setup(void)
>                  pci_writel(devfn, bar_reg,
>                             (bar_data | PCI_ROM_ADDRESS_MASK) &
>                             ~PCI_ROM_ADDRESS_ENABLE);
> +
> +                addr_mask = PCI_ROM_ADDRESS_MASK;
>              }
> +
>              bar_sz = pci_readl(devfn, bar_reg);
>              pci_writel(devfn, bar_reg, bar_data);
>  
>              if ( bar_reg != PCI_ROM_ADDRESS )
> -                bar_sz &= (((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> -                            PCI_BASE_ADDRESS_SPACE_MEMORY) ?
> -                           PCI_BASE_ADDRESS_MEM_MASK :
> -                           (PCI_BASE_ADDRESS_IO_MASK & 0xffff));
> +                bar_sz &= is_mem ? PCI_BASE_ADDRESS_MEM_MASK :
> +                                   (PCI_BASE_ADDRESS_IO_MASK & 0xffff);
>              else
>                  bar_sz &= PCI_ROM_ADDRESS_MASK;
> -            if (is_64bar) {
> +
> +            if (is_64bar)

Coding style (spaces between parentheses).

> +            {
>                  bar_data_upper = pci_readl(devfn, bar_reg + 4);
>                  pci_writel(devfn, bar_reg + 4, ~0);
>                  bar_sz_upper = pci_readl(devfn, bar_reg + 4);
>                  pci_writel(devfn, bar_reg + 4, bar_data_upper);
>                  bar_sz = (bar_sz_upper << 32) | bar_sz;
>              }
> +
>              bar_sz &= ~(bar_sz - 1);
>              if ( bar_sz == 0 )
>                  continue;
>  
> +            /* leave only memtype/enable bits etc */
> +            bar_data &= ~addr_mask;
> +
>              for ( i = 0; i < nr_bars; i++ )
>                  if ( bars[i].bar_sz < bar_sz )
>                      break;
> @@ -302,14 +322,15 @@ void pci_setup(void)
>              if ( i != nr_bars )
>                  memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
>  
> -            bars[i].is_64bar = is_64bar;
> -            bars[i].devfn   = devfn;
> -            bars[i].bar_reg = bar_reg;
> -            bars[i].bar_sz  = bar_sz;
> +            bars[i].is_64bar  = is_64bar;
> +            bars[i].is_mem    = is_mem;
> +            bars[i].devfn     = devfn;
> +            bars[i].bar_reg   = bar_reg;
> +            bars[i].bar_sz    = bar_sz;
> +            bars[i].addr_mask = addr_mask;
> +            bars[i].bar_data  = bar_data;
>  
> -            if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> -                  PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> -                 (bar_reg == PCI_ROM_ADDRESS) )
> +            if ( is_mem )
>                  mmio_total += bar_sz;
>  
>              nr_bars++;
> @@ -339,6 +360,63 @@ void pci_setup(void)
>          pci_writew(devfn, PCI_COMMAND, cmd);
>      }
>  
> +    /*
> +     *  Calculate MMCONFIG area size and squeeze it into the bars array
> +     *  for assigning a slot in the MMIO hole
> +     */
> +    if (is_running_on_q35)
> +    {
> +        /* disable PCIEXBAR decoding for now */
> +        pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR, 0);
> +        pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR + 4, 0);

I'm afraid I will need some context here, where is the description for
the config space of dev 0 fn 0? I don't seem to be able to find it in
the ich9 spec.

> +
> +#define PCIEXBAR_64_BUSES    (2 << 1)
> +#define PCIEXBAR_128_BUSES   (1 << 1)
> +#define PCIEXBAR_256_BUSES   (0 << 1)
> +#define PCIEXBAR_ENABLE      (1 << 0)

Why those strange definitions? (0 << 1)? (2 << 1) instead of (1 << 2)?

> +
> +        switch (PCI_MAX_MCFG_BUSES)
> +        {
> +        case 64:
> +            bar_data = PCIEXBAR_64_BUSES | PCIEXBAR_ENABLE;
> +            bar_sz = MB(64);
> +            break;
> +
> +        case 128:
> +            bar_data = PCIEXBAR_128_BUSES | PCIEXBAR_ENABLE;
> +            bar_sz = MB(128);
> +            break;
> +
> +        case 256:
> +            bar_data = PCIEXBAR_256_BUSES | PCIEXBAR_ENABLE;
> +            bar_sz = MB(256);
> +            break;
> +
> +        default:
> +            /* unsupported number of buses specified */
> +            BUG();
> +        }

I don't see how PCI_MAX_MCFG_BUSES should be used. Is the user
supposed to know what value to use at compile time? What about distro
packagers?

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.