
Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring



> -----Original Message-----
> From: Roger Pau Monne
> Sent: 20 March 2018 08:51
> To: Alexey G <x1917x@xxxxxxxxx>
> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Andrew Cooper
> <Andrew.Cooper3@xxxxxxxxxx>; Ian Jackson <Ian.Jackson@xxxxxxxxxx>; Jan
> Beulich <jbeulich@xxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>; Paul Durrant
> <Paul.Durrant@xxxxxxxxxx>
> Subject: Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG
> area in the MMIO hole + minor code refactoring
> 
> On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
> > On Mon, 19 Mar 2018 15:58:02 +0000
> > Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
> >
> > >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko wrote:
> > >> Much like normal PCI BARs or other chipset-specific memory-mapped
> > >> resources, MMCONFIG area needs space in MMIO hole, so we must
> > >> allocate it manually.
> > >>
> > >> The actual MMCONFIG size depends on the number of PCI buses that
> > >> need to be covered by ECAM. Possible options are 64MB, 128MB and
> > >> 256MB. As we are currently limited to bus 0, the lowest possible
> > >> setting (64MB) is used, #defined via PCI_MAX_MCFG_BUSES in
> > >> hvmloader/config.h. When support for multiple PCI buses is
> > >> implemented in Xen, PCI_MAX_MCFG_BUSES may be changed to a
> > >> calculation of the number of buses based on the results of PCI
> > >> device enumeration.
> > >>
> > >> The way to allocate MMCONFIG range in MMIO hole is similar to how
> > >> other PCI BARs are allocated. The patch extends 'bars' structure to
> > >> make it universal for any arbitrary BAR type -- either IO, MMIO, ROM
> > >> or a chipset-specific resource.
> > >
> > >I'm not sure this is fully correct. The IOREQ interface can
> > >differentiate PCI devices and forward config space accesses to
> > >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change you
> > >will forward all MCFG accesses to QEMU, which will likely be wrong if
> > >there are multiple PCI-device emulators for the same domain.
> > >
> > >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
> > >accesses to it in order to forward them to the right emulators.
> > >
> > >Adding Paul who knows more about all this.
> >
> > In which use cases are multiple PCI-device emulators used for a single
> > HVM domain? Is it a proprietary setup?
> 
> Likely. I think XenGT might be using it. It's a feature of the IOREQ
> implementation in Xen.
> 

Multiple ioreq servers are a supported use-case for Xen, if only experimental 
at this point. And indeed xengt is one such use-case.

> Traditional PCI config space accesses are not forwarded as plain IO port accesses.
> The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and IOREQ
> servers can register devices they would like to receive configuration
> space accesses for. QEMU is already making use of this, see for
> example xen_map_pcidev in the QEMU code.
> 
> By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI
> layer, and thus a IOREQ server could register a PCI device and only
> receive PCI configuration accesses from the IO port space, while MCFG
> accesses would be forwarded somewhere else.
> 
> I think you need to make the IOREQ code aware of the MCFG area and
> XEN_DMOP_IO_RANGE_PCI needs to forward both IO space and MCFG accesses
> to the right IOREQ server.

Yes, Xen must intercept all accesses to PCI config space and route them 
accordingly.
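
To make that concrete, the Xen-side decode could look something like the
sketch below (the helper name and the mcfg_base/mcfg_size globals are
purely illustrative, not existing Xen code):

    /* Hypothetical sketch: turn an MCFG MMIO access into a BDF/register
     * pair so it can be routed like any other config space ioreq.
     * mcfg_base/mcfg_size would come from whatever new interface the
     * device model uses to register the area with Xen. */
    static bool mcfg_decode(paddr_t addr, uint16_t *bdf, unsigned int *reg)
    {
        if ( addr < mcfg_base || addr >= mcfg_base + mcfg_size )
            return false;

        /* ECAM: 4K of config space per function, 8 functions per device,
         * 32 devices per bus. */
        *bdf = (addr - mcfg_base) >> 12;
        *reg = addr & 0xfff;
        return true;
    }

The resulting BDF could then feed the same IOREQ server selection that
XEN_DMOP_IO_RANGE_PCI registrations already use for 0xcf8/0xcfc accesses.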

> 
> > I assume it is somehow related to this code in xen-hvm.c:
> >                 /* Fake a write to port 0xCF8 so that
> >                  * the config space access will target the
> >                  * correct device model.
> >                  */
> >                 val = (1u << 31) | ((req->addr & 0x0f00) <...>
> >                 do_outp(0xcf8, 4, val);
> > If yes, a similar thing can be done for IOREQ_TYPE_COPY accesses to
> > the emulated MMCONFIG if needed.
> 
> I have to admit I don't know that much about QEMU, and I have no idea
> what the chunk above is supposed to accomplish.
> 

The easiest way to make QEMU behave appropriately when dealing with a config 
space ioreq was indeed to make it appear as a write to cf8 followed by a read 
or write to cfc.
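
From memory (so a sketch rather than a verbatim copy of xen-hvm.c), the
handling looks roughly like this:

    /* Rough sketch of QEMU's IOREQ_TYPE_PCI_CONFIG handling: synthesize a
     * CF8h write selecting the BDF/register, then replay the data access
     * through CFCh so it reaches the correct emulated device. */
    case IOREQ_TYPE_PCI_CONFIG: {
        uint32_t sbdf = req->addr >> 32;
        uint32_t val = (1u << 31) |                   /* config enable     */
                       ((req->addr & 0x0f00) << 16) | /* extended register */
                       ((sbdf & 0xffff) << 8) |       /* bus/dev/fn        */
                       (req->addr & 0xfc);            /* dword-aligned reg */

        do_outp(0xcf8, 4, val);
        req->addr = 0xcfc | (req->addr & 0x03);
        cpu_ioreq_pio(req);
        break;
    }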

> >
> > In the HVM+QEMU case we are not limited to merely passed-through
> > devices; most of the observable PCI config space devices belong to one
> > particular QEMU instance. This dictates the overall emulated MMCONFIG
> > layout for a domain, which should be in sync with what QEMU emulates
> > via CF8h/CFCh accesses... and between multiple device model instances
> > (if there are any; still not sure what the multiple PCI-device
> > emulators you mentioned really are).
> 
> In newer versions of Xen (>4.5 IIRC, Paul knows more), QEMU doesn't
> directly trap accesses to the 0xcf8/0xcfc IO ports; it's instead Xen
> that detects and decodes such accesses, and then forwards them
> to the IOREQ server that has been registered to handle them.
> 

Correct.

> You cannot simply forward all MCFG accesses to QEMU as MMIO accesses,
> Xen needs to decode them and they need to be handled as
> IOREQ_TYPE_PCI_CONFIG requests, not IOREQ_TYPE_COPY IMO.
> 
> >
> > Basically, we have an emulated MMCONFIG area of 64/128/256MB size in
> > the MMIO hole of the guest HVM domain. (BTW, this area itself can be
> > considered a feature of the chipset the device model emulates.)
> > It can be relocated to some other place in the MMIO hole; this means
> > that QEMU will trap accesses to the chipset-specific PCIEXBAR register
> > and issue the same MMIO unmap/map calls as for any normal emulated
> > MMIO range.
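
For what it's worth, the PCIEXBAR decode itself is simple; something like
the sketch below (register layout recalled from the Q35 spec, so treat the
details as illustrative):

    /* Sketch: decode a Q35 PCIEXBAR value (D0:F0 config offset 0x60).
     * Bit 0 enables ECAM decode, bits 2:1 select the window length. */
    static uint64_t pciexbar_decode(uint64_t pciexbar, uint64_t *len,
                                    bool *enabled)
    {
        *enabled = pciexbar & 1;
        switch ( (pciexbar >> 1) & 3 )
        {
        case 0:  *len = 256UL << 20; break;   /* buses 0-255 */
        case 1:  *len = 128UL << 20; break;   /* buses 0-127 */
        default: *len = 64UL << 20;  break;   /* buses 0-63  */
        }
        return pciexbar & ~(*len - 1);        /* base is length-aligned */
    }
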
> >
> > On the other hand, it won't be easy to provide emulated MMCONFIG
> > translation into IOREQ_TYPE_PCI_CONFIG from the Xen side. Xen would
> > need to know the current emulated MMCONFIG position and size in order
> > to translate (or not) accesses to it into the corresponding BDF/reg
> > pair (and whether that area is enabled for decoding or not). This will
> > likely require introducing new hypercall(s).
> 
> Yes, you will have to introduce new hypercalls to tell Xen the
> position/size of the MCFG hole. Likely you want to tell it the start
> address, the pci segment, start bus and end bus. I know pci segment
> and start bus are always going to be 0 ATM, but it would be nice to
> have a complete interface.
> 
> By your comment above I think you want an interface that allows you to
> remove/add those MCFG areas at runtime.
> 

We're going to want hotplug eventually so, yes, devices need to appear and 
disappear dynamically.
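
Agreed on making the interface complete. Something along the lines of the
following (purely illustrative; no such DMOP exists today) would seem to
cover it:

    /* Hypothetical DMOP payload for registering an emulated MCFG area
     * with Xen; the name and fields are invented for illustration. */
    struct xen_dm_op_map_mcfg_region {
        uint64_t gpa;        /* guest physical base of the ECAM window */
        uint16_t segment;    /* PCI segment, always 0 for now */
        uint8_t  start_bus;  /* first bus covered, always 0 for now */
        uint8_t  end_bus;    /* last bus covered */
    };

A matching removal operation would cover the hotplug/relocation case.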

> > The question is whether there will be any difference or benefit at all.
> 
> IMO it's not about benefits or differences, it's about correctness.
> Xen currently detects accesses to the PCI configuration space from IO
> ports and for consistency it should also detect accesses to this space
> by any other means.
> 

Yes, this is a 'must' rather than a 'should' though.

> > It's basically the same emulated MMIO range after all, but in one case
> > we trap accesses to it in Xen and translate them into
> > IOREQ_TYPE_PCI_CONFIG requests.
> > We have to provide some infrastructure to let Xen know where the device
> > model/guest expects to use the MMCONFIG area (and its size). The
> > device model will need to use this infrastructure, informing Xen of
> > any changes. Also, due to the nature of MMCONFIG there might be some
> > pitfalls, like the necessity to send multiple IOREQ_TYPE_PCI_CONFIG
> > ioreqs for a single memory read/write operation.
> 
> This seems all fine. Why do you expect MCFG access to create multiple
> IOREQ_TYPE_PCI_CONFIG but not multiple IOREQ_TYPE_COPY?
> 
> > In the other case, we still have an emulated MMIO range, but Xen will
> > send plain IOREQ_TYPE_COPY requests to QEMU, which handles them itself.
> > In that case, all the code to deal with MMCONFIG accesses is available
> > for reuse right away (mmcfg -> pci_* translation in QEMU), with no new
> > functionality required in either Xen or QEMU.
> 
> As I tried to argue above, I think this is not correct, but I would
> also like Paul to express his opinion as the IOREQ maintainer.

Xen should handle MMCONFIG accesses. All PCI device emulators should register
for PCI config space by SBDF, and the mechanism by which Xen intercepts the
config access and routes it to the emulator should be none of the emulator's
concern. QEMU does not own the PCI bus topology; Xen does, and it's been this
way for quite some time (even if the implementation is incomplete).
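
For reference, registration by SBDF is what the existing interface already
provides; a PCI device emulator claims config space for a device with
something like the following (error handling omitted, domid/ioservid are
placeholders):

    /* Sketch: claim config space accesses for 0000:00:04.0.  This is the
     * libxendevicemodel call that xen_map_pcidev() in QEMU wraps. */
    #include <xendevicemodel.h>

    xendevicemodel_handle *dmod = xendevicemodel_open(NULL, 0);

    xendevicemodel_map_pcidev_to_ioreq_server(dmod, domid, ioservid,
                                              0 /* segment  */,
                                              0 /* bus      */,
                                              4 /* device   */,
                                              0 /* function */);

Whatever mechanism carries MCFG accesses should end up honouring exactly
these registrations.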

  Paul

> 
> > >>  tools/firmware/hvmloader/config.h   |   4 ++
> > >>  tools/firmware/hvmloader/pci.c      | 127 ++++++++++++++++++++++++++++--------
> > >>  tools/firmware/hvmloader/pci_regs.h |   2 +
> > >>  3 files changed, 106 insertions(+), 27 deletions(-)
> > >>
> > >> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
> > >> index 6fde6b7b60..5443ecd804 100644
> > >> --- a/tools/firmware/hvmloader/config.h
> > >> +++ b/tools/firmware/hvmloader/config.h
> > >> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
> > >>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
> > >>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
> > >>  #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0 */
> > >> +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */
> > >>
> > >>  /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
> > >>  #define PCI_MEM_END         0xfc000000
> > >>
> > >> +/* possible values are: 64, 128, 256 */
> > >> +#define PCI_MAX_MCFG_BUSES  64
> > >
> > >What's the reasoning for this value? Do we know which devices need ECAM
> > >areas?
> >
> > Yes, Xen is currently limited to bus 0 emulation; the description
> > states "When support for multiple PCI buses is implemented in Xen,
> > PCI_MAX_MCFG_BUSES may be changed to a calculation of the number of
> > buses based on the results of PCI device enumeration".
> >
> > I think it might be better to replace 'switch (PCI_MAX_MCFG_BUSES)'
> > with the real code right away, i.e. change it to
> >
> > 'switch (max_bus_num, aligned up to 64/128/256 boundary)',
> > where max_bus_num should be set in PCI device enumeration code in
> > pci_setup(). As we are limited to bus 0 currently, we'll just set it
> > to 0 for now, before/after the PCI device enumeration loop (which should
> > become multi-bus capable eventually).
> 
> I guess this is all pretty much hardcoded to bus 0 in several places,
> so I'm not sure it's worth adding PCI_MAX_MCFG_BUSES. IMO if something
> like this should be added it should be PCI_MAX_BUSES, and several
> places should be changed to make use of it. Or ideally we should find
> a way to detect this at runtime, without needing any hardcoded defines.
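
FWIW, once enumeration yields a maximum bus number, deriving the window
size is trivial; a sketch (each bus needs 1MB of ECAM space):

    /* Sketch: pick the smallest Q35-supported MMCONFIG window (64, 128 or
     * 256 buses, i.e. 64/128/256MB) covering the highest enumerated bus. */
    static uint32_t mmconfig_size(unsigned int max_bus)
    {
        if ( max_bus < 64 )
            return 64u << 20;
        if ( max_bus < 128 )
            return 128u << 20;
        return 256u << 20;    /* covers all 256 buses */
    }
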
> 
> I think it would be good if you could add a comment describing the
> different MCFG sizes supported by the Q35 chipset (64/128/256).
> 
> Thanks, Roger.
