
Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring



On Tue, 20 Mar 2018 08:50:48 +0000
Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:

>On Tue, Mar 20, 2018 at 05:49:22AM +1000, Alexey G wrote:
>> On Mon, 19 Mar 2018 15:58:02 +0000
>> Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>>   
>> >On Tue, Mar 13, 2018 at 04:33:52AM +1000, Alexey Gerasimenko
>> >wrote:  
>> >> Much like normal PCI BARs or other chipset-specific memory-mapped
>> >> resources, MMCONFIG area needs space in MMIO hole, so we must
>> >> allocate it manually.
>> >> 
>> >> The actual MMCONFIG size depends on a number of PCI buses
>> >> available which should be covered by ECAM. Possible options are
>> >> 64MB, 128MB and 256MB. As we are limited to the bus 0 currently,
>> >> thus using lowest possible setting (64MB), #defined via
>> >> PCI_MAX_MCFG_BUSES in hvmloader/config.h. When multiple PCI buses
>> >> support for Xen will be implemented, PCI_MAX_MCFG_BUSES may be
>> >> changed to calculation of the number of buses according to
>> >> results of the PCI devices enumeration.
>> >> 
>> >> The way to allocate MMCONFIG range in MMIO hole is similar to how
>> >> other PCI BARs are allocated. The patch extends 'bars' structure
>> >> to make it universal for any arbitrary BAR type -- either IO,
>> >> MMIO, ROM or a chipset-specific resource.    
>> >
>> >I'm not sure this is fully correct. The IOREQ interface can
>> >differentiate PCI devices and forward config space accesses to
>> >different emulators (see IOREQ_TYPE_PCI_CONFIG). With this change
>> >you will forward all MCFG accesses to QEMU, which will likely be
>> >wrong if there are multiple PCI-device emulators for the same
>> >domain.
>> >
>> >Ie: AFAICT Xen needs to know about the MCFG emulation and detect
>> >accesses to it in order to forward them to the right emulators.
>> >
>> >Adding Paul who knows more about all this.  
>> 
>> In which use cases are multiple PCI-device emulators used for a
>> single HVM domain? Is it a proprietary setup?
>
>Likely. I think XenGT might be using it. It's a feature of the IOREQ
>implementation in Xen.

According to the public slides for that feature, both PCI conf and MMIO
accesses can be routed to a designated device model. For this
particular setup it doesn't really matter which ioreq type is used for
MMCONFIG accesses -- either IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY
(MMIO accesses) should be acceptable. The only thing which matters is
the ioreq routing itself -- deciding to which device model a given PCI
conf/MMIO ioreq should be sent.

>Traditional PCI config space accesses are not IO port space accesses.

(assuming 'not' was mistyped here)

>The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and IOREQ
>servers can register devices they would like to receive configuration
>space accesses for. QEMU is already making use of this, see for

That's one of the reasons why the current IOREQ_TYPE_PCI_CONFIG
implementation is a bit inconvenient for MMCONFIG MMIO accesses -- it
is very CF8h/CFCh-centric, and it might be painful to change code which
was intended for CF8h/CFCh handling (and not for MMIO processing).

>example xen_map_pcidev in the QEMU code.
>
>By treating MCFG accesses as MMIO you are bypassing the IOREQ PCI
>layer, and thus an IOREQ server could register a PCI device and only
>receive PCI configuration accesses from the IO port space, while MCFG
>accesses would be forwarded somewhere else.

It will be handled by IOREQ too, just using a different ioreq type (the
MMIO one). The basic question is why we have to stick to PCI conf space
ioreqs for emulating MMIO accesses to MMCONFIG.

>I think you need to make the IOREQ code aware of the MCFG area and
>XEN_DMOP_IO_RANGE_PCI needs to forward both IO space and MCFG accesses
>to the right IOREQ server.

Right now there is no way to inform Xen where the emulated MMCONFIG
area is located, so it cannot make this decision based on the address
within the MMCONFIG range. A new dmop/hypercall (with args similar to
pci_mmcfg_reserved) would be needed, along with its usage in QEMU.
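
For illustration only, such an interface might look like the sketch
below. The dmop name, struct layout and field names are my guesses,
loosely modeled on struct physdev_pci_mmcfg_reserved -- nothing like
this exists in Xen today:

/* Hypothetical interface -- names and layout are assumptions only. */
struct xen_dm_op_set_mmcfg_area {
    uint64_t addr;       /* base address of the emulated MMCONFIG area */
    uint16_t segment;    /* PCI segment (always 0 for now) */
    uint8_t  start_bus;  /* first bus covered by the area */
    uint8_t  end_bus;    /* last bus covered by the area */
    uint32_t flags;      /* e.g. bit 0 = area enabled for decoding */
};

The device model would issue it from its PCIEXBAR (or HECBASE) write
handler whenever the guest moves, enables or disables the area.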

I'll try to summarize the two different approaches to MMCONFIG
handling. For both approaches the final PCI config host interface for a
passed-through device in QEMU will remain the same as it is at the
moment -- the xen_host_pci_* functions in hw/xen.


Approach #1. Informing Xen about MMCONFIG area changes and letting Xen
translate MMIO accesses into _PCI_CONFIG ioreqs:

1. QEMU will trap accesses to PCIEXBAR, calling Xen via dmop/hypercall
to let the latter know of any MMCONFIG area address/size/status changes

2. Xen will trap MMIO accesses to the current MMCONFIG location,
convert them into one or more _PCI_CONFIG ioreqs and send those to the
chosen device model

3. QEMU will receive _PCI_CONFIG ioreqs carrying an SBDF and a 12-bit
offset, which it then needs to somehow pass to
pci_host_config_{read,write}_common() for emulation. This might require
a few hacks to make the gears turn (due to QEMU's PCI conf read/write
model).
At the moment the emulated CF8h/CFCh ports play a special role in all
this -- xen-hvm.c writes an AMD-style value to the emulated CF8h port
"so that the config space access will target the correct device model"
(quote). I'm not sure why this is needed if Xen actually makes the
decision about which DM the PCI conf ioreq should be sent to.
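
Roughly, the QEMU-side dispatch could look like the sketch below. I'm
assuming the SBDF ends up in the upper 32 bits of req->addr and the
register offset in the lower bits, which seems to be how the existing
CF8h/CFCh-originated _PCI_CONFIG ioreqs are encoded; the handler itself
does not exist today and the include set may differ:

#include "hw/pci/pci.h"
#include "hw/pci/pci_host.h"
#include "hw/xen/xen_common.h"   /* ioreq_t, IOREQ_READ (Xen public headers) */

/* Sketch only -- not an existing xen-hvm.c handler. */
static void handle_mcfg_pci_config_ioreq(PCIBus *root_bus, ioreq_t *req)
{
    uint32_t sbdf  = req->addr >> 32;
    uint32_t reg   = req->addr & (PCIE_CONFIG_SPACE_SIZE - 1);
    uint8_t  bus   = (sbdf >> 8) & 0xff;
    uint8_t  devfn = sbdf & 0xff;
    PCIDevice *dev = pci_find_device(root_bus, bus, devfn);

    if (!dev) {
        if (req->dir == IOREQ_READ)
            req->data = ~0;      /* master abort: reads return all-ones */
        return;
    }

    if (req->dir == IOREQ_READ)
        req->data = pci_host_config_read_common(dev, reg,
                                                PCIE_CONFIG_SPACE_SIZE,
                                                req->size);
    else
        pci_host_config_write_common(dev, reg, PCIE_CONFIG_SPACE_SIZE,
                                     req->data, req->size);
}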

One minor note: these new 'set_mmconfig_' dmops/hypercalls have to be
issued from the chipset-specific emulation code in QEMU (PCIEXBAR
handling in the Q35 case). If another machine ever needs to emulate
MMCONFIG control differently, we will have no choice but to insert
these dmops/hypercalls into that chipset-specific emulation code as
well, e.g. into the HECBASE emulation code.

Approach #2. Handling the MMCONFIG area inside QEMU using normal MMIO
emulation:

1. QEMU will trap accesses to PCIEXBAR (or whatever else may be
supported in the future, like HECBASE), asking Xen to map the MMCONFIG
MMIO range for ioreq servicing just like it does for any other emulated
MMIO range, via map_io_range_to_ioreq_server(). Any change in MMCONFIG
placement/status will lead to remapping/unmapping of the MMIO range.

2. Xen will trap MMIO accesses to this area and forward them to QEMU as
MMIO (IOREQ_TYPE_COPY) ioreqs

3. QEMU will receive these accesses and pass them to the existing
MMCONFIG emulation -- the pcie_mmcfg_data_read/write handlers --
finally resulting in the same xen_host_pci_* function calls as before.

This approach works "right out of the box"; no changes are needed in
either Xen or QEMU. As both _PCI_CONFIG and MMIO-type ioreqs are
processed, either method can be used to access PCI/extended config
space -- CF8h/CFCh port I/O or MMIO accesses to MMCONFIG.
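
To make "just like any other emulated MMIO range" concrete, the remap
on a PCIEXBAR change could look roughly like the sketch below if it
were done via libxendevicemodel directly (presumably QEMU's Xen
memory/IO listeners already take care of this when the MMCONFIG
MemoryRegion gets mapped or unmapped; the helper name is made up):

#include <stdbool.h>
#include <stdint.h>
#include <xendevicemodel.h>

/* Illustration only -- not existing QEMU code. */
static void xen_remap_mmconfig(xendevicemodel_handle *xdm, domid_t domid,
                               ioservid_t ioservid,
                               uint64_t old_base, uint64_t new_base,
                               uint64_t size, bool enabled)
{
    if (old_base)
        xendevicemodel_unmap_io_range_from_ioreq_server(xdm, domid,
                                                        ioservid,
                                                        1 /* MMIO */,
                                                        old_base,
                                                        old_base + size - 1);
    if (enabled && new_base)
        xendevicemodel_map_io_range_to_ioreq_server(xdm, domid, ioservid,
                                                    1 /* MMIO */, new_base,
                                                    new_base + size - 1);
}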

IOREQ routing for multiple device emulators can be supported too. In
fact, the same mmconfig dmops/hypercalls can be added to let Xen know
where the MMCONFIG area resides; Xen would then use this information to
forward MMCONFIG MMIO ioreqs according to the BDF encoded in the
address. The difference from approach #1 is that these interfaces
become completely optional when MMIO ioreqs are used for MMCONFIG on
vanilla Xen/QEMU.
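
The decode Xen would need for that routing is just the standard ECAM
layout; a minimal sketch (the struct and helper names are made up,
assuming the area was registered via something like the dmop above):

#include <xen/types.h>

struct mmcfg_area {
    uint64_t base;
    uint16_t segment;
    uint8_t  start_bus, end_bus;
};

/* Map an address inside the emulated MMCONFIG area to SBDF + register. */
static bool mmcfg_decode(const struct mmcfg_area *m, uint64_t addr,
                         uint32_t *sbdf, uint16_t *reg)
{
    uint64_t off;
    unsigned int bus;

    if ( addr < m->base )
        return false;

    off = addr - m->base;
    bus = m->start_bus + (off >> 20);
    if ( bus > m->end_bus )
        return false;

    /* ECAM: bits 19:15 device, 14:12 function, 11:0 register offset. */
    *sbdf = ((uint32_t)m->segment << 16) | (bus << 8) | ((off >> 12) & 0xff);
    *reg  = off & 0xfff;
    return true;
}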

The question is why the IOREQ_TYPE_COPY -> IOREQ_TYPE_PCI_CONFIG
translation is a must-have thing at all -- it won't make the handling
simpler. For the current QEMU implementation IOREQ_TYPE_COPY (MMIO
accesses for MMCONFIG) would be preferable, as it allows the existing
code to be reused.

I think it will be safe to use MMCONFIG emulation at the MMIO level for
now and later extend it with the 'set_mmconfig_' dmop/hypercall so that
the 'multiple device emulators' IOREQ_TYPE_COPY routing works the same
way as for PCI conf, allowing it to be used by XenGT etc. on Q35 as
well.

After all, all this is Q35-specific and won't harm the existing i440
emulation in any way.

>> I assume it is somehow related to this code in xen-hvm.c:
>>                 /* Fake a write to port 0xCF8 so that
>>                  * the config space access will target the
>>                  * correct device model.
>>                  */
>>                 val = (1u << 31) | ((req->addr & 0x0f00) <...>
>>                 do_outp(0xcf8, 4, val);
>> if yes, similar thing can be made for IOREQ_TYPE_COPY accesses to
>> the emulated MMCONFIG if needed.  
>
>I have to admit I don't know that much about QEMU, and I have no idea
>what the chunk above is supposed to accomplish.
>
>> 
>> In HVM+QEMU case we are not limited to merely passed through devices,
>> most of the observable PCI config space devices belong to one
>> particular QEMU instance. This dictates the overall emulated
>> MMCONFIG layout for a domain which should be in sync to what QEMU
>> emulates via CF8h/CFCh accesses... and between multiple device model
>> instances (if there are any, still not sure what multiple PCI-device
>> emulators you mentioned really are).  
>
>In newer versions of Xen (>4.5 IIRC, Paul knows more), QEMU doesn't
>directly trap accesses to the 0xcf8/0xcfc IO ports, it's Xen instead
>the one that detects and decodes such accesses, and then forwards them
>to the IOREQ server that has been registered to handle them.
>
>You cannot simply forward all MCFG accesses to QEMU as MMIO accesses,
>Xen needs to decode them and they need to be handled as
>IOREQ_TYPE_PCI_CONFIG requests, not IOREQ_TYPE_COPY IMO.
>
>> 
>> Basically, we have an emulated MMCONFIG area of 64/128/256MB size in
>> the MMIO hole of the guest HVM domain. (BTW, this area itself can be
>> considered a feature of the chipset the device model emulates.)
>> It can be relocated to some other place in MMIO hole, this means that
>> QEMU will trap accesses to the specific to the emulated chipset
>> PCIEXBAR register and will issue same MMIO unmap/map calls as for
>> any normal emulated MMIO range.
>> 
>> On the other hand, it won't be easy to provide emulated MMCONFIG
>> translation into IOREQ_TYPE_PCI_CONFIG from Xen side. Xen should know
>> current emulated MMCONFIG area position and size in order to
>> translate (or not) accesses to it into corresponding BDF/reg pair
>> (+whether that area is enabled for decoding or not). This will
>> likely require to introduce new hypercall(s).  
>
>Yes, you will have to introduce new hypercalls to tell Xen the
>position/size of the MCFG hole. Likely you want to tell it the start
>address, the pci segment, start bus and end bus. I know pci segment
>and start bus is always going to be 0 ATM, but it would be nice to
>have a complete interface.
>
>By your comment above I think you want an interface that allows you to
>remove/add those MCFG areas at runtime.
>
>> The question is if there will be any difference or benefit at all.  
>
>IMO it's not about benefits or differences, it's about correctness.
>Xen currently detects accesses to the PCI configuration space from IO
>ports and for consistency it should also detect accesses to this space
>by any other means.
>
>> It's basically the same emulated MMIO range after all, but in one
>> case we trap accesses to it in Xen and translate them into
>> IOREQ_TYPE_PCI_CONFIG requests.
>> We have to provide some infrastructure to let Xen know where the
>> device model/guest expects to use the MMCONFIG area (and its size).
>> The device model will need to use this infrastructure, informing Xen
>> of any changes. Also, due to MMCONFIG nature there might be some
>> pitfalls like a necessity to send multiple IOREQ_TYPE_PCI_CONFIG
>> ioreqs caused by a single memory read/write operation.  
>
>This seems all fine. Why do you expect MCFG access to create multiple
>IOREQ_TYPE_PCI_CONFIG but not multiple IOREQ_TYPE_COPY?
>> In another case, we still have an emulated MMIO range, but Xen will
>> send plain IOREQ_TYPE_COPY requests to QEMU which it handles itself.
>> In such case, all code to work with MMCONFIG accesses is available
>> for reuse right away (mmcfg -> pci_* translation in QEMU), no new
>> functionality required neither in Xen or QEMU.  
>
>As I tried to argument above, I think this is not correct, but I would
>also like that Paul expresses his opinion as the IOREQ maintainer.
>
>> >>  tools/firmware/hvmloader/config.h   |   4 ++
>> >>  tools/firmware/hvmloader/pci.c      | 127
>> >> ++++++++++++++++++++++++++++--------
>> >> tools/firmware/hvmloader/pci_regs.h |   2 + 3 files changed, 106
>> >> insertions(+), 27 deletions(-)
>> >> 
>> >> diff --git a/tools/firmware/hvmloader/config.h
>> >> b/tools/firmware/hvmloader/config.h index 6fde6b7b60..5443ecd804
>> >> 100644 --- a/tools/firmware/hvmloader/config.h
>> >> +++ b/tools/firmware/hvmloader/config.h
>> >> @@ -53,10 +53,14 @@ extern uint8_t ioapic_version;
>> >>  #define PCI_ISA_DEVFN       0x08    /* dev 1, fn 0 */
>> >>  #define PCI_ISA_IRQ_MASK    0x0c20U /* ISA IRQs 5,10,11 are PCI
>> >> connected */ #define PCI_ICH9_LPC_DEVFN  0xf8    /* dev 31, fn 0
>> >> */ +#define PCI_MCH_DEVFN       0       /* bus 0, dev 0, func 0 */
>> >>  
>> >>  /* MMIO hole: Hardcoded defaults, which can be dynamically
>> >> expanded. */ #define PCI_MEM_END         0xfc000000
>> >>  
>> >> +/* possible values are: 64, 128, 256 */
>> >> +#define PCI_MAX_MCFG_BUSES  64    
>> >
>> >What the reasoning for this value? Do we know which devices need
>> >ECAM areas?  
>> 
>> Yes, Xen is limited to bus 0 emulation currently, the description
>> states "When multiple PCI buses support for Xen will be implemented,
>> PCI_MAX_MCFG_BUSES may be changed to calculation of the number of
>> buses according to results of the PCI devices enumeration".
>> 
>> I think it might be better to replace 'switch (PCI_MAX_MCFG_BUSES)'
>> with the real code right away, i.e. change it to
>> 
>> 'switch (max_bus_num, aligned up to 64/128/256 boundary)',
>> where max_bus_num should be set in PCI device enumeration code in
>> pci_setup(). As we are limited to bus 0 currently, we'll just set it
>> to 0 for now, before/after the PCI device enumeration loop (which
>> should became multi-bus capable eventually).  
>
>I guess this is all pretty much hardcoded to bus 0 in several places,
>so I'm not sure it's worth to add PCI_MAX_MCFG_BUSES. IMO if something
>like this should be added it should be PCI_MAX_BUSES, and several
>places should be changed to make use of it. Or ideally we should find
>a way to detect this at runtime, without needed any hardcoded defines.

Getting rid of the bus 0 limitation should have high priority, I'm
afraid. It has become an obstacle for PCIe passthrough.

>I think it would be good if you can add a note comment describing the
>different MCFG sizes supported by the Q35 chipset (64/128/256).

will add "...supported by Q35" here:

>> +/* possible values are: 64, 128, 256 */
>> +#define PCI_MAX_MCFG_BUSES  64    
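
And for reference, the 'max_bus_num aligned up to a 64/128/256
boundary' variant discussed above could be as simple as the sketch
below (the helper name is a placeholder; the caller would take
max_bus_num from the PCI enumeration loop in pci_setup()):

/* Round the highest discovered bus number up to an MMCONFIG size
 * supported by Q35 (64/128/256 buses -> 64/128/256 MB). Sketch only.
 */
static unsigned int mmconfig_size_mb(unsigned int max_bus_num)
{
    if ( max_bus_num < 64 )
        return 64;      /* buses 0-63 */
    if ( max_bus_num < 128 )
        return 128;     /* buses 0-127 */
    return 256;         /* buses 0-255 */
}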

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

