
Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring



On Wed, 21 Mar 2018 14:54:16 +0000
Paul Durrant <Paul.Durrant@xxxxxxxxxx> wrote:

>> -----Original Message-----
>> From: Alexey G [mailto:x1917x@xxxxxxxxx]
>> Sent: 21 March 2018 14:26
>> To: Roger Pau Monne <roger.pau@xxxxxxxxxx>
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx; Andrew Cooper
>> <Andrew.Cooper3@xxxxxxxxxx>; Ian Jackson <Ian.Jackson@xxxxxxxxxx>;
>> Jan Beulich <jbeulich@xxxxxxxx>; Wei Liu <wei.liu2@xxxxxxxxxx>; Paul
>> Durrant <Paul.Durrant@xxxxxxxxxx>; Anthony Perard
>> <anthony.perard@xxxxxxxxxx>; Stefano Stabellini
>> <sstabellini@xxxxxxxxxx> Subject: Re: [Xen-devel] [RFC PATCH 07/12]
>> hvmloader: allocate MMCONFIG area in the MMIO hole + minor code
>> refactoring
>> 
>> On Wed, 21 Mar 2018 09:09:11 +0000
>> Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>>   
>> >On Wed, Mar 21, 2018 at 10:58:40AM +1000, Alexey G wrote:  
>> [...]  
>> >> According to public slides for the feature, both PCI conf and MMIO
>> >> accesses can be routed to the designated device model. It looks
>> >> like for this particular setup it doesn't really matter which
>> >> particular ioreq type must be used for MMCONFIG accesses -- either
>> >> IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY (MMIO accesses) should  
>> be  
>> >> acceptable.  
>> >
>> >Isn't that going to be quite messy? How is the IOREQ server supposed
>> >to decode a MCFG access received as IOREQ_TYPE_COPY?  
>> 
>> This code is already available and in sync with QEMU legacy PCI conf
>> emulation infrastructure.
>>   
>> >I don't think the IOREQ server needs to know the start of the MCFG
>> >region, in which case it won't be able to detect and decode the
>> >access if it's of type IOREQ_TYPE_COPY.  
>> 
>> How do you think Xen will be able to know if arbitrary MMIO
>> access targets MMCONFIG area and to which BDF the offset in this area
>> belongs, without knowing where MMCONFIG is located and what PCI bus
>> layout is? It's QEMU who emulate PCIEXBAR and can tell Xen where
>> MMCONFIG is expected to be.
>>   
>> >MCFG accesses need to be sent to the IOREQ server as
>> >IOREQ_TYPE_PCI_CONFIG, or else you are forcing each IOREQ server to
>> >know the position of the MCFG area in order to do the decoding. In
>> >your case this would work because QEMU controls the position of the
>> >MCFG region, but there's no need for other IOREQ servers to know the
>> >position of the MCFG area.
>> >  
>> >> The only thing which matters is ioreq routing itself --
>> >> making decisions to which device model the PCI conf/MMIO ioreq
>> >> should be sent.  
>> >
>> >Hm, see above, but I'm fairly sure you need to forward those MCFG
>> >accesses as IOREQ_TYPE_PCI_CONFIG to the IOREQ server.  
>> 
>> (a detailed answer below)
>>   
>> >> >Traditional PCI config space accesses are not IO port space
>> >> >accesses.  
>> >>
>> >> (assuming 'not' mistyped here)  
>> >
>> >Not really, this should instead be:
>> >
>> >"Traditional PCI config space accesses are not forwarded to the
>> >IOREQ server as IO port space accesses (IOREQ_TYPE_PIO) but rather
>> >as PCI config space accesses (IOREQ_TYPE_PCI_CONFIG)."
>> >
>> >Sorry for the confusion.
>> >  
>> >> >The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and
>> >> >IOREQ servers can register devices they would like to receive
>> >> >configuration space accesses for. QEMU is already making use of
>> >> >this, see for  
>> >>
>> >> That's one of the reasons why current IOREQ_TYPE_PCI_CONFIG
>> >> implementation is a bit inconvenient for MMCONFIG MMIO accesses --
>> >> it's too much CF8h/CFCh-centric in its implementation, might be
>> >> painful to change something in the code which was intended for
>> >> CF8h/CFCh handling (and not for MMIO processing).  
>> >
>> >I'm not sure I follow. Do you mean that changes should be made to
>> >the ioreq struct in order to forward MCFG accesses using
>> >IOREQ_TYPE_PCI_CONFIG as it's type?  
>> 
>> No changes for ioreq structures needed for now.
>>   
>> >> It will be handled by IOREQ too, just using a different IOREQ type
>> >> (MMIO one). The basic question is why do we have to stick to PCI
>> >> conf space ioreqs for emulating MMIO accesses to MMCONFIG.  
>> >
>> >Because other IOREQ servers don't need to know about the
>> >position/size of the MCFG area, and cannot register MMIO ranges
>> >that cover their device's PCI configuration space in the MCFG
>> >region.
>> >
>> >Not to mention that it would would be a terrible design flaw to
>> >force IOREQ servers to register PCI devices and MCFG areas
>> >belonging to those devices separately as MMIO in order to trap all
>> >possible PCI configuration space accesses.  
>> 
>> PCI conf space layout is shared by the emulated machine. And MMCONFIG
>> layout is mandated by this common PCI bus map.
>> 
>> Even if those 'multiple device models' see a different picture of PCI
>> conf space, their visions of PCI bus must not overlap + MMCONFIG
>> layout must be consistent between different device models.
>> 
>> Although it is a terrible mistake to think about the emulated PCI bus
>> like it's a set of distinct PCI devices unrelated to each other. It's
>> all coupled together. And this is especially true for PCIe.
>> Many PCIe features rely on PCIe device interaction in PCIe fabric,
>> eg. PCIe endpoints may interact with Root Complex in many ways. This
>> cooperation may  need to be emulated somehow, eg. to provide some
>> support for PM features, link management or native hotplug
>> facilities. Even if we have a real passed through device, we might
>> need to provide an emulated PCIe Switch or a Root Port for it to
>> function properly within the PCIe hierarchy.
>> 
>> Dedicating an isolated PCI device to some isolated device model --
>> that's what might be the design flaw, considering the PCIe world.
>>   
>
>I think that is the crux of the problem. The current
>multi-ioreq-server relies on being able to consider PCI devices as
>being isolated from each other... and that is basically fine because
>we only use a single PCI bus with no bridges. To move to PCIe will
>require more emulation in Xen, but I think that is the only way to do
>it properly.

Unfortunately, this approach won't work anymore for PCIe. We can't just
separate PCI bus emulation from the chipset-specific emulation; even
MMCONFIG is a chipset-specific feature. To do this, Xen would have to
emulate many device model features itself, probably to the point where
QEMU could be safely dropped as a DM.

The single bus has become a major issue for passthrough already:
http://lists.gnu.org/archive/html/qemu-devel/2018-03/msg03593.html

I want to replace this workaround with actual multiple-bus support,
although now I'm not sure I can make it through -- there will likely be
a lot of resistance, even though this is a mandatory feature for PCIe
passthrough.

>> [...]  
>> >
>> >Maybe you could detect offsets >= 256 and replay them in QEMU like
>> >mmio accesses? Using the address_space_write or
>> >pcie_mmcfg_data_read/write functions?
>> >I have to admit my knowledge of QEMU is quite limited, so I'm not
>> >sure of the best way to handle this.
>> >
>> >Ideally we should find a way that doesn't involve having to modify
>> >each chipset to handle MCFG accesses from Xen. It would be nice to
>> >have some kind of interface inside of QEMU so all chipsets can
>> >register MCFG areas or modify them, but this is out of the scope of
>> >this work.  
>> 
>> Roger, Paul,
>> 
>> Here is what you suggest, just to clarify:
>> 
>> 1. Add to Xen a new hypercall (+corresponding dmop) so QEMU can tell
>> Xen where QEMU emulates machine's MMCONFIG (chipset-specific
>> emulation
>> of PCIEXBAR/HECBASE/etc mmcfg relocation). Xen will rely on this
>> information to know to which PCI device the address within MMCONFIG
>> belong.
>> 
>> 2. Xen will trap this area + remap its trapping to other address if
>> QEMU will inform Xen about emulated PCIEXBAR value change
>> 
>> 3. Every MMIO access to the current MMCONFIG range will be converted
>> into BDF first (by offset within this range, knowing where the range
>> is)
>> 
>> 4. Target device model is selected using calculated BDF
>> 
>> 5. MMIO read/write accesses are converted into PCI config space
>> ioreqs (like it was a CF8/CFCh operation instead of MMIO access). At
>> this point ioreq structure allows to specify extended PCI conf offset
>> (12-bit), so it will fit into PCI conf ioreq. For now let's assume
>> that eg. a 64-bit memory operation is either aborted or workarounded
>> by splitting this operation into multiple PCI conf ioreqs.
>> 
>> 6. PCI conf read/write ioreqs are sent to the chosen device model
>> 
>> 7. QEMU receive MMCONFIG memory reads/writes as PCI conf reads/writes
>> 
>> 8. As these MMCONFIG PCI conf reads occur out of context (just
>> address/len/data without any emulated device attached to it),
>> xen-hvm.c should employ special logic to make it QEMU-friendly --
>> eg. right now it sends received PCI conf access into (emulated by
>> QEMU) CF8h/CFCh ports.
>> There is a real problem to embed these "naked" accesses into QEMU
>> infrastructure, workarounds are required. BTW, find_primary_bus() was
>> dropped from QEMU code -- it could've been useful here. Let's assume
>> some workaround is employed (like storing a required object pointers
>> in global variables for later use in xen-hvm.c)
>> 
>> 9. Existing MMCONFIG-handling code in QEMU will be unused in this
>> scenario
>> 
>> 10. All this needed primarily to make the specific "Multiple device
>> emulators" feature to work (XenGT was mentioned as its user) on Q35
>> with MMCONFIG.
>> 
>> Anything wrong/missing here?  
>
>That all sounds plausible. All we essentially need to do is make sure
>the config space transactions make it to the right device model in
>QEMU. If the emulation in Xen is comprehensive then I guess there
>should not even be any reason for QEMU's idea of the bus topology and
>Xen's presentation of the bus topology to the guest to even match.
>  Paul
>
>> 
>> (Adding Stefano and Anthony as xen-hvm.c mentioned)
>> 
>> 
>> Here is another suggestion:
>> 
>> 1. QEMU use existing facilities to emulate PCIEXBAR for a Q35
>> machine, calling Xen's map_io_range_to_ioreq_server() API to mark
>> MMIO range for emulation, just like for any other emulated MMIO range
>> 
>> 2. All accesses to this area will be forwarded to QEMU as MMIO ioreqs
>> and emulated flawlessly as everything is within QEMU architecture --
>> pci-host/PCIBus/PCIDevice machinery in place. No workarounds required
>> for xen-hvm.c
>> 
>> 3. CF8/CFC accesses will be forwarded as _PCI_CONFIG ioreqs, as
>> usually. Both methods are in sync as they use common PCI emulation
>> infrastructure in QEMU
>> 
>> 4. At this point absolutely zero changes are required in both Xen and
>> QEMU code. Only existing interfaces are used. In fact, no related
>> code changes required at all except a bugfix for PCIEXBAR mask
>> emulation (provided in this series)
>> 
>> 5. But. Just to make the 'multiple device emulators' (no extra
>> reasons so far) feature to work, we add the same hypercall/dmop
>> usage to let Xen know where QEMU emulates MMCONFIG
>> 
>> 6. Xen will continue to trap accesses to this range but instead of
>> sending _COPY ioreq immediately, he will check the address against
>> known MMCONFIG location (in the same manner as above), then convert
>> the
>> offset within it to BDF and he can proceed to usual BDF-based ioreq
>> routing for those device emulator DMs, whatever they are
>> 
>> 7. In fact, MMIO -> PCI conf ioreq translation can be freely used as
>> well at this stage, if it is more convenient for 'multiple device
>> emulators' feature users. It can be even made selectable.
>> 
>> So, the question which needs explanation is: why do you think
>> MMIO->PCI conf ioreq translation is mandatory for MMCONFIG? Can't we
>> just add new hypercall/dmop to make ioreq routing for 'multiple
>> device emulators' to work while letting QEMU to use any API provided
>> for him to do its tasks?
>> 
>> It's kinda funny to pretend that QEMU don't know anything about
>> MMCONFIG being MMIO when it's QEMU who inform Xen about its memory
>> address and size.
>>   
>> >Regardless of how this ends up being implemented inside of QEMU I
>> >think the above approach is the right one from an architectural PoV.
>> >
>> >AFAICT there are still some reserved bits in the ioreq struct that
>> >you could use to signal 'this is a MCFG PCI access' if required.
>> >  
>> >> Approach #2. Handling MMCONFIG area inside QEMU using usual MMIO
>> >> emulation:
>> >>
>> >> 1. QEMU will trap accesses to PCIEXBAR (or whatever else possibly
>> >> supported in the future like HECBASE), eventually asking Xen to
>> >> map the MMCONFIG MMIO range for ioreq servicing just like it does
>> >> for any other emulated MMIO range, via
>> >> map_io_range_to_ioreq_server(). All changes in MMCONFIG
>> >> placement/status will lead to  
>> remapping/unmapping  
>> >> the MMIO range.
>> >>
>> >> 2. Xen will trap MMIO accesses to this area and forward them to
>> >> QEMU as MMIO (IOREQ_TYPE_COPY) ioreqs
>> >>
>> >> 3. QEMU will receive these accesses and pass them to the existing
>> >> MMCONFIG emulation -- pcie_mmcfg_data_read/write handlers, finally
>> >> resulting in same xen_host_pci_* function calls as before.
>> >>
>> >> This approach works "right out of the box", no changes needed for
>> >> either Xen or QEMU. As both _PCI_CONFIG and MMIO type ioreqs are
>> >> processed, either method can be used to access PCI/extended config
>> >> space -- CF8/CFC port I/O or MMIO accesses to MMCONFIG.
>> >>
>> >> IOREQ routing for multiple device emulators can be supported too.
>> >> In fact, the same mmconfig dmops/hypercalls can be added to let
>> >> Xen know where MMCONFIG area resides, Xen will use this
>> >> information to forward MMCONFIG MMIO ioreqs accordingly to BDF of
>> >> the address. The difference with the approach #1 is that these
>> >> interfaces are now completely optional when we use MMIO ioreqs
>> >> for MMCONFIG on vanilla Xen/QEMU.  
>> >
>> >As said above, if you forward MCFG accesses as IOREQ_TYPE_COPY you
>> >are forcing each IOREQ server to know the position of the MCFG area
>> >in order to do the decoding, this is not acceptable IMO.
>> >  
>> >> The question is why IOREQ_TYPE_COPY -> IOREQ_TYPE_PCI_CONFIG
>> >> translation is a must have thing at all? It won't make handling
>> >> simpler. For current QEMU implementation IOREQ_TYPE_COPY (MMIO
>> >> accesses for MMCONFIG) would be preferable as it allows to use the
>> >> existing code.  
>> >
>> >Granted it's likely easier to implement, but it's also incorrect.
>> >You seem to have in mind the picture of a single IOREQ server (QEMU)
>> >handling all the devices.
>> >
>> >Although this is the most common scenario, it's not the only one
>> >supported by Xen. Your proposed solution breaks the usage of
>> >multiple IOREQ servers as PCI device emulators.
>> >  
>> >> I think it will be safe to use MMCONFIG emulation on MMIO level
>> >> for now and later extend it with 'set_mmconfig_' dmop/hypercall
>> >> for the 'multiple device emulators' IOREQ_TYPE_COPY routing to
>> >> work same as for PCI conf, so it can be used by XenGT etc on Q35
>> >> as well.  
>> >
>> >I'm afraid this kind of issues would have been fairly easier to
>> >identify if a design document for this feature was sent to the list
>> >prior to it's implementation.
>> >
>> >Regarding whether to accept something like this, I'm not really in
>> >favor, but IMO it depends on how much new code is added to handle
>> >this incorrect usage that would then go away (or would have to be
>> >changed) in order to handle the proper implementation.
>> >
>> >Thanks, Roger.  
>


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

