
Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring



On Wed, 21 Mar 2018 09:09:11 +0000
Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:

>On Wed, Mar 21, 2018 at 10:58:40AM +1000, Alexey G wrote:
[...]
>> According to public slides for the feature, both PCI conf and MMIO
>> accesses can be routed to the designated device model. It looks like
>> for this particular setup it doesn't really matter which particular
>> ioreq type must be used for MMCONFIG accesses -- either
>> IOREQ_TYPE_PCI_CONFIG or IOREQ_TYPE_COPY (MMIO accesses) should be
>> acceptable.  
>
>Isn't that going to be quite messy? How is the IOREQ server supposed
>to decode a MCFG access received as IOREQ_TYPE_COPY?

That code is already available and in sync with the QEMU legacy PCI
conf emulation infrastructure.

>I don't think the IOREQ server needs to know the start of the MCFG
>region, in which case it won't be able to detect and decode the
>access if it's of type IOREQ_TYPE_COPY.

How do you think Xen will be able to know whether an arbitrary MMIO
access targets the MMCONFIG area, and to which BDF the offset within
that area belongs, without knowing where MMCONFIG is located and what
the PCI bus layout is? It's QEMU that emulates PCIEXBAR and can tell
Xen where MMCONFIG is expected to be. (The standard ECAM decoding
involved is sketched below.)

>MCFG accesses need to be sent to the IOREQ server as
>IOREQ_TYPE_PCI_CONFIG, or else you are forcing each IOREQ server to
>know the position of the MCFG area in order to do the decoding. In
>your case this would work because QEMU controls the position of the
>MCFG region, but there's no need for other IOREQ servers to know the
>position of the MCFG area.
>
>> The only thing which matters is ioreq routing itself --
>> making decisions to which device model the PCI conf/MMIO ioreq should
>> be sent.  
>
>Hm, see above, but I'm fairly sure you need to forward those MCFG
>accesses as IOREQ_TYPE_PCI_CONFIG to the IOREQ server.

(a detailed answer below)

>> >Traditional PCI config space accesses are not IO port space
>> >accesses.  
>> 
>> (assuming 'not' mistyped here)  
>
>Not really, this should instead be:
>
>"Traditional PCI config space accesses are not forwarded to the IOREQ
>server as IO port space accesses (IOREQ_TYPE_PIO) but rather as PCI
>config space accesses (IOREQ_TYPE_PCI_CONFIG)."
>
>Sorry for the confusion.
>
>> >The IOREQ code in Xen detects accesses to ports 0xcf8/0xcfc and
>> >IOREQ servers can register devices they would like to receive
>> >configuration space accesses for. QEMU is already making use of
>> >this, see for  
>> 
>> That's one of the reasons why current IOREQ_TYPE_PCI_CONFIG
>> implementation is a bit inconvenient for MMCONFIG MMIO accesses --
>> it's too much CF8h/CFCh-centric in its implementation, might be
>> painful to change something in the code which was intended for
>> CF8h/CFCh handling (and not for MMIO processing).  
>
>I'm not sure I follow. Do you mean that changes should be made to the
>ioreq struct in order to forward MCFG accesses using
>IOREQ_TYPE_PCI_CONFIG as its type?

No changes to the ioreq structures are needed for now.

>> It will be handled by IOREQ too, just using a different IOREQ type
>> (MMIO one). The basic question is why do we have to stick to PCI conf
>> space ioreqs for emulating MMIO accesses to MMCONFIG.  
>
>Because other IOREQ servers don't need to know about the position/size
>of the MCFG area, and cannot register MMIO ranges that cover their
>device's PCI configuration space in the MCFG region.
>
>Not to mention that it would be a terrible design flaw to force
>IOREQ servers to register PCI devices and MCFG areas belonging to
>those devices separately as MMIO in order to trap all possible PCI
>configuration space accesses.

The PCI conf space layout is shared across the emulated machine, and
the MMCONFIG layout is mandated by this common PCI bus map.

Even if those 'multiple device models' see different pictures of the
PCI conf space, their views of the PCI bus must not overlap, and the
MMCONFIG layout must be consistent between the different device models.

That said, it is a serious mistake to think of the emulated PCI bus as
a set of distinct PCI devices unrelated to each other. It's all
coupled together, and this is especially true for PCIe. Many PCIe
features rely on device interaction within the PCIe fabric, e.g. PCIe
endpoints may interact with the Root Complex in many ways. This
cooperation may need to be emulated somehow, e.g. to provide some
support for PM features, link management or native hotplug facilities.
Even if we have a real passed-through device, we might need to provide
an emulated PCIe Switch or Root Port for it to function properly
within the PCIe hierarchy.

Dedicating an isolated PCI device to some isolated device model --
that is what might be the real design flaw, considering the PCIe world.

[...]
>
>Maybe you could detect offsets >= 256 and replay them in QEMU like
>mmio accesses? Using the address_space_write or
>pcie_mmcfg_data_read/write functions?
>I have to admit my knowledge of QEMU is quite limited, so I'm not sure
>of the best way to handle this.
>
>Ideally we should find a way that doesn't involve having to modify
>each chipset to handle MCFG accesses from Xen. It would be nice to
>have some kind of interface inside of QEMU so all chipsets can
>register MCFG areas or modify them, but this is out of the scope of
>this work.

Roger, Paul,

Here is what you suggest, just to clarify:

1. Add a new hypercall (+corresponding dmop) to Xen so QEMU can tell
Xen where it emulates the machine's MMCONFIG (chipset-specific
emulation of PCIEXBAR/HECBASE/etc mmcfg relocation). Xen will rely on
this information to know to which PCI device an address within
MMCONFIG belongs.

2. Xen will trap this area and remap the trapped range to another
address whenever QEMU informs Xen of an emulated PCIEXBAR value change

3. Every MMIO access to the current MMCONFIG range will first be
converted into a BDF (from the offset within the range, which requires
knowing where the range is) -- see the sketch after this list

4. The target device model is selected using the calculated BDF

5. MMIO read/write accesses are converted into PCI config space ioreqs
(as if the access were a CF8h/CFCh operation instead of an MMIO
access). The ioreq structure already allows specifying an extended
(12-bit) PCI conf offset, so the access will fit into a PCI conf
ioreq. For now let's assume that e.g. a 64-bit memory operation is
either aborted or worked around by splitting it into multiple PCI conf
ioreqs.

6. PCI conf read/write ioreqs are sent to the chosen device model

7. QEMU receives MMCONFIG memory reads/writes as PCI conf reads/writes

8. As these MMCONFIG PCI conf accesses arrive out of context (just
address/len/data, with no emulated device attached to them), xen-hvm.c
must employ special logic to make them QEMU-friendly -- e.g. right now
it replays a received PCI conf access through the (QEMU-emulated)
CF8h/CFCh ports.
Embedding these "naked" accesses into the QEMU infrastructure is a
real problem and workarounds are required. BTW, find_primary_bus() was
dropped from the QEMU code -- it could have been useful here. Let's
assume some workaround is employed (like storing the required object
pointers in global variables for later use in xen-hvm.c)

9. The existing MMCONFIG-handling code in QEMU will go unused in this
scenario

10. All of this is needed primarily to make the specific "multiple
device emulators" feature work (XenGT was mentioned as its user) on
Q35 with MMCONFIG.

Anything wrong/missing here?

(Adding Stefano and Anthony as xen-hvm.c mentioned)


Here is another suggestion:

1. QEMU uses existing facilities to emulate PCIEXBAR for a Q35
machine, calling Xen's map_io_range_to_ioreq_server() API to mark the
MMIO range for emulation, just like for any other emulated MMIO range
(a sketch follows this list)

2. All accesses to this area will be forwarded to QEMU as MMIO ioreqs
and emulated flawlessly, as everything stays within the QEMU
architecture -- the pci-host/PCIBus/PCIDevice machinery is all in
place. No workarounds are required in xen-hvm.c

3. CF8/CFC accesses will be forwarded as _PCI_CONFIG ioreqs, as usual.
Both methods stay in sync as they use the common PCI emulation
infrastructure in QEMU

4. At this point absolutely zero changes are required in either Xen or
QEMU. Only existing interfaces are used. In fact, no related code
changes are required at all, except a bugfix for the PCIEXBAR mask
emulation (provided in this series)

5. But, just to make the 'multiple device emulators' feature work (no
other reason so far), we add the same hypercall/dmop so QEMU can let
Xen know where it emulates MMCONFIG

6. Xen will continue to trap accesses to this range, but instead of
sending a _COPY ioreq immediately, it will check the address against
the known MMCONFIG location (in the same manner as above), convert the
offset within it to a BDF, and then proceed with the usual BDF-based
ioreq routing for those device emulator DMs, whatever they are

7. In fact, MMIO -> PCI conf ioreq translation can be freely used as
well at this stage, if that is more convenient for users of the
'multiple device emulators' feature. It can even be made selectable.
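
For illustration, the QEMU side of step 1 could look roughly like the
sketch below. The two libxendevicemodel calls exist today; the
xen_remap_mmconfig() wrapper and its base/size bookkeeping are made up
for this example:

#include <xendevicemodel.h>

/* Sketch: remap the trapped MMCONFIG MMIO range whenever the emulated
 * PCIEXBAR value changes, using the existing ioreq-server interface. */
static void xen_remap_mmconfig(xendevicemodel_handle *xdm, domid_t dom,
                               ioservid_t srv, uint64_t old_base,
                               uint64_t new_base, uint64_t size)
{
    if (old_base)
        xendevicemodel_unmap_io_range_from_ioreq_server(xdm, dom, srv,
            1 /* is_mmio */, old_base, old_base + size - 1);
    if (new_base)
        xendevicemodel_map_io_range_to_ioreq_server(xdm, dom, srv,
            1 /* is_mmio */, new_base, new_base + size - 1);
}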

So, the question which needs an answer is: why do you think MMIO ->
PCI conf ioreq translation is mandatory for MMCONFIG? Can't we just
add the new hypercall/dmop to make ioreq routing for 'multiple device
emulators' work, while letting QEMU use whatever API is provided for
it to do its job?

It's kind of funny to pretend that QEMU doesn't know anything about
MMCONFIG being MMIO when it's QEMU that informs Xen of its memory
address and size.

>Regardless of how this ends up being implemented inside of QEMU I
>think the above approach is the right one from an architectural PoV.
>
>AFAICT there are still some reserved bits in the ioreq struct that you
>could use to signal 'this is a MCFG PCI access' if required.
>
>> Approach #2. Handling MMCONFIG area inside QEMU using usual MMIO
>> emulation:
>> 
>> 1. QEMU will trap accesses to PCIEXBAR (or whatever else possibly
>> supported in the future like HECBASE), eventually asking Xen to map
>> the MMCONFIG MMIO range for ioreq servicing just like it does for any
>> other emulated MMIO range, via map_io_range_to_ioreq_server(). All
>> changes in MMCONFIG placement/status will lead to remapping/unmapping
>> the MMIO range.
>> 
>> 2. Xen will trap MMIO accesses to this area and forward them to QEMU
>> as MMIO (IOREQ_TYPE_COPY) ioreqs
>> 
>> 3. QEMU will receive these accesses and pass them to the existing
>> MMCONFIG emulation -- pcie_mmcfg_data_read/write handlers, finally
>> resulting in same xen_host_pci_* function calls as before.
>> 
>> This approach works "right out of the box", no changes needed for
>> either Xen or QEMU. As both _PCI_CONFIG and MMIO type ioreqs are
>> processed, either method can be used to access PCI/extended config
>> space -- CF8/CFC port I/O or MMIO accesses to MMCONFIG.
>> 
>> IOREQ routing for multiple device emulators can be supported too. In
>> fact, the same mmconfig dmops/hypercalls can be added to let Xen know
>> where MMCONFIG area resides, Xen will use this information to forward
>> MMCONFIG MMIO ioreqs accordingly to BDF of the address. The
>> difference with the approach #1 is that these interfaces are now
>> completely optional when we use MMIO ioreqs for MMCONFIG on vanilla
>> Xen/QEMU.  
>
>As said above, if you forward MCFG accesses as IOREQ_TYPE_COPY you are
>forcing each IOREQ server to know the position of the MCFG area in
>order to do the decoding, this is not acceptable IMO.
>
>> The question is why IOREQ_TYPE_COPY -> IOREQ_TYPE_PCI_CONFIG
>> translation is a must have thing at all? It won't make handling
>> simpler. For current QEMU implementation IOREQ_TYPE_COPY (MMIO
>> accesses for MMCONFIG) would be preferable as it allows to use the
>> existing code.  
>
>Granted it's likely easier to implement, but it's also incorrect. You
>seem to have in mind the picture of a single IOREQ server (QEMU)
>handling all the devices.
>
>Although this is the most common scenario, it's not the only one
>supported by Xen. Your proposed solution breaks the usage of multiple
>IOREQ servers as PCI device emulators.
>
>> I think it will be safe to use MMCONFIG emulation on MMIO level for
>> now and later extend it with 'set_mmconfig_' dmop/hypercall for the
>> 'multiple device emulators' IOREQ_TYPE_COPY routing to work same as
>> for PCI conf, so it can be used by XenGT etc on Q35 as well.  
>
>I'm afraid this kind of issue would have been far easier to identify
>if a design document for this feature had been sent to the list prior
>to its implementation.
>
>Regarding whether to accept something like this, I'm not really in
>favor, but IMO it depends on how much new code is added to handle this
>incorrect usage that would then go away (or would have to be changed)
>in order to handle the proper implementation.
>
>Thanks, Roger.

