
Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring



On Fri, 23 Mar 2018 13:57:11 +0000
Paul Durrant <Paul.Durrant@xxxxxxxxxx> wrote:
[...]
>> Few related thoughts:
>> 
>> 1. The MMCONFIG address is chipset-specific. On Q35 it's PCIEXBAR, on
>> other x86 systems it may be HECBASE or something else. So we can
>> assume it is bound to the emulated machine
>
>Xen emulates the machine so it should be emulating PCIEXBAR. 

Actually, Xen currently emulates only a few devices. The others are
provided by QEMU; that's the problem.

>> 2. We rely on QEMU to emulate different machines for us.
>We should not be. It's a historical artefact that we rely on QEMU for
>any part of machine emulation.

HVM guests need to see something reasonably close to real hardware in
order to run. Even if we later install PV drivers for network, storage,
etc., we still need to support system firmware (SeaBIOS/OVMF) and be
able to install (ideally) any OS which expects to be installed only on
real x86 hardware. We also need to be ready to fall back to the emulated
hardware if, e.g., the user boots the OS in safe mode.

It all depends on what you mean by not relying on QEMU for any part
of machine emulation.

There are a number of mandatory devices which must be provided for a
typical x86 system. Xen emulates some of them, but there are a number
which it doesn't. Apart from "classic" devices like the RTC, PIT, KBC,
etc., we need to provide at least storage and network interfaces.

The Windows installer won't be happy to boot from a PV storage device;
it prefers to encounter something like AHCI (Windows 7+), ATA (for older
OSes) or ATAPI if it is an ISO CD image.
Providing emulation for the AHCI+ATA+ATAPI trio alone is a non-trivial
task. QEMU itself provides only a partial implementation of these; many
features are unsupported. Another very useful thing to emulate is USB.
Depending on the controller version and device classes required, this
may be far more complex to emulate than AHCI+ATA+ATAPI combined.

So, if you suggest dropping QEMU completely, it means that all this
functionality must be replaced by our own. Not that hard, but still a
lot of effort.


OTOH, if you mean stripping QEMU of general PCI bus control and
replacing its emulated NB/SB with Xen-owned ones -- well, in theory it
should be possible, with patches on the QEMU side.

In fact, the emulated chipset (NB+SB combo without supplemental devices)
itself is a small part of the required emulation. It's relatively easy
to provide our own analogs of, e.g., the 'mch' and 'ICH9-LPC' QEMU
PCIDevices; the problem is to glue all the remaining parts together.

I assume the final goal in this case is to have only a set of necessary
QEMU PCIDevices for which we will be providing I/O, MMIO and PCI config
trapping facilities: only devices such as rtl8139, ich9-ahci and a few
others.

Basically, this means a new, chipset-less QEMU machine type.
Well, in theory it is possible with a bit of effort, I think. The main
question is where the NB/SB/PCI bus emulating part will reside in
this case. As this part must still have some privileges, it's basically
the same decision problem as with QEMU's dwelling place: stubdomain,
Dom0 or something else.

>> 3. There are users which touch chipset-specific PCIEXBAR directly if
>> they see a Q35 system (OVMF so far)
>
>And we should squash such accesses.
>

Yes, we have that privilege (i.e. allocating all IO/MMIO bases) for
hvmloader. OVMF should not differ from SeaBIOS in this respect.

>The toolstack should be in sole
>control of the guest memory map. It should be the only thing building
>MCFG so it should decide where the MMCONFIG regions go, not the
>firmware running in guest context.

HVM memory layout is another problem which needs a solution, BTW. I had
to implement one for my PT goals, but it's very radical, I'm afraid.

Right now there are wicked issues in how the memory layout is handled
between hvmloader and QEMU. They may see different memory maps, in some
cases (depending on the MMIO hole size and content) even with overlaps,
such as an attempt to place an MMIO BAR over memory which QEMU uses as
VRAM backing storage. This causes a variety of issues, like emulated I/O
errors (with a storage device) during a guest boot attempt.
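
To make the overlap condition concrete, here is a minimal Python sketch
of the check involved. All addresses and range names below are
hypothetical, picked only for illustration; neither hvmloader nor QEMU
exposes its ranges in this form.

```python
def overlaps(a, b):
    """Return True if half-open ranges a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]

# Hypothetical values, for illustration only.
vram_backing = (0xF0000000, 0xF1000000)  # 16 MiB range known only to the device model
bar_ok = (0xE0000000, 0xE8000000)        # BAR placed entirely below the VRAM range
bar_bad = (0xF0800000, 0xF0900000)       # BAR placed inside the VRAM range

for name, bar in (("bar_ok", bar_ok), ("bar_bad", bar_bad)):
    print(name, "overlaps VRAM backing:", overlaps(bar, vram_backing))
```

The point is only that such a check needs both views of the memory map
in one place, which is exactly what is missing today.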

Regarding control of the guest memory map in the toolstack only... The
problem is that only the firmware can see the final memory map at the
moment. And only the device model knows about invisible "service" ranges
for emulated devices, like the LFB content (aka "VRAM") when it is not
mapped to a guest.

In order to calculate the final memory/MMIO hole split, we need to know:

1) all PCI devices on a PCI bus. At the moment Xen contributes only
devices like PT ones to the final PCI bus (via QMP device_add). The
others are QEMU ones. Even the Xen platform PCI device relies on QEMU
emulation. Non-QEMU device emulators are another source of virtual PCI
devices, I guess.

2) all chipset-specific emulated MMIO ranges. MMCONFIG is one of them,
and the largest (up to 256 MB for a segment). There are a few other
smaller ranges, e.g. Root Complex registers. All these ranges depend on
the emulated chipset.

3) all reserved memory ranges (this is the one the toolstack already
knows)

4) all "service" guest memory ranges like backing storage for VRAM in
QEMU. Emulated Option ROMs should belong here too, but IIRC xen-hvm.c
either intentionally or by mistake handles them as emulated ranges
currently.
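
As a side note on item 2, the 256 MB figure follows directly from the
standard PCIe ECAM layout (4 KB of config space per function). A small
Python sketch of the arithmetic; the function names are mine, for
illustration only, and are not taken from hvmloader or QEMU:

```python
# Standard PCIe ECAM (MMCONFIG) geometry for one segment.
BUSES = 256
DEVICES_PER_BUS = 32
FUNCTIONS_PER_DEVICE = 8
CONFIG_SPACE_BYTES = 4096  # extended config space per function

def mmconfig_size(buses=BUSES):
    """Bytes of MMIO space needed to cover the given number of buses."""
    return buses * DEVICES_PER_BUS * FUNCTIONS_PER_DEVICE * CONFIG_SPACE_BYTES

def ecam_offset(bus, dev, fn, reg=0):
    """Offset of a config register from the MMCONFIG base address."""
    return (bus << 20) | (dev << 15) | (fn << 12) | reg

print(mmconfig_size() // (1024 * 1024), "MiB for a full segment")
print(hex(ecam_offset(bus=1, dev=0, fn=0)))
```

A window covering fewer buses shrinks proportionally, so the size (and
therefore the alignment) of this range is something the hole sizing has
to know up front.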

If we miss any of these (like what the chipset-specific ranges are and
their size alignment requirements), we're in trouble. But if we know
*all* of these, we can pre-calculate the MMIO hole size. This is a bit
fragile to do from the toolstack, though, because the sizing algorithm
in the toolstack and the MMIO BAR allocation code in the firmware
(hvmloader) must be kept synchronized: it is possible to stuff BARs
into the MMIO hole in different ways, especially once PCI-PCI bridges
appear on the scene. Both need to do it in a consistent way (resulting
in a similar set of gaps between allocated BARs), otherwise the expected
MMIO hole sizes won't match, which means we may need to relocate MMIO
BARs to the high MMIO hole, and this in turn may lead to those overlaps
with QEMU memory.
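
To show why the two sides must agree on placement order, here is a
rough Python sketch. It assumes naturally aligned power-of-two BARs
placed one after another; the BAR sizes and the base address are
hypothetical, and this is not the actual hvmloader allocator.

```python
def align_up(addr, alignment):
    """Round addr up to a power-of-two alignment."""
    return (addr + alignment - 1) & ~(alignment - 1)

def hole_size(bar_sizes, base):
    """Place BARs in the given order, each naturally aligned to its size,
    and return the number of bytes consumed starting from base."""
    addr = base
    for size in bar_sizes:
        addr = align_up(addr, size) + size
    return addr - base

# Hypothetical BAR sizes: 4 KiB, 16 MiB, 8 KiB, 128 MiB.
bars = [0x1000, 0x1000000, 0x2000, 0x8000000]
base = 0xE0000000

in_given_order = hole_size(bars, base)
largest_first = hole_size(sorted(bars, reverse=True), base)

print(hex(in_given_order), hex(largest_first))
```

With these numbers the same four BARs need 0x10000000 bytes in one
order and 0x9003000 in the other, so a toolstack that sizes the hole
with one strategy while the firmware allocates with another will end up
with a mismatch.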

>> Seems like we're pretty limited in freedom of choice under these
>> conditions, I'm afraid.
>
>I don't think so. We're only limited if we use QEMU's Q35 emulation
>and what I'm saying is that we should not be doing that (nor should
>we be using it to emulate any part of the PIIX today).
>
>  Paul

