
Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu



On 25/07/17 17:40, Alexey G wrote:
> On Mon, 24 Jul 2017 21:39:08 +0100
> Igor Druzhinin <igor.druzhinin@xxxxxxxxxx> wrote:
>>> But, the problem is that the overall MMIO hole(s) requirements are not
>>> known exactly at the time the HVM domain is being created. Some PCI
>>> devices will be emulated, some will be merely passed through, and yet
>>> there will be some RMRR ranges. libxl can't know all this stuff - some
>>> comes from the host, some comes from the DM. So the actual MMIO
>>> requirements are only known to hvmloader at PCI bus enumeration time.
>>>   
>>
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said, it's much easier to provision the hole
>> statically in libxl during the domain construction process, and it doesn't
>> really compromise any functionality. Having one more entity responsible
>> for guest memory layout only makes things more convoluted.
> 
> If moving most of hvmloader's tasks to libxl is a feature planned at Citrix,
> please let it be discussed on xen-devel first, as it may affect many
> people... and not all of them might be happy. :)
> 

Everything always goes through the mailing list.

> (tons of IMO and TLDR ahead, be warned)
> 
> Moving PCI BAR allocation from the guest side to libxl is a controversial
> step. In fact, it may be the architecturally wrong way to go. There are
> properties and areas of responsibility. Among the primary responsibilities
> of guest firmware are PCI BAR allocation and MMIO hole sizing. That's the
> guest's territory. The guest relocates PCI BARs (and not just the BIOS is
> able to do this); guest firmware relocates the MMIO hole base for them. On
> a real system, all tasks like PCI BAR allocation, remapping part of RAM
> above 4G, etc. would be done by the system BIOS. In our case some of
> SeaBIOS/OVMF's responsibilities were offloaded to hvmloader, like PCI BAR
> allocation, sizing the MMIO hole(s) for them and generating ACPI tables.
> And that's OK, as hvmloader can be considered merely 'supplemental'
> firmware performing some of SeaBIOS/OVMF's tasks before passing control to
> them. This solution at least has some architectural logic to it and
> doesn't look bad.
> 

libxl is also part of the firmware, so to speak. It's incorrect to think
that only hvmloader and the BIOS images are "proper" firmware.

> On the other hand, moving PCI hole calculation to libxl just to let
> Xen/libxl know what the MMIO hole size is might be a bad idea.
> Aside from some code duplication, straying too far from the real HW paths,
> or breaking existing (or future) interfaces, this might have other negative
> consequences. E.g. who will be initializing the guest's ACPI tables if only
> libxl knows the memory layout? Some new interface between libxl and
> hvmloader just to let the latter know what values to write into the ACPI
> tables being created? Or will libxl be initializing the guest's ACPI tables
> as well (another of the guest's internal tasks)? Similar concerns apply to
> the construction of the guest's final E820.
> 

The information is not confined to libxl - it's passed to hvmloader, which
can finish the tasks libxl couldn't. That said, ACPI tables could harmlessly
be initialized inside libxl as well (see the PVH implementation).
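
To make the E820 point concrete, here is a minimal standalone sketch of how
the chosen MMIO hole size dictates the final memory map. The entry layout
follows the standard E820 convention, but the 6 GiB / 3 GiB figures and the
helper names are made-up example values, not hvmloader's actual code:

/* Illustrative only: whoever decides the MMIO hole base effectively
 * dictates the guest's final E820 (types: 1 = RAM, 2 = reserved). */
#include <stdint.h>
#include <stdio.h>

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;   /* 1 = RAM, 2 = reserved */
};

static unsigned build_e820(struct e820entry *e, uint64_t ram_bytes,
                           uint64_t mmio_hole_base)
{
    unsigned n = 0;
    const uint64_t four_gb = 1ULL << 32;
    uint64_t low = ram_bytes < mmio_hole_base ? ram_bytes : mmio_hole_base;

    /* Low RAM: everything that fits below the MMIO hole base. */
    e[n++] = (struct e820entry){ 0, low, 1 };

    /* The MMIO hole itself is simply not reported as RAM. */

    /* RAM that didn't fit below the hole is relocated above 4G. */
    if (ram_bytes > mmio_hole_base)
        e[n++] = (struct e820entry){ four_gb, ram_bytes - mmio_hole_base, 1 };

    return n;
}

int main(void)
{
    struct e820entry map[8];
    /* Assumed example: 6 GiB guest, 1 GiB MMIO hole starting at 3 GiB. */
    unsigned n = build_e820(map, 6ULL << 30, 3ULL << 30);
    for (unsigned i = 0; i < n; i++)
        printf("%#018llx-%#018llx type %u\n",
               (unsigned long long)map[i].addr,
               (unsigned long long)(map[i].addr + map[i].size),
               map[i].type);
    return 0;
}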

> Another thing is that handling ioreq/PT MMIO ranges is somewhat a property
> of the device model (at least for now). Right now it's QEMU that traps PCI
> BAR accesses and tells Xen how to handle specific ranges of MMIO space. If
> QEMU already tells Xen which ranges should be passed through or trapped --
> it can report the current overall MMIO limits as well... or enforce those
> limits itself -- if an MMIO hole range check is all that is required to
> avoid misusing MMIO space, such a check can easily be implemented in QEMU,
> provided that QEMU knows what the memory/MMIO layout is. There is a lot of
> implementation freedom in where to place restrictions and checks, Xen or
> QEMU. Strictly speaking, the MMIO hole itself can be considered a property
> of the emulated machine and may have implementation differences between
> emulated chipsets. For example, the real i440's northbridge has no notion
> of a high MMIO hole at all.
> 
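
(For what it's worth, the range check mentioned above is indeed small. A
rough standalone sketch; struct mmio_hole and the limits it carries are
assumptions about what the device model would be told, not an existing QEMU
interface:)

/* A minimal sketch of the MMIO hole range check described above, assuming
 * the device model knows the guest's MMIO hole boundaries. Names and layout
 * are illustrative only. */
#include <stdbool.h>
#include <stdint.h>

struct mmio_hole {
    uint64_t base;   /* e.g. the low hole base below 4G */
    uint64_t end;    /* exclusive */
};

/* Reject BAR placements that fall outside the holes the machine advertises. */
static bool bar_within_mmio_holes(const struct mmio_hole *holes, unsigned n,
                                  uint64_t bar_base, uint64_t bar_size)
{
    for (unsigned i = 0; i < n; i++) {
        if (bar_base >= holes[i].base &&
            bar_base + bar_size <= holes[i].end)
            return true;
    }
    return false;
}
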
> We already have a sort of interface between hvmloader and QEMU --
> hvmloader has to do basic initialization of some of the emulated chipset's
> registers (and this depends on the machine). Providing additional handling
> for a few other registers (TOM/TOLUD/etc.) will cost almost nothing, and
> the purpose of these registers will actually match their usage in real HW.
> This way we can use an existing, available interface and not stray too far
> from how real HW does things.
> 
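
(A rough sketch of the kind of register write meant here, using the standard
0xCF8/0xCFC config mechanism hvmloader already relies on; the
HOST_BRIDGE_TOLUD offset is an assumed placeholder, since no such Xen/QEMU
interface is defined yet:)

/* Illustrative only: hvmloader publishing the chosen MMIO hole base to the
 * emulated host bridge. The register offset is an assumption, not a defined
 * interface. */
#include <stdint.h>

#define PCI_CONF_ADDR     0xcf8
#define PCI_CONF_DATA     0xcfc
#define HOST_BRIDGE_TOLUD 0xb0   /* assumed register offset, illustrative */

static inline void outl(uint16_t port, uint32_t val)
{
    asm volatile ("outl %0, %1" : : "a" (val), "Nd" (port));
}

/* Write a 32-bit value to bus 0, device 0, function 0 (the host bridge). */
static void host_bridge_write32(uint8_t reg, uint32_t val)
{
    outl(PCI_CONF_ADDR, 0x80000000u | (reg & 0xfcu));
    outl(PCI_CONF_DATA, val);
}

/* After deciding the hole base, tell the emulated chipset about it. */
static void set_mmio_hole_base(uint32_t hole_base)
{
    host_bridge_write32(HOST_BRIDGE_TOLUD, hole_base);
}
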
> I want to try this approach in the Q35 bringup patches for Xen I'm
> currently working on. I'll send these patches as an RFC and will be glad
> to receive some constructive criticism.
> 

Sure. Static hole size provisioning doesn't preclude its dynamic
counterpart.

Igor

>>> libxl can be taught to retrieve all the missing info from QEMU, but this
>>> approach would require doing all the grunt work of PCI BAR allocation in
>>> libxl itself - in order to calculate the real MMIO hole size(s), one
>>> needs to take into account all the PCI BAR sizes, their diverse alignment
>>> requirements, and the existing gaps due to RMRR ranges... basically,
>>> libxl would need to do most of hvmloader/pci.c's job.
>>>   
>>
>> The algorithm implemented in hvmloader for that is not complicated and
>> can be moved to libxl easily. What we can do is provision a hole big
>> enough to include all the initially assigned PCI devices. We can also
>> account for emulated MMIO regions if necessary. But, to be honest, it
>> doesn't really matter, since if there is not enough space in the lower
>> MMIO hole for some BARs, they can easily be relocated to the upper MMIO
>> hole by hvmloader or by the guest itself (dynamically).
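
For reference, a rough sketch of the provisioning step described above;
estimate_low_hole and its parameters are illustrative, not the actual
hvmloader/pci.c code, and BAR sizes are assumed to be powers of two
(naturally aligned), as PCI requires:

/* Given the BAR sizes to place, estimate how much low MMIO hole is needed
 * within a given budget and how much spills over to the hole above 4G.
 * Illustrative only. */
#include <stddef.h>
#include <stdint.h>

/* bar_sizes[] should be sorted largest-first so alignment padding stays
 * minimal; each BAR is naturally aligned to its (power-of-two) size. */
static uint64_t estimate_low_hole(const uint64_t *bar_sizes, size_t n,
                                  uint64_t low_budget, uint64_t *high_needed)
{
    uint64_t low_used = 0;

    *high_needed = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t size = bar_sizes[i];
        /* Align the current offset up to the BAR's natural alignment. */
        uint64_t base = (low_used + size - 1) & ~(size - 1);

        if (base + size <= low_budget)
            low_used = base + size;
        else
            *high_needed += size;   /* relocate to the hole above 4G */
    }
    return low_used;
}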
