
Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu



On 25/07/17 08:03, Zhang, Xiong Y wrote:
>> On 24/07/17 17:42, Alexey G wrote:
>>> Hi,
>>>
>>> On Mon, 24 Jul 2017 10:53:16 +0100
>>> Igor Druzhinin <igor.druzhinin@xxxxxxxxxx> wrote:
>>>>> [Zhang, Xiong Y] Thanks for your suggestion.
>>>>> Indeed, if I set mmio_hole >= 4G - RMRR_Base, this fixes my issue.
>>>>> I still have two questions about this, could you help me?
>>>>> 1) If hvmloader does low memory relocation, hvmloader and QEMU will
>>>>> see different guest memory layouts, so QEMU RAM may overlap with
>>>>> MMIO. Does Xen plan to fix this?
>>>>
>>>> hvmloader doesn't do memory relocation - this ability is turned off by
>>>> default. The reason for the issue is that libxl initially sets the size
>>>> of the lower MMIO hole (based on the RMRR regions present and their
>>>> size) but doesn't communicate it to QEMU via the 'max-ram-below-4g'
>>>> argument.
>>>>
>>>> When you set the 'mmio_hole' size parameter, you basically force libxl
>>>> to pass this argument to QEMU.
>>>>
>>>> That means the proper fix would be to make libxl pass this argument to
>>>> QEMU whenever RMRR regions are present.
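
A rough sketch of what that could look like, just to make the plumbing
concrete (the helper below is hypothetical, not the actual libxl code
path; the 'max-ram-below-4g' machine property is the one mentioned above):

/* Sketch only: turn an RMRR-widened lower MMIO hole size into the value
 * QEMU expects.  format_max_ram_below_4g() is a hypothetical helper, not
 * actual libxl code. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static void format_max_ram_below_4g(char *buf, size_t len,
                                    uint64_t mmio_hole_size)
{
    /* Guest RAM below 4GB must stop where the MMIO hole begins. */
    uint64_t max_ram_below_4g = (1ULL << 32) - mmio_hole_size;

    /* Ends up on the QEMU command line as:
     *   -machine pc,max-ram-below-4g=<N>                                */
    snprintf(buf, len, "pc,max-ram-below-4g=%" PRIu64, max_ram_below_4g);
}
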
>>>
>>> I tend to disagree a bit.
>>> What we actually lack is a way to perform 'dynamic' physmem relocation
>>> while a guest domain is already running. Right now it works only in a
>>> 'static' way - i.e. the memory layout must be known to both QEMU and
>>> hvmloader before the guest domain starts, with no means of arbitrarily
>>> changing this layout at runtime when hvmloader runs.
>>>
>>> But the problem is that the overall MMIO hole(s) requirements are not
>>> known exactly at the time the HVM domain is being created. Some PCI
>>> devices will be emulated, some will be merely passed through, and on
>>> top of that there will be some RMRR ranges. libxl can't know all this
>>> stuff - some of it comes from the host, some comes from the DM. So the
>>> actual MMIO requirements are only known to hvmloader at PCI bus
>>> enumeration time.
>>>
>>
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said, it's much easier to provision the hole
>> statically in libxl during the domain construction process, and it
>> doesn't really compromise any functionality. Having one more entity
>> responsible for the guest memory layout only makes things more
>> convoluted.
>>
>>> libxl can be taught to retrieve all the missing info from QEMU, but
>>> that approach requires doing all the grunt work of PCI BAR allocation
>>> in libxl itself - to calculate the real MMIO hole(s) size, one needs to
>>> take into account every PCI BAR's size and alignment requirement plus
>>> the existing gaps due to RMRR ranges... basically, libxl would need to
>>> do most of hvmloader/pci.c's job.
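
For reference, the core of that accounting is roughly the following - a
simplified sketch in the spirit of hvmloader/pci.c, which pretends RMRR
ranges are just a lump of reserved space rather than fixed-address gaps:

/* Simplified sketch: estimate how much lower-MMIO space a set of memory
 * BARs needs.  Each BAR must be aligned to its own (power-of-two) size,
 * so allocating largest-first keeps alignment padding minimal. */
#include <stdint.h>
#include <stdlib.h>

static int cmp_desc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

    return (x < y) - (x > y);   /* sort descending */
}

static uint64_t mmio_space_needed(uint64_t *bar_sizes, unsigned int nr_bars,
                                  uint64_t rmrr_total)
{
    uint64_t total = rmrr_total;
    unsigned int i;

    qsort(bar_sizes, nr_bars, sizeof(*bar_sizes), cmp_desc);

    for ( i = 0; i < nr_bars; i++ )
    {
        /* Round up to the BAR's natural alignment, then place it. */
        total = (total + bar_sizes[i] - 1) & ~(bar_sizes[i] - 1);
        total += bar_sizes[i];
    }

    return total;
}
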
>>>
>>
>> The algorithm implemented in hvmloader for that is not complicated and
>> can easily be moved to libxl. What we can do is provision a hole big
>> enough to include all the initially assigned PCI devices. We can also
>> account for emulated MMIO regions if necessary. But, to be honest, it
>> doesn't really matter: if there is not enough space in the lower MMIO
>> hole for some BARs, they can easily be relocated to the upper MMIO hole
>> by hvmloader or by the guest itself (dynamically).
>>
>> Igor
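
For illustration, once a high address has been picked, moving a 64-bit
memory BAR into the upper hole is just a pair of config-space writes.  A
freestanding sketch using the legacy 0xCF8/0xCFC mechanism (generic code,
not lifted from hvmloader):

/* Sketch: relocate the 64-bit memory BAR at config offset 'bar' of device
 * bus:dev.fn to 'new_base' above 4GB.  Real code would clear the
 * memory-decode bit in the command register first and restore it after. */
#include <stdint.h>

static inline void outl(uint16_t port, uint32_t val)
{
    asm volatile ( "outl %0, %1" : : "a" (val), "Nd" (port) );
}

static void pci_cfg_write32(uint8_t bus, uint8_t dev, uint8_t fn,
                            uint8_t reg, uint32_t val)
{
    uint32_t addr = 0x80000000u | ((uint32_t)bus << 16) |
                    ((uint32_t)dev << 11) | ((uint32_t)fn << 8) |
                    (reg & 0xfc);

    outl(0xcf8, addr);   /* CONFIG_ADDRESS */
    outl(0xcfc, val);    /* CONFIG_DATA */
}

static void relocate_bar64(uint8_t bus, uint8_t dev, uint8_t fn,
                           uint8_t bar, uint64_t new_base)
{
    pci_cfg_write32(bus, dev, fn, bar, (uint32_t)new_base);
    pci_cfg_write32(bus, dev, fn, bar + 4, (uint32_t)(new_base >> 32));
}
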
> [Zhang, Xiong Y] Yes, if we could supply a big enough MMIO hole and not
> allow hvmloader to relocate memory, things would be easier. But how
> could we supply a big enough MMIO hole?
> a. Statically set the base address of the MMIO hole to 2G/3G.
> b. Probe all the PCI devices and calculate the MMIO size, like hvmloader
> does. But this runs prior to QEMU, so how do we probe the PCI devices?
> 

It's true that we don't know the space occupied by emulated devices
before QEMU is started. But QEMU needs to be started with some lower
MMIO hole size statically assigned.

One possible solution is to calculate the hole size required to include
all the assigned pass-through devices and round it up to the nearest GB
boundary, but no larger than 2GB total. If that's not enough to also
include all the emulated devices - so be it; some of the PCI devices
will be relocated to the upper MMIO hole in that case.
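
In code form that policy would amount to something like this
(illustrative only; the 1GB floor is an assumption of the sketch, not
part of the proposal):

/* Sketch of the sizing policy: take the lower-MMIO space needed by the
 * assigned pass-through devices (their BARs plus RMRR ranges), round the
 * hole up to a 1GB boundary and cap it at 2GB.  Emulated BARs that don't
 * fit are left for relocation into the upper hole. */
#include <stdint.h>

#define GB(x) ((uint64_t)(x) << 30)

static uint64_t lower_mmio_hole_size(uint64_t passthrough_mmio_bytes)
{
    uint64_t hole = (passthrough_mmio_bytes + GB(1) - 1) & ~(GB(1) - 1);

    if ( hole < GB(1) )
        hole = GB(1);   /* assumed floor for the sketch */
    if ( hole > GB(2) )
        hole = GB(2);   /* "not larger than 2GB total" */

    return hole;        /* the hole then spans [4GB - hole, 4GB) */
}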

Igor

> thanks
>>> My 2kop opinion here is that we don't need to move all PCI BAR
>>> allocation to libxl, or invent new QMP interfaces, or introduce new
>>> hypercalls or anything else. A simple and reasonably good solution
>>> would be to implement this missing hvmloader <-> QEMU interface in the
>>> same manner as it is done on real hardware.
>>>
>>> When we move some part of guest memory in the 4GB range to the address
>>> space above 4GB via XENMEM_add_to_physmap, we basically perform what
>>> the chipset's 'remap' (aka reclaim) does. So we can implement this
>>> interface between hvmloader and QEMU by providing custom emulation of
>>> the MCH's remap/TOLUD/TOUUD registers in QEMU when xen_enabled().
>>>
>>> In this way hvmloader calculates MMIO hole sizes as usual, relocates
>>> some guest RAM above the 4GB boundary, and communicates this
>>> information to QEMU via the emulated host bridge registers - QEMU then
>>> syncs its memory layout with the actual physmap.
>>>
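
To make that interface concrete, the QEMU side could look roughly like
the sketch below.  The TOLUD/TOUUD offsets and mch_update_guest_ram_split()
are placeholders, not existing QEMU code; pci_default_write_config(),
xen_enabled() and ranges_overlap() are existing QEMU helpers:

/* Sketch: let guest writes to the emulated host bridge's TOLUD/TOUUD
 * registers trigger a resync of QEMU's RAM layout with the physmap that
 * hvmloader has just rearranged. */
#include "qemu/osdep.h"
#include "qemu/range.h"
#include "hw/pci/pci.h"
#include "hw/xen/xen.h"

#define MCH_TOLUD  0xbc   /* placeholder offset: top of low usable DRAM   */
#define MCH_TOUUD  0xa8   /* placeholder offset: top of upper usable DRAM */

static void mch_update_guest_ram_split(PCIDevice *d)
{
    /* Hypothetical: recompute the below/above-4GB RAM MemoryRegion split
     * from the freshly written TOLUD/TOUUD values (left unimplemented). */
}

static void mch_write_config(PCIDevice *d, uint32_t addr,
                             uint32_t val, int len)
{
    pci_default_write_config(d, addr, val, len);

    if (xen_enabled() &&
        (ranges_overlap(addr, len, MCH_TOLUD, 4) ||
         ranges_overlap(addr, len, MCH_TOUUD, 8))) {
        /* hvmloader has told us where low RAM now ends: resync QEMU's
         * memory layout with the new guest physmap. */
        mch_update_guest_ram_split(d);
    }
}
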
