
Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu



On 25/07/17 17:40, Alexey G wrote:
> On Mon, 24 Jul 2017 21:39:08 +0100
> Igor Druzhinin <igor.druzhinin@xxxxxxxxxx> wrote:
>>> But, the problem is that the overall MMIO hole(s) requirements are not
>>> known exactly at the time the HVM domain is being created. Some PCI
>>> devices will be emulated, some will be merely passed through, and on
>>> top of that there will be some RMRR ranges. libxl can't know all this
>>> stuff - some of it comes from the host, some from the device model. So
>>> the actual MMIO requirements are only known to hvmloader at PCI bus
>>> enumeration time.
>>>   
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said, it's much easier to provision the hole
>> statically in libxl during the domain construction process, and it
>> doesn't really compromise any functionality. Having one more entity
>> responsible for guest memory layout only makes things more convoluted.
> If moving most of hvmloader's tasks to libxl is a feature planned at
> Citrix, please let it be discussed on xen-devel first, as it may affect
> many people... and not all of them might be happy. :)
>
> (tons of IMO and TLDR ahead, be warned)
>
> Moving PCI BAR allocation from the guest side to libxl is a controversial
> step; in fact, it may be the architecturally wrong way to go. There are
> properties and areas of responsibility, and among the primary
> responsibilities of guest firmware are PCI BAR allocation and MMIO hole
> sizing.

There is already a very blurry line concerning "firmware".  What you
describe is correct for real hardware, but remember that virtual
machines are anything but.  There are already a lot of aspects of
initialisation covered by Xen or the toolstack which would be covered by
"firmware" in a native system, and a lot of these are never going to
move within guest control.

> That's a guest's territory.

Every tweakable which is available inside the guest is additional
security attack surface.

It is important to weigh up all options, and it might indeed be the case
that putting the tweakable inside the guest is the correct action to
take, but simply "because that's what real hardware does" is not a good
enough argument.

We've had far too many XSAs due to insufficient forethought when lashing
things together in the past.

> The guest relocates PCI BARs (and it's not just the BIOS that can do
> this); guest firmware relocates the MMIO hole base for them. On a real
> system, all tasks like PCI BAR allocation, remapping part of RAM above
> 4G, etc. would be done by the system BIOS. In our case some of
> SeaBIOS/OVMF's responsibilities were offloaded to hvmloader -- allocating
> PCI BARs, sizing the MMIO hole(s) for them and generating ACPI tables.
> And that's OK, as hvmloader can be considered merely 'supplemental'
> firmware performing some of SeaBIOS/OVMF's tasks before passing control
> to them. This solution has some architectural logic at least and doesn't
> look bad.
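
To be concrete about what the BAR allocation work involves: it is the
usual sizing probe - save the BAR, write all-ones, read back the size
mask, restore.  Below is a rough, self-contained sketch of just that
probe; the simulated register is a stand-in for real config-space
accesses and none of it is hvmloader's actual code.

#include <stdint.h>
#include <stdio.h>

/* Simulated 32-bit memory BAR: the "device" decodes only the top bits,
 * here modelling a 16MiB BAR.  A stand-in for real config-space access. */
static uint32_t fake_bar = 0xf0000000;
static const uint32_t fake_size_mask = 0xff000000;

static uint32_t bar_read(void)        { return fake_bar; }
static void     bar_write(uint32_t v) { fake_bar = v & fake_size_mask; }

/* Standard BAR sizing probe: save, write all-ones, read back, restore.
 * The size is given by the lowest writable address bit. */
static uint32_t bar_size(void)
{
    uint32_t orig = bar_read();
    bar_write(~0u);
    uint32_t mask = bar_read() & ~0xfu;   /* strip memory BAR flag bits */
    bar_write(orig);
    return mask ? (~mask + 1) : 0;
}

int main(void)
{
    printf("BAR size: 0x%x bytes\n", (unsigned)bar_size());  /* 16MiB */
    return 0;
}

Summing those sizes (plus alignment) across all devices present is what
ultimately determines how big the MMIO hole needs to be.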

PCI BAR relocation isn't interesting to consider.  It obviously has to
be dynamic (as the OS is free to renumber the bridges).

The issue I am concerned with is purely the MMIO window selection.  From
the point of view of the guest, this is fixed at boot; changing it
requires a reboot and altering the BIOS settings.

>
> On the other hand, moving PCI hole calculation to libxl just to let
> Xen/libxl know what the MMIO hole size is might be a bad idea.
> Aside from some code duplication, straying too far from the real-hw
> paths, or breaking existing (or future) interfaces, this might have
> other negative consequences. E.g. who will be initializing the guest's
> ACPI tables if only libxl knows the memory layout? Some new interface
> between libxl and hvmloader just to let the latter know what values to
> write into the ACPI tables being created? Or will libxl be initializing
> the guest's ACPI tables as well (another of the guest's internal tasks)?
> Similar concerns apply to the construction of the guest's final E820.
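
To illustrate the E820 point: whoever picks the MMIO hole base also
fixes where low RAM ends and how much RAM gets relocated above 4GiB, and
the E820 map has to be built from exactly those numbers.  A purely
illustrative sketch, with made-up names and no relation to hvmloader's
or libxl's actual code:

#include <stdint.h>
#include <stdio.h>

#define E820_RAM 1

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Illustrative only: given the total RAM size and the low MMIO hole base,
 * emit the RAM entries a typical HVM layout ends up with - low RAM up to
 * the hole, the remainder relocated above 4GiB. */
static unsigned build_e820(struct e820entry *e, uint64_t ram_bytes,
                           uint64_t mmio_hole_base)
{
    unsigned n = 0;
    uint64_t lowmem = ram_bytes < mmio_hole_base ? ram_bytes : mmio_hole_base;

    e[n++] = (struct e820entry){ 0, lowmem, E820_RAM };
    if (ram_bytes > lowmem)   /* remainder goes above 4GiB */
        e[n++] = (struct e820entry){ 1ULL << 32, ram_bytes - lowmem, E820_RAM };
    return n;
}

int main(void)
{
    struct e820entry map[4];
    /* 6GiB of RAM, hole starting at 0xf0000000 */
    unsigned n = build_e820(map, 6ULL << 30, 0xf0000000);

    for (unsigned i = 0; i < n; i++)
        printf("%016llx-%016llx type %u\n",
               (unsigned long long)map[i].addr,
               (unsigned long long)(map[i].addr + map[i].size),
               map[i].type);
    return 0;
}

Whichever component owns the hole base therefore has to hand it to
whichever component builds the tables - that is the interface question
being asked.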

Who said anything about only libxl knowing the layout?

Whatever ends up happening, the hypervisor needs to know the layout to
be able to sensibly audit a number of guest actions which currently go
unaudited.  (I am disappointed that this wasn't done in the first place,
and surprised that Xen as a whole has managed to last this long without
this information being known to the hypervisor.)

>
> Another thing is that handling ioreq/PT MMIO ranges is somewhat a
> property of the device model (at least for now). Right now it's QEMU
> that traps PCI BAR accesses and tells Xen how to handle specific ranges
> of MMIO space. If QEMU already tells Xen which ranges should be passed
> through or trapped, it can tell it the current overall MMIO limits as
> well... or handle these limits itself -- if an MMIO hole range check is
> all that is required to avoid misuse of MMIO space, this check can
> easily be implemented in QEMU, provided that QEMU knows what the
> memory/MMIO layout is. There is a lot of implementation freedom in where
> to place restrictions and checks -- Xen or QEMU.
> Strictly speaking, the MMIO hole itself can be considered a property of
> the emulated machine and may have implementation differences between
> emulated chipsets. For example, the real i440's northbridge has no
> notion of a high MMIO hole at all.
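
The range check mentioned here is easy to express.  A hedged sketch of
what it might look like on the device-model side, assuming QEMU has been
told the hole boundaries; the structure and field names are invented for
the example and are not an existing QEMU or Xen interface:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the guest's MMIO windows as the device model
 * might track them; the names are assumptions for this sketch only. */
struct mmio_layout {
    uint64_t low_start,  low_end;    /* low hole, below 4GiB, end exclusive  */
    uint64_t high_start, high_end;   /* high hole, above 4GiB, 0/0 if absent */
};

static bool in_window(uint64_t addr, uint64_t size,
                      uint64_t start, uint64_t end)
{
    return size != 0 && addr >= start && addr < end && size <= end - addr;
}

/* The check suggested above: refuse to trap/map any BAR range that falls
 * outside the agreed MMIO hole(s). */
static bool range_in_mmio_hole(const struct mmio_layout *l,
                               uint64_t addr, uint64_t size)
{
    return in_window(addr, size, l->low_start, l->low_end) ||
           in_window(addr, size, l->high_start, l->high_end);
}

int main(void)
{
    struct mmio_layout l = { 0xf0000000, 0xfc000000, 0, 0 };

    return !( range_in_mmio_hole(&l, 0xf8000000, 0x1000000) &&   /* inside  */
             !range_in_mmio_hole(&l, 0xc0000000, 0x1000000));    /* outside */
}

Whether such a check lives in QEMU or in Xen is exactly the
implementation-freedom question above; the auditing argument earlier in
the thread would push the same check into the hypervisor instead.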
>
> We already have a sort of interface between hvmloader and QEMU --
> hvmloader has to do basic initialization of some of the emulated
> chipset's registers (and this depends on the machine). Providing
> additional handling for a few other registers (TOM/TOLUD/etc.) will cost
> almost nothing, and the purpose of these registers will actually match
> their usage in real HW. This way we can use an existing, available
> interface and not stray too far from the real HW ways.
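
Mechanically, the "additional handling for a few other registers" would
just be more ordinary PCI config cycles from hvmloader to the emulated
host bridge.  A sketch using standard config mechanism #1; the 0xb0
offset and the register's meaning are assumptions for illustration only,
not a defined interface:

#include <stdint.h>
#include <stdio.h>

/* Stub standing in for a real port write (outl to 0xcf8/0xcfc); printing
 * keeps the sketch runnable outside firmware. */
static void port_outl(uint16_t port, uint32_t val)
{
    printf("outl(0x%03x, 0x%08x)\n", (unsigned)port, val);
}

/* Standard PCI config mechanism #1 address encoding. */
static uint32_t cf8_addr(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xfc);
}

static void pci_cfg_writel(unsigned bus, unsigned dev, unsigned fn,
                           unsigned reg, uint32_t val)
{
    port_outl(0xcf8, cf8_addr(bus, dev, fn, reg));
    port_outl(0xcfc, val);
}

int main(void)
{
    /* HYPOTHETICAL: tell the emulated host bridge at 0000:00:00.0 where
     * low RAM ends / the MMIO hole begins.  Offset 0xb0 is a stand-in for
     * whatever TOLUD-like register the emulated chipset would define. */
    pci_cfg_writel(0, 0, 0, 0xb0, 0xf0000000);
    return 0;
}

The attraction is that the device model would then learn the layout
through the same channel a real chipset uses, rather than through a new
side-band interface.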

The difference here is that there are two broad choices of how to proceed:
1) Calculate and set up the guest physical address space statically
during creation, making it immutable once the guest starts executing
code, or
2) Support the guest having dynamic control over its physical address space.

Which of these is the smaller attack surface?

So far, I see no advantage in going with option 2 (as the choice doesn't
affect any guest-visible behaviour), and a compelling set of reasons
(based on simplicity and reduction of the security attack surface) to
prefer option 1.
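
In code terms, option 1 amounts to the toolstack gathering the per-device
BAR sizes and host RMRR ranges it already knows about at build time and
deriving a fixed hole base from them (the mmio_hole= option in xl.cfg is
essentially the manual form of this).  The sketch below is an
illustration under stated assumptions, not libxl code; the function name,
the 25% slack and the 256MiB alignment are all invented for the example.

#include <stdint.h>
#include <stdio.h>

#define GIB (1ULL << 30)

static uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }

/* Hypothetical toolstack-side calculation: derive a static low-MMIO-hole
 * base below 4GiB from the device requirements known at domain build
 * time.  The slack factor and alignment are arbitrary example choices. */
static uint64_t pick_mmio_hole_base(const uint64_t *bar_sizes, unsigned nbars,
                                    uint64_t rmrr_bytes)
{
    uint64_t need = rmrr_bytes;

    for (unsigned i = 0; i < nbars; i++)
        need += bar_sizes[i];

    need += need / 4;                 /* slack for alignment loss        */
    if (need > 2 * GIB)
        need = 2 * GIB;               /* never shrink low RAM below 2GiB */

    return align_down(4 * GIB - need, 256ULL << 20);
}

int main(void)
{
    uint64_t bars[] = { 256ULL << 20, 16ULL << 20, 1ULL << 20 };

    printf("hole base: 0x%llx\n",
           (unsigned long long)pick_mmio_hole_base(bars, 3, 2ULL << 20));
    return 0;
}

Once that value is fixed and handed to the firmware and device model,
nothing inside the guest needs the ability to change it.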

~Andrew
