
Re: [Xen-devel] [Bug] Intel RMRR support with upstream Qemu



On 25/07/17 17:40, Alexey G wrote:
> On Mon, 24 Jul 2017 21:39:08 +0100
> Igor Druzhinin <igor.druzhinin@xxxxxxxxxx> wrote:
>>> But, the problem is that the overall MMIO hole(s) requirements are not
>>> known exactly at the time the HVM domain is being created. Some PCI
>>> devices will be emulated, some will be merely passed through, and on
>>> top of that there will be some RMRR ranges. libxl can't know all this
>>> stuff - some of it comes from the host, some from the device model. So
>>> the actual MMIO requirements are only known to hvmloader at PCI bus
>>> enumeration time.
>>>   
>> IMO hvmloader shouldn't really be allowed to relocate memory under any
>> conditions. As Andrew said, it's much easier to provision the hole
>> statically in libxl during the domain construction process, and it
>> doesn't really compromise any functionality. Having one more entity
>> responsible for guest memory layout only makes things more convoluted.
> If moving most of hvmloader's tasks to libxl is a feature planned at
> Citrix, please let it be discussed on xen-devel first, as it may affect
> many people... and not all of them might be happy. :)
>
> (tons of IMO and TLDR ahead, be warned)
>
> Moving PCI BAR allocation from the guest side to libxl is a controversial
> step; in fact, it may be the architecturally wrong way to go. There are
> properties and areas of responsibility, and among the primary
> responsibilities of guest firmware are PCI BAR allocation and MMIO hole
> sizing.

There is already a very blurry line concerning "firmware".  What you
describe is correct for real hardware, but remember that virtual
machines are anything but.  There are already a lot of aspects of
initialisation covered by Xen or the toolstack which would be covered by
"firmware" in a native system, and a lot of these are never going to
move within guest control.

> That's a guest's territory.

Every tweakable which is available inside the guest is additional
security attack surface.

It is important to weigh up all options, and it might indeed be the case
that putting the tweakable inside the guest is the correct action to
take, but simply "because that's what real hardware does" is not a good
enough argument.

We've had far too many XSAs due to insufficient forethought when lashing
things together in the past.

> The guest relocates PCI BARs (and it's not just the BIOS that can do
> this); guest firmware relocates the MMIO hole base for them. On a real
> system, all tasks like PCI BAR allocation, remapping part of RAM above
> 4G, etc. would be done by the system BIOS. In our case some of
> SeaBIOS/OVMF's responsibilities were offloaded to hvmloader -- allocating
> PCI BARs, sizing the MMIO hole(s) for them and generating ACPI tables.
> And that's OK, as hvmloader can be considered merely 'supplemental'
> firmware performing some of SeaBIOS/OVMF's tasks before passing control
> to them. This solution has some architectural logic at least and doesn't
> look bad.
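
To be concrete about what the BAR allocation work involves: it is the
usual sizing probe - save the BAR, write all-ones, read back the size
mask, restore.  Below is a rough, self-contained sketch of just that
probe; the simulated register is a stand-in for real config-space
accesses and none of it is hvmloader's actual code.

#include <stdint.h>
#include <stdio.h>

/* Simulated 32-bit memory BAR: the "device" decodes only the top bits,
 * here modelling a 16MiB BAR.  A stand-in for real config-space access. */
static uint32_t fake_bar = 0xf0000000;
static const uint32_t fake_size_mask = 0xff000000;

static uint32_t bar_read(void)        { return fake_bar; }
static void     bar_write(uint32_t v) { fake_bar = v & fake_size_mask; }

/* Standard BAR sizing probe: save, write all-ones, read back, restore.
 * The size is given by the lowest writable address bit. */
static uint32_t bar_size(void)
{
    uint32_t orig = bar_read();
    bar_write(~0u);
    uint32_t mask = bar_read() & ~0xfu;   /* strip memory BAR flag bits */
    bar_write(orig);
    return mask ? (~mask + 1) : 0;
}

int main(void)
{
    printf("BAR size: 0x%x bytes\n", (unsigned)bar_size());  /* 16MiB */
    return 0;
}

Summing those sizes (plus alignment) across all devices present is what
ultimately determines how big the MMIO hole needs to be.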

PCI BAR relocation isn't interesting to consider.  It obviously has to
be dynamic (as the OS is free to renumber the bridges).

The issue I am concerned with is purely the MMIO window selection.  From
the point of view of the guest, this is fixed at boot; changing it
requires a reboot and altering the BIOS settings.

>
> On the other hand, moving PCI hole calculation to libxl just to let
> Xen/libxl know what the MMIO hole size is might be a bad idea.
> Aside from some code duplication, straying too far from the real-hw
> paths, or breaking existing (or future) interfaces, this might have
> other negative consequences. E.g. who will be initializing the guest's
> ACPI tables if only libxl knows the memory layout? Some new interface
> between libxl and hvmloader just to let the latter know what values to
> write into the ACPI tables being created? Or will libxl be initializing
> the guest's ACPI tables as well (another of the guest's internal tasks)?
> Similar concerns apply to the construction of the guest's final E820.
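
To illustrate the E820 point: whoever picks the MMIO hole base also
fixes where low RAM ends and how much RAM gets relocated above 4GiB, and
the E820 map has to be built from exactly those numbers.  A purely
illustrative sketch, with made-up names and no relation to hvmloader's
or libxl's actual code:

#include <stdint.h>
#include <stdio.h>

#define E820_RAM 1

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Illustrative only: given the total RAM size and the low MMIO hole base,
 * emit the RAM entries a typical HVM layout ends up with - low RAM up to
 * the hole, the remainder relocated above 4GiB. */
static unsigned build_e820(struct e820entry *e, uint64_t ram_bytes,
                           uint64_t mmio_hole_base)
{
    unsigned n = 0;
    uint64_t lowmem = ram_bytes < mmio_hole_base ? ram_bytes : mmio_hole_base;

    e[n++] = (struct e820entry){ 0, lowmem, E820_RAM };
    if (ram_bytes > lowmem)   /* remainder goes above 4GiB */
        e[n++] = (struct e820entry){ 1ULL << 32, ram_bytes - lowmem, E820_RAM };
    return n;
}

int main(void)
{
    struct e820entry map[4];
    /* 6GiB of RAM, hole starting at 0xf0000000 */
    unsigned n = build_e820(map, 6ULL << 30, 0xf0000000);

    for (unsigned i = 0; i < n; i++)
        printf("%016llx-%016llx type %u\n",
               (unsigned long long)map[i].addr,
               (unsigned long long)(map[i].addr + map[i].size),
               map[i].type);
    return 0;
}

Whichever component owns the hole base therefore has to hand it to
whichever component builds the tables - that is the interface question
being asked.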

Who said anything about only libxl knowing the layout?

Whatever ends up happening, the hypervisor needs to know the layout to
be able to sensibly audit a number of guest actions which currently go
unaudited.  (I am disappointed that this wasn't done in the first place,
and surprised that Xen as a whole has managed to last this long without
this information being known to the hypervisor.)

>
> Another thing is that handling ioreq/PT MMIO ranges is somewhat a
> property of the device model (at least for now). Right now it's QEMU
> that traps PCI BAR accesses and tells Xen how to handle specific ranges
> of MMIO space. If QEMU already tells Xen which ranges should be passed
> through or trapped, it can tell it the current overall MMIO limits as
> well... or handle these limits itself -- if an MMIO hole range check is
> all that is required to avoid misuse of MMIO space, this check can
> easily be implemented in QEMU, provided that QEMU knows what the
> memory/MMIO layout is. There is a lot of implementation freedom in where
> to place restrictions and checks -- Xen or QEMU.
> Strictly speaking, the MMIO hole itself can be considered a property of
> the emulated machine and may have implementation differences between
> emulated chipsets. For example, the real i440's northbridge has no
> notion of a high MMIO hole at all.
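
The range check mentioned here is easy to express.  A hedged sketch of
what it might look like on the device-model side, assuming QEMU has been
told the hole boundaries; the structure and field names are invented for
the example and are not an existing QEMU or Xen interface:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the guest's MMIO windows as the device model
 * might track them; the names are assumptions for this sketch only. */
struct mmio_layout {
    uint64_t low_start,  low_end;    /* low hole, below 4GiB, end exclusive  */
    uint64_t high_start, high_end;   /* high hole, above 4GiB, 0/0 if absent */
};

static bool in_window(uint64_t addr, uint64_t size,
                      uint64_t start, uint64_t end)
{
    return size != 0 && addr >= start && addr < end && size <= end - addr;
}

/* The check suggested above: refuse to trap/map any BAR range that falls
 * outside the agreed MMIO hole(s). */
static bool range_in_mmio_hole(const struct mmio_layout *l,
                               uint64_t addr, uint64_t size)
{
    return in_window(addr, size, l->low_start, l->low_end) ||
           in_window(addr, size, l->high_start, l->high_end);
}

int main(void)
{
    struct mmio_layout l = { 0xf0000000, 0xfc000000, 0, 0 };

    return !( range_in_mmio_hole(&l, 0xf8000000, 0x1000000) &&   /* inside  */
             !range_in_mmio_hole(&l, 0xc0000000, 0x1000000));    /* outside */
}

Whether such a check lives in QEMU or in Xen is exactly the
implementation-freedom question above; the auditing argument earlier in
the thread would push the same check into the hypervisor instead.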
>
> We already have a sort of interface between hvmloader and QEMU --
> hvmloader has to do basic initialization of some of the emulated
> chipset's registers (and this depends on the machine). Providing
> additional handling for a few other registers (TOM/TOLUD/etc.) will cost
> almost nothing, and the purpose of these registers will actually match
> their usage in real HW. This way we can use an existing, available
> interface and not stray too far from the real HW ways.
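
Mechanically, the "additional handling for a few other registers" would
just be more ordinary PCI config cycles from hvmloader to the emulated
host bridge.  A sketch using standard config mechanism #1; the 0xb0
offset and the register's meaning are assumptions for illustration only,
not a defined interface:

#include <stdint.h>
#include <stdio.h>

/* Stub standing in for a real port write (outl to 0xcf8/0xcfc); printing
 * keeps the sketch runnable outside firmware. */
static void port_outl(uint16_t port, uint32_t val)
{
    printf("outl(0x%03x, 0x%08x)\n", (unsigned)port, val);
}

/* Standard PCI config mechanism #1 address encoding. */
static uint32_t cf8_addr(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xfc);
}

static void pci_cfg_writel(unsigned bus, unsigned dev, unsigned fn,
                           unsigned reg, uint32_t val)
{
    port_outl(0xcf8, cf8_addr(bus, dev, fn, reg));
    port_outl(0xcfc, val);
}

int main(void)
{
    /* HYPOTHETICAL: tell the emulated host bridge at 0000:00:00.0 where
     * low RAM ends / the MMIO hole begins.  Offset 0xb0 is a stand-in for
     * whatever TOLUD-like register the emulated chipset would define. */
    pci_cfg_writel(0, 0, 0, 0xb0, 0xf0000000);
    return 0;
}

The attraction is that the device model would then learn the layout
through the same channel a real chipset uses, rather than through a new
side-band interface.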

The difference here is that there are two broad choices of how to proceed:
1) Calculate and set up the guest physical address space statically
during creation, making it immutable once the guest starts executing
code, or
2) Support the guest having dynamic control over its physical address space.

Which of these is the smaller attack surface?

So far, I see no advantage in going with option 2 (as the choice doesn't
affect any guest-visible behaviour), and a compelling set of reasons
(based on simplicity and reduction of the security attack surface) to
prefer option 1.
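
In code terms, option 1 amounts to the toolstack gathering the per-device
BAR sizes and host RMRR ranges it already knows about at build time and
deriving a fixed hole base from them (the mmio_hole= option in xl.cfg is
essentially the manual form of this).  The sketch below is an
illustration under stated assumptions, not libxl code; the function name,
the 25% slack and the 256MiB alignment are all invented for the example.

#include <stdint.h>
#include <stdio.h>

#define GIB (1ULL << 30)

static uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }

/* Hypothetical toolstack-side calculation: derive a static low-MMIO-hole
 * base below 4GiB from the device requirements known at domain build
 * time.  The slack factor and alignment are arbitrary example choices. */
static uint64_t pick_mmio_hole_base(const uint64_t *bar_sizes, unsigned nbars,
                                    uint64_t rmrr_bytes)
{
    uint64_t need = rmrr_bytes;

    for (unsigned i = 0; i < nbars; i++)
        need += bar_sizes[i];

    need += need / 4;                 /* slack for alignment loss        */
    if (need > 2 * GIB)
        need = 2 * GIB;               /* never shrink low RAM below 2GiB */

    return align_down(4 * GIB - need, 256ULL << 20);
}

int main(void)
{
    uint64_t bars[] = { 256ULL << 20, 16ULL << 20, 1ULL << 20 };

    printf("hole base: 0x%llx\n",
           (unsigned long long)pick_mmio_hole_base(bars, 3, 2ULL << 20));
    return 0;
}

Once that value is fixed and handed to the firmware and device model,
nothing inside the guest needs the ability to change it.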

~Andrew
