
Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring



On Mon, 26 Mar 2018 10:24:38 +0100
Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:

>On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:
[...]
>> In fact, the emulated chipset (NB+SB combo without supplemental
>> devices) itself is a small part of required emulation. It's
>> relatively easy to provide own analogs of for eg. 'mch' and
>> 'ICH9-LPC' QEMU PCIDevice's, the problem is to glue all remaining
>> parts together.
>> 
>> I assume the final goal in this case is to have only a set of
>> necessary QEMU PCIDevice's for which we will be providing I/O, MMIO
>> and PCI conf trapping facilities. Only devices such as rtl8139,
>> ich9-ahci and few others.
>> 
>> Basically, this means a new, chipset-less QEMU machine type.
>> Well, in theory it is possible with a bit of effort I think. The main
>> question is where will be the NB/SB/PCIbus emulating part reside in
>> this case.  
>
>Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part of
>the southbridge will be emulated by a device model (ie: QEMU).
>
>As you mention above, I also took a look and it seems like the amount
>of registers that we should emulate for Q35 DRAM controller (D0:F0) is
>fairly minimal based on current QEMU implementation. We could even
>possibly get away by just emulating PCIEXBAR.

MCH emulation alone might not be enough. Besides, some
southbridge-specific features, like emulating the ACPI PM facilities
for domain power management (basically, anything at PMBASE), will be
preferable to implement on the Xen side, especially considering that
the ACPI tables are already provided by Xen's libacpi/hvmloader, not
the device model.
I think the feature will need to cover at least the NB+SB
combination, Q35 MCH + ICH9 for a start, ideally 82441FX+PIIX4 as
well. Also, Xen should control emulated/PT PCI device placement.

Before going this way, it would be good to weigh all the risks.
There seem to be two main directions currently:

I. (conservative) Let the main device model (QEMU) inform Xen about
the current chipset-specific MMCONFIG location, so that Xen knows
that some MMIO accesses to this area must be forwarded to other ioreq
servers (device emulators) in the form of PCI config read/write
ioreqs, if the BDF corresponding to a MMCONFIG offset points to a PCI
device owned by a device emulator.
For device emulators the conversion of MMIO accesses to PCI config
ones is a mandatory step, while the owner of the MMCONFIG MMIO range
may receive MMIO accesses in their native form without conversion
(a strongly preferable option for QEMU).
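For reference, the MMIO-to-PCI-config translation above follows the
standard PCIe ECAM layout. A minimal sketch of the decoding step
(names are illustrative, not actual Xen code):

```c
#include <stdint.h>

/* Decode a PCIe ECAM (MMCONFIG) offset into BDF + register offset.
 * Standard ECAM layout: bits 27:20 = bus, 19:15 = device,
 * 14:12 = function, 11:0 = config-space register offset. */
typedef struct {
    uint8_t  bus;
    uint8_t  dev;    /* 0..31 */
    uint8_t  fn;     /* 0..7  */
    uint16_t reg;    /* 0..4095 */
} ecam_addr_t;

static ecam_addr_t ecam_decode(uint64_t mmio_addr, uint64_t mmcfg_base)
{
    uint64_t off = mmio_addr - mmcfg_base;
    ecam_addr_t a = {
        .bus = (off >> 20) & 0xff,
        .dev = (off >> 15) & 0x1f,
        .fn  = (off >> 12) & 0x07,
        .reg = off & 0xfff,
    };
    return a;
}
```

The resulting BDF would then be matched against the PCI devices
claimed by each ioreq server to pick the forwarding target.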

This approach assumes introducing a new dmop/hypercall (something
like XEN_DMOP_mmcfg_location_change) to pass basic MMCONFIG
information to Xen -- address, enabled/disabled status (or simply
address=0 instead) and the size of the MMCONFIG area, eg. as a number
of buses. This information is enough to select the proper ioreq
server in Xen and to allow multiple device emulators to function
properly.
For future compatibility we can also provide the segment and
start/end bus range as arguments.
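A possible payload layout for such a dmop, purely as a sketch (the
hypercall name and all field names are hypothetical, derived from the
description above):

```c
#include <stdint.h>

/* Hypothetical XEN_DMOP_mmcfg_location_change payload (sketch only,
 * not an existing Xen interface). addr == 0 means MMCONFIG is
 * disabled. */
struct xen_dm_op_mmcfg_location_change {
    uint64_t addr;       /* base of the MMCONFIG area, 0 = disabled */
    uint16_t segment;    /* PCI segment group, 0 for now */
    uint8_t  start_bus;  /* first decoded bus number */
    uint8_t  end_bus;    /* last decoded bus number */
};

/* The decoded area size follows from the bus range: 1MB per bus. */
static inline uint64_t mmcfg_size(
    const struct xen_dm_op_mmcfg_location_change *op)
{
    return (uint64_t)(op->end_bus - op->start_bus + 1) << 20;
}
```

A full 256-bus segment would thus describe the usual 256MB area.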

What this approach will require:
--------------------------------

- new notification-style dmop/hypercall to tell Xen about the current
  emulated MMCONFIG location

- trivial changes in QEMU to use this dmop in Q35 PCIEXBAR handling code

- relatively simple Xen changes in ioreq.c to use the provided range
  for ioreq server selection, and to provide MMIO -> PCI config ioreq
  translation for supplemental ioreq servers which don't know anything
  about the emulated system

Risks:
------

Risk to break anything is minimal in this case.

If QEMU does not provide this information (eg. because an outdated
version is installed), only basic PCI config space accesses via
CF8/CFC will be forwarded to a distinct ioreq server. This means that
extended PCI config space accesses won't be forwarded to specific
device emulators. Apart from these device emulators, everything else
will continue to work properly in this case. For guest OSes without
PCIe ECAM support there will be no difference either way.

In general, no breakthrough improvements, but no negative
side-effects either. PCIe ECAM just works as expected and
compatibility with multiple ioreq servers is retained.


II. (a new feature) Move chipset emulation to Xen directly.

In this case no separate notification is necessary, as Xen will be
emulating the chosen chipset itself. The MMCONFIG location will be
known from its own PCIEXBAR emulation.
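For illustration, decoding a written PCIEXBAR value is simple. A
hedged sketch, assuming the Q35 register layout as implemented by
QEMU's mch emulation (D0:F0 config offset 0x60, bit 0 = enable,
bits 2:1 = length field); the function name is illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of Q35 PCIEXBAR decoding (illustrative, not actual Xen/QEMU
 * code): bit 0 = enable, bits 2:1 = length (0 = 256MB for buses
 * 0-255, 1 = 128MB, 2 = 64MB), upper bits hold the MMCONFIG base. */
static bool pciexbar_decode(uint64_t val, uint64_t *base, uint64_t *size)
{
    static const uint64_t sizes[] = {
        256ULL << 20, 128ULL << 20, 64ULL << 20,
    };
    unsigned int len_field = (unsigned int)(val >> 1) & 3;

    if ( !(val & 1) || len_field > 2 )
        return false;       /* disabled, or reserved length encoding */

    *size = sizes[len_field];
    /* Mask off bits below the natural alignment and above bit 35. */
    *base = val & ~(*size - 1) & ((1ULL << 36) - 1);
    return true;
}
```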

QEMU will be used only to emulate a minimal set of unrelated devices
(eg. storage/network/vga), so there is less dependency on QEMU
overall.

More freedom to implement specific features in the future, like SMRAM
support for EFI firmware needs. Chipset remapping (aka reclaim)
functionality for memory relocation can be implemented under complete
Xen control, avoiding the use of unsafe add_to_physmap hypercalls.

In the future this will allow moving the passthrough-supporting code
from QEMU (hw/xen/xen-pt*.c) into Xen, merging it with Roger's vpci
series.
This will improve eg. the PT + stubdomain situation a lot -- PCI
config space accesses for PT devices will be handled in a uniform way
without Dom0 interaction.
This particular feature can be implemented for the previous approach
as well, but it is easier to do when Xen controls the emulated
machine.
In general, this is a good long-term direction.

What this approach will require:
--------------------------------

- Changes in QEMU code to support a new chipset-less machine (or
  machines). In theory this might be possible to implement on top of
  the "null" machine concept

- Major changes in Xen code to implement the actual chipset emulation
  there

- Changes on the toolstack side as the emulated machine will be
  selected and used differently

- Moving passthrough support from QEMU to Xen will likely require
  re-dividing the areas of responsibility for PCI device passthrough
  between xen-pciback and the hypervisor. It might be more convenient
  to perform some tasks of xen-pciback in Xen directly

- a strong dependency between Xen/libxl/QEMU/etc versions -- any
  outdated component will be a major problem. This can be resolved by
  providing some compatibility code

- longer implementation time

Risks:
------

- A major architecture change with possible issues encountered during
  the implementation

- Moving the emulation of the machine to Xen creates a non-zero risk
  of introducing a security issue while extending the emulation
  support further. As all emulation will take place at the most
  trusted level, any exploitable bug in the chipset emulation code may
  compromise the whole system

- there is a risk of encountering dependencies on chipset devices
  missing from QEMU. Some QEMU devices (which depend on QEMU chipset
  devices/properties) might not work without extra patches. In theory
  this can be addressed by leaving dummy MCH/LPC/pci-host devices in
  place while not forwarding any IO/MMIO/PCI conf accesses to them
  (using them simply as compat placeholders)

- risk of incompatibility with future QEMU versions

In both cases, to address security concerns, PCIEXBAR and other MCH
registers can be made write-once (RO on all further accesses, similar
to a TXT-locked system).
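Write-once semantics are trivial to implement; a sketch (illustrative
naming, not existing Xen code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Write-once register state, eg. for PCIEXBAR: the first write
 * latches the value, all further writes are ignored, so the register
 * reads as RO thereafter (similar to a TXT-locked system). */
struct wo_reg {
    uint64_t val;
    bool     locked;
};

static void wo_reg_write(struct wo_reg *r, uint64_t val)
{
    if ( !r->locked )
    {
        r->val = val;
        r->locked = true;
    }
}
```

With this in place, firmware programs the register once during boot
and the guest OS cannot relocate the area later.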

[...]
>> Regarding control of the guest memory map in the toolstack only...
>> The problem is, only firmware can see a final memory map at the
>> moment. And only the device model knows about invisible "service"
>> ranges for emulated devices, like the LFB content (aka "VRAM") when
>> it is not mapped to a guest.
>> 
>> In order to calculate the final memory/MMIO hole split, we need to
>> know:
>> 
>> 1) all PCI devices on a PCI bus. At the moment Xen contributes only
>> devices like PT to the final PCI bus (via QMP device_add). Others are
>> QEMU ones. Even Xen platform PCI device relies on QEMU emulation.
>> Non-QEMU device emulators are another source of virtual PCI devices I
>> guess.
>> 
>> 2) all chipset-specific emulated MMIO ranges. MMCONFIG is one of them
>> and largest (up to 256Mb for a segment). There are few other smaller
>> ranges, eg. Root Complex registers. All this ranges depend on the
>> emulated chipset.
>> 
>> 3) all reserved memory ranges (this one what toolstack already knows)
>> 
>> 4) all "service" guest memory ranges like backing storage for VRAM in
>> QEMU. Emulated Option ROMs should belong here too, but IIRC xen-hvm.c
>> either intentionally or by mistake handles them as emulated ranges
>> currently.
>> 
>> If we miss any of these (like what are the chipset-specific ranges
>> and their size alignment requirements) -- we're in trouble. But, if
>> we know *all* of these, we can pre-calculate the MMIO hole size.
>> Although this is a bit fragile to do from the toolstack because both
>> sizing algo in the toolstack and MMIO BAR allocation code in the
>> firmware (hvmloader) must have their algorithms synchronized,
>> because it is possible to stuff BARs into the MMIO hole in
>> different ways,
>> especially when PCI-PCI bridges will appear on the scene. Both need
>> to do it in a consistent way (resulting in similar set of gaps
>> between allocated BARs), otherwise expected MMIO hole sizes won't
>> match, which means we may need to relocate MMIO BARs to the high
>> MMIO hole and this in turn may lead to those overlaps with QEMU
>> memories.  
>
>I agree that the current memory layout management (or the lack of it)
>is concerning. Although related, I think this should be tackled as a
>different issue from the chipset one IMHO.
>
>Since you already posted the Q35 series I would attempt to get that
>done first before jumping into the memory layout one.

It is somewhat related to the chipset question, because the
memory/MMIO layout inconsistency can be solved more, well, naturally
on Q35.

Basically, we have a non-standard MMIO hole layout where the start of
the high MMIO hole does not match the top of addressable RAM (due to
invisible ranges of the device model).

Q35 natively provides facilities that allow firmware to modify (via
emulation) or discover such an MMIO hole setup, which can be used for
safe MMIO BAR allocation to avoid overlaps with QEMU-owned invisible
ranges.

It doesn't really matter which registers to pick for this task, but
for Q35 this approach is at least consistent with what a real system
does (PV/PVH people will find this peculiarity pointless, I
suppose :) ).

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
