Re: [Xen-devel] [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring
On Tue, 27 Mar 2018 09:45:30 +0100
Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:

>On Tue, Mar 27, 2018 at 05:42:11AM +1000, Alexey G wrote:
>> On Mon, 26 Mar 2018 10:24:38 +0100
>> Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote:
>>
>> >On Sat, Mar 24, 2018 at 08:32:44AM +1000, Alexey G wrote:
>> [...]
>> >> In fact, the emulated chipset (NB+SB combo without supplemental
>> >> devices) itself is a small part of the required emulation. It's
>> >> relatively easy to provide our own analogs of e.g. the 'mch' and
>> >> 'ICH9-LPC' QEMU PCIDevices; the problem is to glue all the
>> >> remaining parts together.
>> >>
>> >> I assume the final goal in this case is to have only a set of
>> >> necessary QEMU PCIDevices for which we will be providing I/O,
>> >> MMIO and PCI conf trapping facilities -- only devices such as
>> >> rtl8139, ich9-ahci and a few others.
>> >>
>> >> Basically, this means a new, chipset-less QEMU machine type.
>> >> Well, in theory it is possible with a bit of effort, I think.
>> >> The main question is where the NB/SB/PCI-bus emulating part will
>> >> reside in this case.
>> >
>> >Mostly inside of Xen. Of course the IDE/SATA/USB/Ethernet... part
>> >of the southbridge will be emulated by a device model (ie: QEMU).
>> >
>> >As you mention above, I also took a look and it seems like the
>> >amount of registers that we should emulate for Q35 DRAM controller
>> >(D0:F0) is fairly minimal based on current QEMU implementation. We
>> >could even possibly get away by just emulating PCIEXBAR.
>>
>> MCH emulation alone might not be an option. Besides, some
>> southbridge-specific features like emulating ACPI PM facilities
>> for domain power management (basically, anything at PMBASE) will
>> be preferable to implement on the Xen side, especially considering
>> the fact that ACPI tables are already provided by Xen's
>> libacpi/hvmloader, not the device model.
>
>Likely, but AFAICT this is kind of already broken, because PM1a and
>TMR are already emulated by Xen at hardcoded values. See
>xen/arch/x86/hvm/pmtimer.c.

Yes, that is one more argument for implementing PMBASE emulation in
Xen too, although this needs to be checked against dependencies in
QEMU first, especially in ACPI-related code. This way we gain the
flexibility to use an arbitrary PMBASE value instead of having to
hardcode it to ACPI_PM1A_EVT_BLK_ADDRESS_V1 in all related
components.

>> I think the feature may require covering at least the NB+SB
>> combination -- Q35 MCH + ICH9 for a start, ideally 82441FX+PIIX4
>> as well. Also, Xen should control emulated/PT PCI device placement.
>
>Q35 MCH (D0:F0) is required in order to trap access to PCIEXBAR.

Absolutely.

BTW, another somewhat related problem at the moment is that Xen knows
nothing about chipset-specific MMIO hole(s). Due to this, it is
possible for a guest to map PT BARs outside the MMIO hole, leading to
errors like this:

(XEN) memory_map:remove: dom4 gfn=c8000 mfn=c8000 nr=2000
(XEN) memory_map:add: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000
(XEN) p2m.c:1121:d0v5 p2m_set_entry: 0xffffffffc8000:9 -> -22 (0xc8000)
(XEN) memory_map:fail: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000 ret:-22
(XEN) memory_map:remove: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000
(XEN) p2m.c:1228:d0v5 gfn_to_mfn failed! gfn=ffffffffc8000 type:4
(XEN) memory_map: error -22 removing dom4 access to [c8000,c9fff]
(XEN) memory_map:remove: dom4 gfn=ffffffffc8000 mfn=c8000 nr=2000
(XEN) p2m.c:1228:d0v5 gfn_to_mfn failed! gfn=ffffffffc8000 type:4
(XEN) memory_map: error -22 removing dom4 access to [c8000,c9fff]
(XEN) memory_map:add: dom4 gfn=c8000 mfn=c8000 nr=2000

Note that this was merely a lame BAR sizing attempt by guest-side SW
(a PCI config space viewing tool) -- writing F's to the high part of
the 64-bit MMIO BAR first.
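For reference, the typical guest-side sizing sequence for a 64-bit
memory BAR looks roughly like the sketch below (illustrative only;
pci_conf_read32()/pci_conf_write32() are hypothetical helper names
wrapping CF8/CFC or MMCONFIG accesses):

#include <stdint.h>

uint32_t pci_conf_read32(uint16_t bdf, uint8_t reg);
void pci_conf_write32(uint16_t bdf, uint8_t reg, uint32_t val);

uint64_t bar_size(uint16_t bdf, uint8_t bar)
{
    uint32_t lo = pci_conf_read32(bdf, bar);
    uint32_t hi = pci_conf_read32(bdf, bar + 4);
    uint32_t sz_lo, sz_hi;

    /*
     * Writing all-ones to the high dword first is exactly what
     * produced the log above: the BAR transiently decodes at
     * 0xffffffffxxxxxxxx and Xen gets asked to remap the
     * passed-through BAR there, far outside any MMIO hole.
     */
    pci_conf_write32(bdf, bar + 4, 0xffffffff);
    sz_hi = pci_conf_read32(bdf, bar + 4);
    pci_conf_write32(bdf, bar, 0xffffffff);
    sz_lo = pci_conf_read32(bdf, bar);

    pci_conf_write32(bdf, bar, lo);     /* restore original values */
    pci_conf_write32(bdf, bar + 4, hi);

    /* mask off the low 4 type bits, then invert and add 1 */
    return ~(((uint64_t)sz_hi << 32) | (sz_lo & ~0xfu)) + 1;
}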
If we know the guest's MMIO hole bounds, we can adapt to this
behavior, avoiding erroneous attempts to map to a wrong address
outside the MMIO hole -- only the range designated as the MMIO hole
can be used to map PT device BARs. So, if we will actually be
emulating the MCH's MMIO hole related registers in Xen as well, we
can use them as scratchpad registers (write-once, of course) to pass
this kind of information between Xen and the other involved parties,
as an alternative to e.g. a dedicated hypercall.

>Could you be more concise about ICH9?
>
>The ICH9 spec contains multiple devices, for example it includes an
>ethernet controller and a SATA controller, which we should not
>emulate inside of Xen.

From our PoV, the ICH built-in devices can be considered distinct PCI
devices (as long as they're actually distinct devices in PCI config
space). That's QEMU's approach to them -- these devices can be added
to a q35 machine optionally. Only a minimal set of devices is
provided initially, like MCH/LPC/AHCI. The SMBus controller (0:1F.3)
is added by default too, but it's not very useful at the moment. So
mostly we can consider the LPC bridge (0:1F.0) as standing in for all
the devices provided by a real ICH SB.

>> II. (a new feature) Move chipset emulation to Xen directly.
>>
>> In this case no separate notification is necessary, as Xen will be
>> emulating the chosen chipset itself. The MMCONFIG location will be
>> known from its own PCIEXBAR emulation.
>>
>> QEMU will be used only to emulate a minimal set of unrelated
>> devices (eg. storage/network/vga). Less dependency on QEMU overall.
>>
>> More freedom to implement some specific features in the future,
>> like SMRAM support for EFI firmware needs. Chipset remapping (aka
>> reclaim) functionality for memory relocation may be implemented
>> under complete Xen control, avoiding usage of unsafe
>> add_to_physmap hypercalls.
>>
>> In the future this will allow moving the passthrough-supporting
>> code from QEMU (hw/xen/xen-pt*.c) to Xen, merging it with Roger's
>> vpci series. This will improve eg. the PT + stubdomain situation a
>> lot -- PCI config space accesses for PT devices will be handled in
>> a uniform way without Dom0 interaction. This particular feature
>> can be implemented for the previous approach as well, but it is
>> easier to do when Xen controls the emulated machine.
>>
>> In general, this is a good long-term direction.
>>
>> What this approach will require:
>> --------------------------------
>>
>> - Changes in QEMU code to support a new chipset-less machine(s).
>>   In theory it might be possible to implement this on top of the
>>   "null" machine concept
>
>Not all parts of the chipset should go inside of Xen, ATM I only
>foresee Q35 MCH being implemented inside of Xen. So I'm not sure
>calling this a chipset-less machine is correct from QEMU PoV.

Emulating only the MCH in Xen will still require a lot of changes,
but the overall benefit becomes unclear -- basically, we just move
PCIEXBAR emulation from QEMU to Xen.
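For the record, that PCIEXBAR emulation itself is a small amount of
code, conceptually something like the sketch below (the vmmio_*()
helpers and the mmconfig handlers are invented placeholders, not
existing Xen interfaces; only the 256MiB LENGTH=00b case is shown):

#define PCIEXBAR        0x60            /* Q35 D0:F0, 64-bit register */
#define PCIEXBAREN      (1ULL << 0)
#define MMCONFIG_SIZE   (256ULL << 20)  /* buses 0-255, 1MiB each */

static uint64_t mmconfig_base;          /* 0 == currently disabled */

static void pciexbar_write(struct domain *d, uint64_t val)
{
    if ( mmconfig_base )
        vmmio_unregister(d, mmconfig_base, MMCONFIG_SIZE);

    /* base address bits 35:28 for the 256MiB window */
    mmconfig_base = (val & PCIEXBAREN) ? (val & 0xff0000000ULL) : 0;

    if ( mmconfig_base )
        vmmio_register(d, mmconfig_base, MMCONFIG_SIZE,
                       mmconfig_mmio_read, mmconfig_mmio_write);
}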
>> - Major changes in Xen code to implement the actual chipset
>>   emulation there
>>
>> - Changes on the toolstack side, as the emulated machine will be
>>   selected and used differently
>>
>> - Moving passthrough support from QEMU to Xen will likely require
>>   re-dividing the areas of responsibility for PCI device
>>   passthrough between xen-pciback and the hypervisor. It might be
>>   more convenient to perform some tasks of xen-pciback in Xen
>>   directly
>
>Moving pci-passthrough from QEMU to Xen is IMO a separate project,
>and from the text you provide I'm not sure how that is related to
>the Q35 chipset implementation.

Yes, it's more a separate feature on top of that approach.

>> - strong dependency between Xen/libxl/QEMU/etc versions -- any
>>   outdated component will be a major problem. Can be resolved by
>>   providing some compatibility code
>
>Well, you would only be able to use the Q35 feature with the right
>version of the components.
>
>> - longer implementation time
>>
>> Risks:
>> ------
>>
>> - A major architecture change, with possible issues encountered
>>   during the implementation
>>
>> - Moving the emulation of the machine to Xen creates a non-zero
>>   risk of introducing a security issue while extending the
>>   emulation support further. As all emulation will take place at
>>   the most trusted level, any exploitable bug in the chipset
>>   emulation code may compromise the whole system
>>
>> - There is a risk of encountering some dependency on missing
>>   chipset devices in QEMU. Some QEMU devices (which depend on QEMU
>>   chipset devices/properties) might not work without extra
>>   patches. In theory this may be addressed by leaving the dummy
>>   MCH/LPC/pci-host devices in place while not forwarding any
>>   IO/MMIO/PCI conf accesses to them (using them simply as compat
>>   placeholders)
>>
>> - risk of incompatibility with future QEMU versions
>>
>> In both cases, for security reasons PCIEXBAR and other MCH
>> registers can be made write-once (RO on all further accesses,
>> similar to a TXT-locked system).
>
>I think option II is the right way to move forward.

Agreed, it's a good long-term direction. The problem is, option 1 can
be implemented in a matter of 1-3 days. It will allow MMCONFIG to
work with multiple device emulators while being very light on
requirements -- no big code changes necessary, easy to test/review,
etc. OTOH, option 2 will require some research first, as the change
is non-trivial and may produce all kinds of incompatibility issues
with QEMU.

Emulating just the MCH in Xen while still leaving everything else to
QEMU does not show an obvious advantage. Without extending the
chipset emulation in Xen further, it will be just an overcomplicated
emulation of the PCIEXBAR register. If this is to be the only initial
objective of the feature, then we need some strong justification why
moving the emulation of the guest's PCIEXBAR from QEMU to Xen is
mandatory. We need to be extra sure that having the MCH emulated in
Xen, while ICH9 and all the rest remain emulated by QEMU, is a good
solution for PCIEXBAR emulation. Otherwise, having a chipset
emulation split between Xen and QEMU just to handle Q35's PCIEXBAR
register is overkill.

I would personally prefer to implement option 1 first, while
researching and implementing option 2 in the near term. There is
nothing special about PCIEXBAR: it's just one of the emulated chipset
registers, holding the address of an emulated MMIO area, and it
doesn't differ much from e.g. the AHCI ABAR. In fact, it's actually
more harmless -- for MMCONFIG MMIO we merely forward accesses to the
PCI config read/write emulation (the same thing as for the emulated
CF8/CFC I/O ports), while handling AHCI ABAR MMIO means doing serious
things like initiating real block I/O on the host.
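I.e. the MMCONFIG handler boils down to trivial address decoding --
a sketch pairing with the one above; pci_conf_emul_read() is a
made-up name for the common config space emulation path:

/*
 * The ECAM offset layout encodes bus/device/function/register
 * directly, so an MMCONFIG access forwards to the same PCI config
 * emulation that serves the CF8/CFC ports.
 */
static uint64_t mmconfig_mmio_read(struct domain *d, uint64_t addr,
                                   unsigned int len)
{
    uint64_t off  = addr - mmconfig_base;
    uint8_t  bus  = (off >> 20) & 0xff;
    uint8_t  dev  = (off >> 15) & 0x1f;
    uint8_t  func = (off >> 12) & 0x7;
    uint16_t reg  = off & 0xfff;    /* 4KiB of config space per function */

    return pci_conf_emul_read(d, bus, dev, func, reg, len);
}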
For PT devices, MMCONFIG accesses still go through hw/xen-pt*.c for
filtering or emulation.

>> It is somewhat related to the chipset because the memory/MMIO
>> layout inconsistency can be solved more, well, naturally on Q35.
>>
>> Basically, we have a non-standard MMIO hole layout where the start
>> of the high MMIO hole does not match the top of addressable RAM
>> (due to invisible ranges of the device model).
>
>But that's a device model issue then? I'm not sure I'm getting what
>you mean here.

We currently depend on the device model for the question of where we
can place the start of the high MMIO hole. This also badly affects
memory relocation support, which is required for MMIO hole
auto-sizing. There are multiple options for resolving this problem,
eg. placing VRAM at some address far beyond 4GB, but this approach is
not ideal either, as the device model cannot know where 64-bit BARs
will be allocated -- although it is the simplest way to avoid
overlaps and to have the high MMIO hole base equal to the max guest
RAM address.

>> Q35 natively has facilities which allow the firmware to modify
>> (via emulation) or discover such an MMIO hole setup, which can be
>> used for safe MMIO BAR allocation to avoid overlaps with
>> QEMU-owned invisible ranges.
>
>IMO a single entity should be in control of the memory layout, and
>that's the toolstack.
>
>Ideally we should not allow the firmware to change the layout at all.

This approach is terribly wrong; I don't know why opinions like this
are so common at Citrix. The toolstack is the least informed side. If
the MMIO/memory layout is to be immutable, it must be calculated
considering all factors, like chipset-specific MMIO ranges or ranges
which cannot be used for the MMIO hole. We need to know all the
resource requirements of the device model's and PT PCI devices, all
chipset-specific MMIO ranges (which belong to the device model), all
RMRRs (a property of the host) and all device-model-invisible ranges
like the VRAM backing store (another property of the device model).
And we need to know in which manner hvmloader will be allocating BARs
in the MMIO hole -- eg. either in a forward direction starting from
some base, or moving backwards from the end of 4GB (minus hardcoded
ranges). Basically this means the toolstack would have to depend on
the hvmloader code/version too, which is wrong on its own -- we
should have the freedom to modify the BAR allocation algorithm in
hvmloader at any time. At the moment all this information can be
discovered only from the firmware side; a lot of changes would be
needed to gather all the required information from the toolstack.

>What are specifically the registers that you mention?

Write-once emulation of the TOLUD/TOUUD/REMAPBASE/REMAPLIMIT
registers, for hvmloader to use. That's the approach I'm actually
using to make 'hvmloader/allow-memory-relocate=1' work: memory
relocation without hvmloader relying on the add_to_physmap hypercall
(which it does currently), while having the MMIO/memory layout
synchronized between all parties. There are multiple benefits (mostly
for PT needs), including MMIO hole auto-sizing support, but I'm
afraid this approach won't be received well given the "toolstack
should do everything" attitude.
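The write-once semantics themselves are trivial to enforce,
conceptually something like this (illustrative sketch only; the
structure and helper names are invented):

struct mch_wo_reg {
    uint64_t val;
    bool     locked;    /* set on the first write, RO afterwards */
};

static struct mch_wo_reg tolud, touud, remap_base, remap_limit;

static void mch_wo_write(struct mch_wo_reg *r, uint64_t val)
{
    if ( r->locked )
        return;         /* silently drop, like a TXT-locked system */

    r->val = val;
    r->locked = true;
}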
>> It doesn't really matter which registers to pick for this task,
>> but for Q35 this approach is at least consistent with what a real
>> system does (PV/PVH people will find this peculiarity pointless, I
>> suppose :) ).

>Right, but I don't think we aim to emulate a fully complete Q35 MCH
>or ICH9 for example, which has tons of registers, not even QEMU is
>trying to do that. The main goal is to emulate the registers we know
>are required for OSes to work.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel