Xen project Mailing List

Re: [Xen-devel] [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Fri, 22 Jan 2016 15:59:37 +0100

Delivery-date: Fri, 22 Jan 2016 15:00:07 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 22/01/16 at 14.34, Andrew Cooper wrote:
> On 22/01/16 12:43, Roger Pau Monné wrote:
>> On 22/01/16 at 11.57, Jan Beulich wrote:
>>>>>> On 21.01.16 at 17:51, <roger.pau@xxxxxxxxxx> wrote:
>>>> Add a new HVM-specific feature flag that signals the presence of a bitmap
>>>> that contains the current set of enabled emulated devices. The bitmap is
>>>> placed in the ecx register. The bit fields used in the bitmap are the same
>>>> as the ones used in the xen_arch_domainconfig emulation_flags field, and
>>>> their meaning can be found at arch-x86/xen.h.
>>>>
>>>> This will allow Xen to enable emulated devices for HVMlite guests in the
>>>> future, by having a proper ABI for reporting which devices are enabled.
>>> The idea is certainly nice and appreciated, but ...
>>>
>>>> --- a/xen/include/public/arch-x86/cpuid.h
>>>> +++ b/xen/include/public/arch-x86/cpuid.h
>>>> @@ -78,12 +78,17 @@
>>>>   * HVM-specific features
>>>>   * EAX: Features
>>>>   * EBX: vcpu id (iff EAX has XEN_HVM_CPUID_VCPU_ID_PRESENT flag)
>>>> + * ECX: bitmap of enabled devices, according to the bit fields defined in
>>>> + *      arch-x86/xen.h.
>>> ... this set of definitions is not currently a stable ABI (limited to
>>> hypervisor and tool stack), and if we wanted to make it stable
>>> we'd first need to think a little about the complications that may
>>> arise if the granularity chosen (think about the PM bit and the
>>> discussion around it before your changes went in) turns out to
>>> be a problem later on.
>> Yes, in fact I'm having second thoughts on the PM flag, and I think I
>> should have split it into ACPI_PM and ACPI_TIMER instead.
>>
>>> Also at least some of the features can be determined by other
>>> means (CPUID, ACPI tables), so I'm not even sure we need all
>>> of this, and I'd really prefer to avoid multiple distinct ways to
>>> learn of a certain feature, as it's too easy for the two (or more)
>>> mechanisms to get out of sync.
>> So let's look at the flags and whether there's an existing way to signal
>> their presence:
>>
>> LAPIC: CPUID.01h:EDX[bit 9]
>> IOAPIC: tied to LAPIC (so either both enabled or none).
>
> An IOAPIC is by no means required - they are only for turning legacy
> interrupts into MSIs.  It would be perfectly fine for a PVH domain to
> have an LAPIC and an SRIOV virtual function, without an IOAPIC at all.
>
> The presence of LAPICs and IOAPICs is reported in the MADT ACPI table.

Right, so as said in the reply to Jan, we will require ACPI in order to
enable any of these pieces. I don't have a problem with that, I just wasn't
sure if this requirement was desired. If that's the plan, then I think we
would also need to fix up the tables provided to Dom0 in order to match
what's available, but that can be discussed later.

> Note also that the cpuid bit is a fastforward of the hardware enable bit
> in the APIC_BASE MSR.  The cpuid bit will disappear from view if you
> hardware-disable the LAPIC.

Right, it looks like ACPI is the best way to decide.

>>
>> HPET: can only be enabled from/with ACPI, since its base memory address
>> is not fixed, and we would need to find a way to pass its address to
>> the OS in the absence of ACPI.
>
> In reality, there are heuristics to guess if an HPET is present.  The
> legacy HPET traditionally always resides at pfn fed000.  Linux even has
> heuristics to find the legacy HPET based on the IOH, for when the BIOS
> doesn't present the HPET properly in ACPI.

Heh, if we already require ACPI in order to discover local APICs and IO
APICs, I don't think it hurts to also require it in order to discover the
HPET.

>> RTC: I don't know of any way to signal the RTC presence, AFAICT it's
>> always assumed to be there in the PC architecture. Could maybe return ~0
>> when reading from IO port 0x71, but that's meh..., not the best way IMHO.
>>
>> PIC: same as RTC, I don't know of any way to signal its presence since
>> it's assumed to be there.
>>
>> VGA: again I don't think there's an easy way to signal its presence,
>> apart from returning ~0 from the multiple IO ports it uses. The fact
>> that the 0xA0000-0xBFFFF memory range is also marked as RAM in the e820
>> map in HVMlite DomUs should also trigger OSes into disabling VGA due to
>> the lack of proper MMIO range, but sadly I think most OSes just assume
>> it's there.
>
> VGA can be found by following the VGA routing bit in PCI config space.
> This is how real hardware makes the legacy IO ranges reach the graphics
> card configured as the primary vga device.

Hm, I have to look into this. Are there any examples of this mechanism out
there?

>>
>> PIT: assumed to be always present in the PC architecture.
>
> PIT, RTC and PIC have their presence always assumed, but returning ~0 on
> reads is completely fine.  A DMLite OS knows it is booting in a
> virtualised environment.

Yes, that's fine. I'm completely disabling the attachment of those devices
when entering from the Xen entry point ATM on FreeBSD, but how are we going
to notify the OS when they are actually available? Just by returning
something different from ~0 when poking at their IO ports? That doesn't
look like a very robust way IMHO.

>
>>
>> PM: I'm leaning to split this into ACPI_PM and ACPI_TIMER as said
>> before. ACPI_TIMER presence is contained inside the ACPI tables, and
>> the availability of ACPI_PM (power management) can be inferred from the
>> presence of ACPI itself.
>>
>> AMD guest IOMMU: AFAICT this seems to be currently disabled, since the
>> MMIO range it checks is [~0ULL, ~0ULL + 0x8000]. There is a function to
>> change the base address ~0ULL to something else, but it doesn't seem to
>> be reachable from any path. In any case, I guess the presence of this
>> device will be reported from ACPI.
>
> It is indeed currently disabled (See
> https://bugs.xenserver.org/browse/XSO-132 if you want to see why.  It
> manifested as a very curious bug).
>
> It will be available via an IVRS ACPI table when implemented.
>
>>
>> So, we have the following devices that are assumed to be there: RTC,
>> PIC, PIT. Everything else I think can be signalled by other means
>> already available.
>>
>> IMHO, I think we could say that the PIC is never going to be available
>> to HVMlite guests (in any case we would enable the lapic/ioapic), and
>> maybe enable the RTC and PIT by default?
>>
>> Then I think we could get away without any Xen-specific way of reporting
>> enabled devices.
>
> DMLite is a new container type.  I would far rather it was assumed that
> there was no legacy hardware at all.

So I take it that you are in favour of only considering enabling the local
APIC and maybe the IO APIC for HVMlite, because of the performance benefits,
while the other devices are _never_ going to be available to HVMlite
guests/hosts at all? (Dom0 already gets the hw VGA)

IMHO, I would like to be able to eventually enable them in order to provide
an environment that's as close as possible to a compatible PC, so as to
reduce the number of changes required to port an OS to run in this mode.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
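
For illustration, a guest consuming the ECX bitmap proposed in the patch
might decode it roughly as sketched here. This is only a sketch under stated
assumptions: the EMU_* names and bit positions are placeholders made up for
the example (the authoritative definitions are the emulation_flags bits in
arch-x86/xen.h, which as noted above are not a stable ABI), and the bitmap
is passed in as a plain integer rather than being read with CPUID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical bit assignments, for illustration only. The real values
 * are the emulation_flags bits defined in arch-x86/xen.h.
 */
enum {
    EMU_LAPIC  = 1u << 0,
    EMU_HPET   = 1u << 1,
    EMU_PM     = 1u << 2,
    EMU_RTC    = 1u << 3,
    EMU_IOAPIC = 1u << 4,
    EMU_PIC    = 1u << 5,
    EMU_VGA    = 1u << 6,
    EMU_IOMMU  = 1u << 7,
    EMU_PIT    = 1u << 8
};

struct emu_name {
    uint32_t bit;
    const char *name;
};

static const struct emu_name emu_names[] = {
    { EMU_LAPIC, "LAPIC" }, { EMU_HPET, "HPET" },     { EMU_PM, "PM" },
    { EMU_RTC, "RTC" },     { EMU_IOAPIC, "IOAPIC" }, { EMU_PIC, "PIC" },
    { EMU_VGA, "VGA" },     { EMU_IOMMU, "IOMMU" },   { EMU_PIT, "PIT" }
};

/*
 * Write a space-separated list of the devices set in 'bitmap' into 'buf'
 * and return how many known device bits were set. Names that do not fit
 * in 'buf' are dropped whole, but they are still counted.
 */
static unsigned int emu_describe(uint32_t bitmap, char *buf, size_t len)
{
    unsigned int count = 0;
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(emu_names) / sizeof(emu_names[0]); i++) {
        size_t nlen, need;

        if (!(bitmap & emu_names[i].bit))
            continue;
        count++;

        nlen = strlen(emu_names[i].name);
        need = nlen + (used ? 1 : 0);      /* name plus separator */
        if (used + need >= len)            /* keep room for the NUL */
            continue;

        if (used)
            buf[used++] = ' ';
        memcpy(buf + used, emu_names[i].name, nlen);
        used += nlen;
        buf[used] = '\0';
    }

    return count;
}
```

With the placeholder assignments above, calling emu_describe(0x11, buf,
sizeof(buf)) on a 64-byte buffer returns 2 and leaves "LAPIC IOAPIC" in buf.
Keeping the decoder table-driven means a change in granularity (for example
splitting PM into ACPI_PM and ACPI_TIMER) only touches the table.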

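The existing detection routes mentioned in the thread can be sketched in the
same spirit. The helpers below take raw register or port values as arguments
so that they stay portable; how those values are obtained (executing CPUID
leaf 1, reading the APIC_BASE MSR, or reading from an IO port such as 0x71)
is left to the caller, and the function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID.01h:EDX[bit 9] advertises an on-chip local APIC. */
#define CPUID1_EDX_APIC      (UINT32_C(1) << 9)
/* APIC_BASE MSR bit 11 is the APIC global (hardware) enable. */
#define APIC_BASE_GLOBAL_EN  (UINT64_C(1) << 11)

/* 'edx' is the EDX value returned by CPUID leaf 1. */
static bool cpuid1_edx_has_apic(uint32_t edx)
{
    return (edx & CPUID1_EDX_APIC) != 0;
}

/* 'apic_base_msr' is the raw value of the APIC_BASE MSR. */
static bool apic_base_hw_enabled(uint64_t apic_base_msr)
{
    return (apic_base_msr & APIC_BASE_GLOBAL_EN) != 0;
}

/*
 * Heuristic for legacy devices (RTC, PIC, PIT): returns true when every
 * sampled byte-wide port read came back as all ones, which is what an
 * unclaimed IO port is expected to return.
 */
static bool all_reads_floating(const uint8_t *vals, size_t n)
{
    assert(vals != NULL && n > 0);

    for (size_t i = 0; i < n; i++)
        if (vals[i] != 0xff)
            return false;

    return true;
}
```

The last helper also shows why the port-poking approach is fragile: a device
that is present can legitimately return 0xff for a given register, so a
single all-ones read proves nothing, and sampling several ports only reduces
the chance of a false negative.
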