Xen project Mailing List

Re: [Xen-devel] [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks

To: "Andrew Cooper" <andrew.cooper3@xxxxxxxxxx>

From: "Jan Beulich" <JBeulich@xxxxxxxx>

Date: Mon, 15 Feb 2016 08:43:37 -0700

Delivery-date: Mon, 15 Feb 2016 15:43:51 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

>>> On 05.02.16 at 14:42, <andrew.cooper3@xxxxxxxxxx> wrote: > @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, > unsigned int *ebx, > /* Fix up VLAPIC details. */ > *ebx &= 0x00FFFFFFu; > *ebx |= (v->vcpu_id * 2) << 24; > + > + *ecx &= hvm_featureset[FEATURESET_1c]; > + *edx &= hvm_featureset[FEATURESET_1d]; Looks like I've overlooked an issue in patch 11, which becomes apparent here: How can you use a domain-independent featureset here, when features vary between HAP and shadow mode guests? I.e. in the earlier patch I suppose you need to calculate two hvm_*_featureset[]s, with the HAP one perhaps empty when !hvm_funcs.hap_supported. > @@ -4694,6 +4687,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, > unsigned int *ebx, > break; > > case 0x80000001: > + *ecx &= hvm_featureset[FEATURESET_e1c]; > + *edx &= hvm_featureset[FEATURESET_e1d]; Looking at the uses here, wouldn't it be better (less ambiguous) for these to be named FEATURESET_x1c and FEATURESET_x1d respectively, since x is not a hex digit, but e is? > @@ -841,69 +842,70 @@ void pv_cpuid(struct cpu_user_regs *regs) > else > cpuid_count(leaf, subleaf, &a, &b, &c, &d); > > - if ( (leaf & 0x7fffffff) == 0x00000001 ) > - { > - /* Modify Feature Information. */ > - if ( !cpu_has_apic ) > - __clear_bit(X86_FEATURE_APIC, &d); > - > - if ( !is_pvh_domain(currd) ) > - { > - __clear_bit(X86_FEATURE_PSE, &d); > - __clear_bit(X86_FEATURE_PGE, &d); > - __clear_bit(X86_FEATURE_PSE36, &d); > - __clear_bit(X86_FEATURE_VME, &d); > - } > - } > - > switch ( leaf ) > { > case 0x00000001: > - /* Modify Feature Information. */ > - if ( !cpu_has_sep ) > - __clear_bit(X86_FEATURE_SEP, &d); > - __clear_bit(X86_FEATURE_DS, &d); > - __clear_bit(X86_FEATURE_ACC, &d); > - __clear_bit(X86_FEATURE_PBE, &d); > - if ( is_pvh_domain(currd) ) > - __clear_bit(X86_FEATURE_MTRR, &d); > - > - __clear_bit(X86_FEATURE_DTES64 % 32, &c); > - __clear_bit(X86_FEATURE_MWAIT % 32, &c); > - __clear_bit(X86_FEATURE_DSCPL % 32, &c); > - __clear_bit(X86_FEATURE_VMXE % 32, &c); > - __clear_bit(X86_FEATURE_SMXE % 32, &c); > - __clear_bit(X86_FEATURE_TM2 % 32, &c); > + c &= pv_featureset[FEATURESET_1c]; > + d &= pv_featureset[FEATURESET_1d]; > + > if ( is_pv_32bit_domain(currd) ) > - __clear_bit(X86_FEATURE_CX16 % 32, &c); > - __clear_bit(X86_FEATURE_XTPR % 32, &c); > - __clear_bit(X86_FEATURE_PDCM % 32, &c); > - __clear_bit(X86_FEATURE_PCID % 32, &c); > - __clear_bit(X86_FEATURE_DCA % 32, &c); > - if ( !cpu_has_xsave ) > - { > - __clear_bit(X86_FEATURE_XSAVE % 32, &c); > - __clear_bit(X86_FEATURE_AVX % 32, &c); > - } > - if ( !cpu_has_apic ) > - __clear_bit(X86_FEATURE_X2APIC % 32, &c); > - __set_bit(X86_FEATURE_HYPERVISOR % 32, &c); > + c &= ~cpufeat_mask(X86_FEATURE_CX16); Shouldn't this be taken care of by clearing LM and then applying your dependencies correction? Or is that meant to only get enforced later? Is it maybe still worth having both pv64_featureset[] and pv32_featureset[]? > + /* > + * !!! Warning - OSXSAVE handling for PV guests is non-architectural > !!! > + * > + * Architecturally, the correct code here is simply: > + * > + * if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE ) > + * c |= cpufeat_mask(X86_FEATURE_OSXSAVE); > + * > + * However because of bugs in Xen (before c/s bd19080b, Nov 2010, the > + * XSAVE cpuid flag leaked into guests despite the feature not being > + * avilable for use), buggy workarounds where introduced to Linux > (c/s > + * 947ccf9c, also Nov 2010) which relied on the fact that Xen also > + * incorrectly leaked OSXSAVE into the guest. > + * > + * Furthermore, providing architectural OSXSAVE behaviour to a many > + * Linux PV guests triggered a further kernel bug when the fpu code > + * observes that XSAVEOPT is available, assumes that xsave state had > + * been set up for the task, and follows a wild pointer. > + * > + * Therefore, the leaking of Xen's OSXSAVE setting has become a > + * defacto part of the PV ABI and can't reasonably be corrected. > + * > + * The following situations and logic now applies: > + * > + * - Hardware without CPUID faulting support and native CPUID: > + * There is nothing Xen can do here. The hosts XSAVE flag will > + * leak through and Xen's OSXSAVE choice will leak through. > + * > + * In the case that the guest kernel has not set up OSXSAVE, only > + * SSE will be set in xcr0, and guest userspace can't do too much > + * damage itself. > + * > + * - Enlightened CPUID or CPUID faulting available: > + * Xen can fully control what is seen here. Guest kernels need to > + * see the leaked OSXSAVE, but guest userspace is given > + * architectural behaviour, to reflect the guest kernels > + * intentions. > + */ Well, I think all of this is too harsh: In a hypervisor-guest relationship of PV kind I don't view it as entirely wrong to let the guest kernel know of whether the hypervisor enabled XSAVE support by inspecting the OSXSAVE bit. From guest kernel perspective, the hypervisor is kind of the OS. While I won't make weakening the above a little a requirement, I'd appreciate you doing so. > + case 0x80000007: > + d &= pv_featureset[FEATURESET_e7d]; > + break; By not clearing eax and ebx (not sure about ecx) here you would again expose flags to guests without proper white listing. > + case 0x80000008: > + b &= pv_featureset[FEATURESET_e8b]; > break; Same here for ecx and edx and perhaps the upper 8 bits of eax. Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.