[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks



>>> On 05.02.16 at 14:42, <andrew.cooper3@xxxxxxxxxx> wrote:
> @@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
> unsigned int *ebx,
>          /* Fix up VLAPIC details. */
>          *ebx &= 0x00FFFFFFu;
>          *ebx |= (v->vcpu_id * 2) << 24;
> +
> +        *ecx &= hvm_featureset[FEATURESET_1c];
> +        *edx &= hvm_featureset[FEATURESET_1d];

Looks like I've overlooked an issue in patch 11, which becomes
apparent here: How can you use a domain-independent featureset
here, when features vary between HAP and shadow mode guests?
I.e. in the earlier patch I suppose you need to calculate two
hvm_*_featureset[]s, with the HAP one perhaps empty when
!hvm_funcs.hap_supported.

> @@ -4694,6 +4687,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, 
> unsigned int *ebx,
>          break;
>  
>      case 0x80000001:
> +        *ecx &= hvm_featureset[FEATURESET_e1c];
> +        *edx &= hvm_featureset[FEATURESET_e1d];

Looking at the uses here, wouldn't it be better (less ambiguous) for
these to be named FEATURESET_x1c and FEATURESET_x1d
respectively, since x is not a hex digit, but e is?

> @@ -841,69 +842,70 @@ void pv_cpuid(struct cpu_user_regs *regs)
>      else
>          cpuid_count(leaf, subleaf, &a, &b, &c, &d);
>  
> -    if ( (leaf & 0x7fffffff) == 0x00000001 )
> -    {
> -        /* Modify Feature Information. */
> -        if ( !cpu_has_apic )
> -            __clear_bit(X86_FEATURE_APIC, &d);
> -
> -        if ( !is_pvh_domain(currd) )
> -        {
> -            __clear_bit(X86_FEATURE_PSE, &d);
> -            __clear_bit(X86_FEATURE_PGE, &d);
> -            __clear_bit(X86_FEATURE_PSE36, &d);
> -            __clear_bit(X86_FEATURE_VME, &d);
> -        }
> -    }
> -
>      switch ( leaf )
>      {
>      case 0x00000001:
> -        /* Modify Feature Information. */
> -        if ( !cpu_has_sep )
> -            __clear_bit(X86_FEATURE_SEP, &d);
> -        __clear_bit(X86_FEATURE_DS, &d);
> -        __clear_bit(X86_FEATURE_ACC, &d);
> -        __clear_bit(X86_FEATURE_PBE, &d);
> -        if ( is_pvh_domain(currd) )
> -            __clear_bit(X86_FEATURE_MTRR, &d);
> -
> -        __clear_bit(X86_FEATURE_DTES64 % 32, &c);
> -        __clear_bit(X86_FEATURE_MWAIT % 32, &c);
> -        __clear_bit(X86_FEATURE_DSCPL % 32, &c);
> -        __clear_bit(X86_FEATURE_VMXE % 32, &c);
> -        __clear_bit(X86_FEATURE_SMXE % 32, &c);
> -        __clear_bit(X86_FEATURE_TM2 % 32, &c);
> +        c &= pv_featureset[FEATURESET_1c];
> +        d &= pv_featureset[FEATURESET_1d];
> +
>          if ( is_pv_32bit_domain(currd) )
> -            __clear_bit(X86_FEATURE_CX16 % 32, &c);
> -        __clear_bit(X86_FEATURE_XTPR % 32, &c);
> -        __clear_bit(X86_FEATURE_PDCM % 32, &c);
> -        __clear_bit(X86_FEATURE_PCID % 32, &c);
> -        __clear_bit(X86_FEATURE_DCA % 32, &c);
> -        if ( !cpu_has_xsave )
> -        {
> -            __clear_bit(X86_FEATURE_XSAVE % 32, &c);
> -            __clear_bit(X86_FEATURE_AVX % 32, &c);
> -        }
> -        if ( !cpu_has_apic )
> -           __clear_bit(X86_FEATURE_X2APIC % 32, &c);
> -        __set_bit(X86_FEATURE_HYPERVISOR % 32, &c);
> +            c &= ~cpufeat_mask(X86_FEATURE_CX16);

Shouldn't this be taken care of by clearing LM and then applying
your dependencies correction? Or is that meant to only get
enforced later? Is it maybe still worth having both pv64_featureset[]
and pv32_featureset[]?

> +        /*
> +         * !!! Warning - OSXSAVE handling for PV guests is non-architectural 
> !!!
> +         *
> +         * Architecturally, the correct code here is simply:
> +         *
> +         *   if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE )
> +         *       c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
> +         *
> +         * However because of bugs in Xen (before c/s bd19080b, Nov 2010, the
> +         * XSAVE cpuid flag leaked into guests despite the feature not being
> +         * avilable for use), buggy workarounds where introduced to Linux 
> (c/s
> +         * 947ccf9c, also Nov 2010) which relied on the fact that Xen also
> +         * incorrectly leaked OSXSAVE into the guest.
> +         *
> +         * Furthermore, providing architectural OSXSAVE behaviour to a many
> +         * Linux PV guests triggered a further kernel bug when the fpu code
> +         * observes that XSAVEOPT is available, assumes that xsave state had
> +         * been set up for the task, and follows a wild pointer.
> +         *
> +         * Therefore, the leaking of Xen's OSXSAVE setting has become a
> +         * defacto part of the PV ABI and can't reasonably be corrected.
> +         *
> +         * The following situations and logic now applies:
> +         *
> +         * - Hardware without CPUID faulting support and native CPUID:
> +         *    There is nothing Xen can do here.  The hosts XSAVE flag will
> +         *    leak through and Xen's OSXSAVE choice will leak through.
> +         *
> +         *    In the case that the guest kernel has not set up OSXSAVE, only
> +         *    SSE will be set in xcr0, and guest userspace can't do too much
> +         *    damage itself.
> +         *
> +         * - Enlightened CPUID or CPUID faulting available:
> +         *    Xen can fully control what is seen here.  Guest kernels need to
> +         *    see the leaked OSXSAVE, but guest userspace is given
> +         *    architectural behaviour, to reflect the guest kernels
> +         *    intentions.
> +         */

Well, I think all of this is too harsh: In a hypervisor-guest
relationship of PV kind I don't view it as entirely wrong to let the
guest kernel know of whether the hypervisor enabled XSAVE
support by inspecting the OSXSAVE bit. From guest kernel
perspective, the hypervisor is kind of the OS. While I won't
make weakening the above a little a requirement, I'd appreciate
you doing so.

> +    case 0x80000007:
> +        d &= pv_featureset[FEATURESET_e7d];
> +        break;

By not clearing eax and ebx (not sure about ecx) here you would
again expose flags to guests without proper white listing.

> +    case 0x80000008:
> +        b &= pv_featureset[FEATURESET_e8b];
>          break;

Same here for ecx and edx and perhaps the upper 8 bits of eax.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.