[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 7/8] xen/x86: Rework CR4 handling for PV guests



>>> On 24.06.15 at 18:31, <andrew.cooper3@xxxxxxxxxx> wrote:
> PV CR4 settings are now based on mmu_cr4_features, rather than the current
> contents of CR4.  This causes dom0 to be consistent with domUs, despite 
> being
> constructed in a region with CR4.SMAP purposefully disabled.

That'll be fine as long as we're keeping CR4 stable (except for
guest controlled bits). Agreed, once that's no longer the case,
mmu_cr4_features would no longer be very useful either.
Still it would seem more reasonable to me to base things on
current values than on some not guaranteed to be up to date.

> The major change here is that v->arch.pv_vcpu.ctrlreg[4] now contains only CR4
> bits which Xen wishes to shadow, rather than containing a mix of host and
> guest bits.  This is likely to elicit a warning when migrating a PV guest 
> from
> an older version of Xen, although no function problems as a result.

For one, I'm having a hard time guessing what the part of the sentence
after the comma is meant to tell me. And then, from a support
perspective, such warnings aren't helpful, as they incur questions. Plus
finally, storing neither the current guest view nor the current hypervisor
view in that field seems prone to cause headaches in the future. Otoh
this gets it in line with CR0 handling as it looks.

> A second change is that visible set of CR4 bits is different.  In 
> particular,
> VMXE and MCE are not leaked into a guest, while SMAP and SMEP are exposed.

Not leaking VMXE into the guest is fine. Not exposing MCE even to
Dom0 is questionable. And exposing SMEP and SMAP without
exposing their CPUID flags seems bogus.

> In addition, add PGE to the set of bits we don't care about a guest 
> attempting to modify.

I agree from a functionality pov this is okay, but do we really want
to allow the guest to turn this off from a performance pov?

> @@ -683,6 +683,26 @@ void arch_domain_unpause(struct domain *d)
>  }
>  
>  /*
> + * CR4 bits Xen will shadow on behalf of the guest.
> + *  - TSD for vtsc
> + *  - DE for %%dr4/5 emulation

DR4/5 handling is surely secondary here: I/O breakpoints are the
main feature to be controlled by this bit.

> @@ -699,6 +719,8 @@ static int __init init_pv_cr4_masks(void)
>          common_mask &= ~X86_CR4_DE;
>      if ( cpu_has_xsave )
>          common_mask &= ~X86_CR4_OSXSAVE;
> +    if ( cpu_has_pge )
> +        common_mask &= ~X86_CR4_PGE;

Following the earlier comment, shouldn't this bit then also be added
to PV_CR4_SHADOW?

> @@ -708,21 +730,61 @@ static int __init init_pv_cr4_masks(void)
>      if ( cpu_has_fsgsbase )
>          pv_cr4_mask &= ~X86_CR4_FSGSBASE;
>  
> +    BUILD_BUG_ON(PV_CR4_SHADOW & PV_CR4_READ);
> +
>      return 0;
>  }
>  __initcall(init_pv_cr4_masks);
>  
> -unsigned long pv_guest_cr4_fixup(const struct vcpu *v, unsigned long 
> guest_cr4)
> +void pv_cr4_write(struct vcpu *v, unsigned long guest_cr4)
>  {
> -    unsigned long hv_cr4 = real_cr4_to_pv_guest_cr4(read_cr4());
> +    unsigned long host_cr4 = mmu_cr4_features & (PV_CR4_READ | 
> PV_CR4_SHADOW);
>      unsigned long mask = is_pv_32bit_vcpu(v) ? compat_pv_cr4_mask : 
> pv_cr4_mask;
>  
> -    if ( (guest_cr4 & mask) != (hv_cr4 & mask) )
> +    if ( guest_cr4 == 0 )
> +    {
> +        /* Default - all shadowed bits to 0. */
> +        guest_cr4 = mmu_cr4_features & ~PV_CR4_SHADOW;
> +    }
> +    else if ( (guest_cr4 & ~PV_CR4_SHADOW) == 0 )
> +    {
> +        /*
> +         * Only setting shadowed bits.  This will be the case on restore, so
> +         * avoid warning about losing readable host bits.
> +         */
> +    }
> +    else if ( (guest_cr4 & mask) != (host_cr4 & mask) )
> +    {
>          printk(XENLOG_G_WARNING
> -               "d%d attempted to change %pv's CR4 flags %08lx -> %08lx\n",
> -               current->domain->domain_id, v, hv_cr4, guest_cr4);
> +               "d%d attempted to change %pv's CR4 flags %08lx -> %08lx "
> +               "(bad +%#lx, -%#lx)\n",

No mixture of 0x-prefixed and not-0x-prefixed hex numbers please.
I anyway wonder whether printing four numbers here is necessary
when printing just three (host_cr4, guest_cr4, and mask) would
suffice.

> +               current->domain->domain_id, v, host_cr4, guest_cr4,
> +               (host_cr4 ^ guest_cr4) & guest_cr4 & mask,
> +               (host_cr4 ^ guest_cr4) & ~guest_cr4 & mask);
> +    }
> +
> +    v->arch.pv_vcpu.ctrlreg[4] =
> +        ((mmu_cr4_features & mask) | (guest_cr4 & ~mask)) & PV_CR4_SHADOW;

With FSGSBASE not in PV_CR4_SHADOW, how would (as the comment
in init_pv_cr4_masks() says) the guest manage to observe the flag in
its desired state after a migration?

> +unsigned long pv_cr4_read(const struct vcpu *v)
> +{
> +    unsigned long cr4 = read_cr4();
> +
> +    return (cr4 & PV_CR4_READ & ~PV_CR4_SHADOW) |
> +        (v->arch.pv_vcpu.ctrlreg[4] & PV_CR4_SHADOW);

I don't understand either of the two uses of PV_CR4_SHADOW here:
The first seems pointless as earlier an you already assert that it
and PV_CR4_READ are disjoint, i.e. the negation of the former
surely has no less bits set than the latter. And the second seems
pointless since you only ever store bits in PV_CR4_SHADOW into the
field.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.