[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 7/8] xen/x86: Rework CR4 handling for PV guests



On 25/06/15 15:41, Jan Beulich wrote:
>>>> On 24.06.15 at 18:31, <andrew.cooper3@xxxxxxxxxx> wrote:
>> PV CR4 settings are now based on mmu_cr4_features, rather than the current
>> contents of CR4.  This causes dom0 to be consistent with domUs, despite 
>> being
>> constructed in a region with CR4.SMAP purposefully disabled.
> That'll be fine as long as we're keeping CR4 stable (except for
> guest controlled bits). Agreed, once that's no longer the case,
> mmu_cr4_features would no longer be very useful either.
> Still it would seem more reasonable to me to base things on
> current values than on some not guaranteed to be up to date.

mmu_cr4_features is supposed to be the runtime value in cr4.

At any point, SMAP/SMEP/PGE might be temporarily disabled, which is why
I avoided using the current value of cr4.

>
>> The major change here is that v->arch.pv_vcpu.ctrlreg[4] now contains only 
>> CR4
>> bits which Xen wishes to shadow, rather than containing a mix of host and
>> guest bits.  This is likely to elicit a warning when migrating a PV guest 
>> from
>> an older version of Xen, although no function problems as a result.
> For one, I'm having a hard time guessing what the part of the sentence
> after the comma is meant to tell me.

Before this patch, v->arch.pv_vcpu.ctrlreg[4] contained an unqualified
combination of host and guest controlled bits.

After this patch, it strictly contains the bits Xen chooses to shadow.

> And then, from a support
> perspective, such warnings aren't helpful, as they incur questions.

It was useful for debugging, but can see your point.  I could reword it
to "ignoring $FOO's attempt to update $BAR's CR4".  I would hesitate
from removing it completely however.

> Plus
> finally, storing neither the current guest view nor the current hypervisor
> view in that field seems prone to cause headaches in the future. Otoh
> this gets it in line with CR0 handling as it looks.

My logic was that the set of shadowed bits will never decrease on newer
versions of Xen, but Xen's choice of CR4 bits may change.  As a result,
neither "guests view" nor "hypervisors view" is appropriate in an
upgrade case.

>
>> A second change is that visible set of CR4 bits is different.  In 
>> particular,
>> VMXE and MCE are not leaked into a guest, while SMAP and SMEP are exposed.
> Not leaking VMXE into the guest is fine. Not exposing MCE even to
> Dom0 is questionable.

I would prefer not to special case the hardware domain if possible, and
I suppose MCE is not too much of an issue to leak.  We already are,
after all.

> And exposing SMEP and SMAP without
> exposing their CPUID flags seems bogus.

I was hoping not to get bogged down in CPUID given my upcoming work to
fix feature levelling.

After recent consideration, I am still not sure if we want to support
SMAP in 32bit PV guests or not.  The trapping of stac/clac would be a
high overhead, although the guest could modify EFLAGS.AC itself using popf.

>
>> In addition, add PGE to the set of bits we don't care about a guest 
>> attempting to modify.
> I agree from a functionality pov this is okay, but do we really want
> to allow the guest to turn this off from a performance pov?

This change doesn't shadow PGE.  It just prevents the warning from
firing if the guest attempts to clear PGE.

>
>> @@ -683,6 +683,26 @@ void arch_domain_unpause(struct domain *d)
>>  }
>>  
>>  /*
>> + * CR4 bits Xen will shadow on behalf of the guest.
>> + *  - TSD for vtsc
>> + *  - DE for %%dr4/5 emulation
> DR4/5 handling is surely secondary here: I/O breakpoints are the
> main feature to be controlled by this bit.

True - I will expand the comment.

>
>> @@ -699,6 +719,8 @@ static int __init init_pv_cr4_masks(void)
>>          common_mask &= ~X86_CR4_DE;
>>      if ( cpu_has_xsave )
>>          common_mask &= ~X86_CR4_OSXSAVE;
>> +    if ( cpu_has_pge )
>> +        common_mask &= ~X86_CR4_PGE;
> Following the earlier comment, shouldn't this bit then also be added
> to PV_CR4_SHADOW?

I don't think so.  A guest has to defer to Xen to perform TLB flushes. 
I just wished to avoid the warning.

>
>> @@ -708,21 +730,61 @@ static int __init init_pv_cr4_masks(void)
>>      if ( cpu_has_fsgsbase )
>>          pv_cr4_mask &= ~X86_CR4_FSGSBASE;
>>  
>> +    BUILD_BUG_ON(PV_CR4_SHADOW & PV_CR4_READ);
>> +
>>      return 0;
>>  }
>>  __initcall(init_pv_cr4_masks);
>>  
>> -unsigned long pv_guest_cr4_fixup(const struct vcpu *v, unsigned long 
>> guest_cr4)
>> +void pv_cr4_write(struct vcpu *v, unsigned long guest_cr4)
>>  {
>> -    unsigned long hv_cr4 = real_cr4_to_pv_guest_cr4(read_cr4());
>> +    unsigned long host_cr4 = mmu_cr4_features & (PV_CR4_READ | 
>> PV_CR4_SHADOW);
>>      unsigned long mask = is_pv_32bit_vcpu(v) ? compat_pv_cr4_mask : 
>> pv_cr4_mask;
>>  
>> -    if ( (guest_cr4 & mask) != (hv_cr4 & mask) )
>> +    if ( guest_cr4 == 0 )
>> +    {
>> +        /* Default - all shadowed bits to 0. */
>> +        guest_cr4 = mmu_cr4_features & ~PV_CR4_SHADOW;
>> +    }
>> +    else if ( (guest_cr4 & ~PV_CR4_SHADOW) == 0 )
>> +    {
>> +        /*
>> +         * Only setting shadowed bits.  This will be the case on restore, so
>> +         * avoid warning about losing readable host bits.
>> +         */
>> +    }
>> +    else if ( (guest_cr4 & mask) != (host_cr4 & mask) )
>> +    {
>>          printk(XENLOG_G_WARNING
>> -               "d%d attempted to change %pv's CR4 flags %08lx -> %08lx\n",
>> -               current->domain->domain_id, v, hv_cr4, guest_cr4);
>> +               "d%d attempted to change %pv's CR4 flags %08lx -> %08lx "
>> +               "(bad +%#lx, -%#lx)\n",
> No mixture of 0x-prefixed and not-0x-prefixed hex numbers please.

Oops yes.

> I anyway wonder whether printing four numbers here is necessary
> when printing just three (host_cr4, guest_cr4, and mask) would
> suffice.

It is substantially harder to read.

>
>> +               current->domain->domain_id, v, host_cr4, guest_cr4,
>> +               (host_cr4 ^ guest_cr4) & guest_cr4 & mask,
>> +               (host_cr4 ^ guest_cr4) & ~guest_cr4 & mask);
>> +    }
>> +
>> +    v->arch.pv_vcpu.ctrlreg[4] =
>> +        ((mmu_cr4_features & mask) | (guest_cr4 & ~mask)) & PV_CR4_SHADOW;
> With FSGSBASE not in PV_CR4_SHADOW, how would (as the comment
> in init_pv_cr4_masks() says) the guest manage to observe the flag in
> its desired state after a migration?

The guest does not get any control of FSGSBASE.  Xen unilaterally has it
enabled, and uses them on vcpu context switch so it is not safe to ever
disable.

>
>> +unsigned long pv_cr4_read(const struct vcpu *v)
>> +{
>> +    unsigned long cr4 = read_cr4();
>> +
>> +    return (cr4 & PV_CR4_READ & ~PV_CR4_SHADOW) |
>> +        (v->arch.pv_vcpu.ctrlreg[4] & PV_CR4_SHADOW);
> I don't understand either of the two uses of PV_CR4_SHADOW here:
> The first seems pointless as earlier an you already assert that it
> and PV_CR4_READ are disjoint, i.e. the negation of the former
> surely has no less bits set than the latter. And the second seems
> pointless since you only ever store bits in PV_CR4_SHADOW into the
> field.

This bit of code has failed to change over the development of the
series.  I will adjust it.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.