
Re: [Xen-devel] [PATCH RFC] x86/amd: Avoid cpu_has_hypervisor evaluating true on native hardware



On 29.01.2020 14:08, Andrew Cooper wrote:
> On 29/01/2020 08:17, Jan Beulich wrote:
>> On 28.01.2020 18:14, Andrew Cooper wrote:
>>> On 28/01/2020 13:59, Jan Beulich wrote:
>>>> On 27.01.2020 21:21, Andrew Cooper wrote:
>>>>> Without this fix, there is apparently a problem with Roger's "[PATCH v3
>>>>> 7/7] x86/tlb: use Xen L0 assisted TLB flush when available" on native
>>>>> AMD hardware.  I haven't investigated the issue with that patch
>>>>> specifically, because cpu_has_hypervisor being wrong is obviously a bug.
>>>>>
>>>>> This is one of two possible approaches, and both have their downsides.
>>>>> This one takes an extra hit on context switches between PV vcpus and
>>>>> idle/hvm, as they will usually differ in the HYPERVISOR bit.
>>>> Why would they differ in the HYPERVISOR bit? Maybe for idle (albeit
>>>> off the top of my head I can't recall us special casing idle wrt
>>>> CPUID handling), but why for PV vs HVM? The idle case, if there is
>>>> an issue with this, could be taken care of by actually setting the
>>>> bit there, as no-one should care about what it's set to?
>>> d->arch.pv.cpuidmasks is only allocated for PV domains (and starts by
>>> dup()ing the default).
>> Ah, that's the piece I was missing. My next question then is: Why
>> do we do *_init_levelling() from early_init_*() in the first place?
>> It looks conceptually wrong to me to set up leveling before having
>> obtained CPUID data. Wouldn't there better be a separate hook in
>> struct cpu_dev, to be invoked e.g. from identify_cpu() after
>> generic_identify()?
> 
> cpuid_mask_cpu= literally means "pretend you're this older CPU instead",
> and was implemented in a way which affected Xen.  This is why levelling
> is configured that early.

It's indeed written like this in the cmdline doc, but I vaguely
recall questioning this behavior at least once before.

> Now that this isn't the only way to make heterogeneous migration safe,
> perhaps we don't care so much, but it would still be a behavioural
> change to the cpuid_mask_* parameters.

And indeed we have cpuid= now to control the features Xen is
to use. I don't fancy looking at bug reports where I first need
to sort out the interaction between these two cmdline options.

>>> When context switching levelling MSRs, any non-PV guest uses
>>> cpumask_default.  This captures idle and HVM vcpus.
>>>
>>> This is necessary because, at least at the time it was introduced,
>>> {pv,hvm}_cpuid() issued native CPUID instructions to then feed data back
>>> into guest context.  It's probably less relevant now that guest_cpuid()
>>> doesn't issue native instructions in the general case.
>>>
>>> Either way, HVM gained the default like idle, to cause the lazy
>>> switching logic to switch less often.
>>>
>>> The problem we have after this patch is that default differs from PV in
>>> the HYPERVISOR bit, which basically guarantees that we rewrite the leaf
>>> 1 levelling on each context switch.
>>>
>>> However, having looked at the other features bits which differ for PV,
>>> VME and PSE36 being hidden means we're always switching leaf 1 anyway,
>>> so this change for HYPERVISOR doesn't make the situation any worse.
>>>
>>>>> The other approach is to order things more carefully so levelling is
>>>>> configured after scanning for cpuid bits, but that has the downside
>>>>> that it is very easy to regress.
>>>>>
>>>>> Thoughts on which is the least-bad approach to take?  Having written this
>>>>> patch, I'm now erring on the side of doing it the other way.
>>>> Besides the need for me to understand the aspect above, I'm afraid
>>>> to judge I'd need to have at least a sketch of what the alternative
>>>> would look like, in particular to figure how fragile it really is.
>>> It would be a small bit of careful reordering in cpu/amd.c
>>>
>>> The tipping factor is that, even if we arrange for idle context not to
>>> have HYPERVISOR set (and therefore cpu_has_hypervisor ending up clear
>>> when scanned), a regular CPUID instruction in PV context would see
>>> HYPERVISOR as a property of virtualising things sensibly for guests.
>>>
>>> As we need to cope with HYPERVISOR being visible in some contexts, it's
>>> better to consider it uniformly visible and break any notional link
>>> between cpu_has_hypervisor and what a CPUID instruction would report
>>> for the bit.
>> After having set up leveling I think we shouldn't use CPUID anymore
>> for leaves which may be leveled. As a result cpu_has_* / cpu_has()
>> would then be the only information source in this regard.
>>
>> Such a general re-arrangement would then also appear to address your
>> "easy to regress" concern.
> 
> I've just had another thought, and it rules out other approaches.
> 
> We use ctxt_switch_levelling(NULL) on the crash path to reset things for
> the next kernel, and that needs to hide the HYPERVISOR bit on AMD.
> 
> Therefore, the approach in this patch is the only sensible action (and
> I'm now not concerned about the performance hit, as it won't actually
> increase the number of MSR writes we make).

I'm not fully convinced: Why is setting the MSRs back to
cpuidmask_defaults the correct thing to do in the first place?
Shouldn't we reset them to what we found on boot?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

