
Re: [Xen-devel] [PATCH RFC] x86/amd: Avoid cpu_has_hypervisor evaluating true on native hardware



On 28.01.2020 18:14, Andrew Cooper wrote:
> On 28/01/2020 13:59, Jan Beulich wrote:
>> On 27.01.2020 21:21, Andrew Cooper wrote:
>>> Without this fix, there is apparently a problem with Roger's "[PATCH v3 7/7]
>>> x86/tlb: use Xen L0 assisted TLB flush when available" on native AMD
>>> hardware.  I haven't investigated the issue with that patch specifically,
>>> because cpu_has_hypervisor being wrong is obviously a bug.
>>>
>>> This is one of two possible approaches, and both have their downsides.  This
>>> one takes an extra hit on context switches between PV vcpus and idle/hvm, as
>>> they will usually differ in the HYPERVISOR bit.
>> Why would they differ in the HYPERVISOR bit? Maybe for idle (albeit
>> off the top of my head I can't recall us special casing idle wrt
>> CPUID handling), but why for PV vs HVM? The idle case, if there is
>> an issue with this, could be taken care of by actually setting the
>> bit there, as no-one should care about what it's set to?
> 
> d->arch.pv.cpuidmasks is only allocated for PV domains (and starts by
> dup()ing the default).
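
As a rough standalone sketch of that arrangement (the struct layout and
function names below are made up for illustration, not the real Xen
definitions): PV domains get their own copy of the levelling masks,
dup()'d from the boot-time defaults, while everything else falls back to
the shared default set.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Illustrative stand-in for the per-leaf levelling masks. */
    struct cpuidmasks_sketch {
        uint64_t _1cd, e1cd, _7ab0, _6c;
    };

    /* Boot-time defaults, shared by idle and HVM vcpus. */
    static struct cpuidmasks_sketch cpumask_defaults_sketch;

    struct domain_sketch {
        bool is_pv;
        struct cpuidmasks_sketch *pv_cpuidmasks; /* NULL unless PV */
    };

    /* PV domain creation: allocate masks by dup()'ing the defaults. */
    static int pv_masks_init_sketch(struct domain_sketch *d)
    {
        d->pv_cpuidmasks = malloc(sizeof(*d->pv_cpuidmasks));
        if ( !d->pv_cpuidmasks )
            return -1;
        *d->pv_cpuidmasks = cpumask_defaults_sketch;
        return 0;
    }

    /* Mask set to load when scheduling a vcpu of this domain. */
    static const struct cpuidmasks_sketch *
    masks_for_sketch(const struct domain_sketch *d)
    {
        return (d->is_pv && d->pv_cpuidmasks) ? d->pv_cpuidmasks
                                              : &cpumask_defaults_sketch;
    }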

Ah, that's the piece I was missing. My next question then is: Why
do we do *_init_levelling() from early_init_*() in the first place?
It looks conceptually wrong to me to set up levelling before having
obtained the CPUID data. Wouldn't a separate hook in struct cpu_dev
be better, invoked e.g. from identify_cpu() after generic_identify()?
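
To make that concrete, a sketch of what such a hook could look like
(the hook and type names are hypothetical and pared down for
illustration; the real struct cpu_dev and identify_cpu() carry more
than this):

    /* Illustrative featureset container. */
    struct cpuinfo_sketch {
        unsigned int x86_capability[8];
    };

    /* Hypothetical, reduced form of the per-vendor hook table. */
    struct cpu_dev_sketch {
        void (*c_early_init)(struct cpuinfo_sketch *c);
        /* New hook: set up levelling only once CPUID data is known. */
        void (*c_init_levelling)(struct cpuinfo_sketch *c);
        void (*c_init)(struct cpuinfo_sketch *c);
    };

    /* Stand-in for generic_identify(): scan CPUID into x86_capability[]. */
    static void generic_identify_sketch(struct cpuinfo_sketch *c)
    {
        /* ... cpuid() calls filling c->x86_capability[] ... */
        (void)c;
    }

    static void identify_cpu_sketch(struct cpuinfo_sketch *c,
                                    const struct cpu_dev_sketch *vendor)
    {
        if ( vendor->c_early_init )
            vendor->c_early_init(c);   /* no levelling set up here */

        generic_identify_sketch(c);

        /* Only now, with the scanned bits in hand, configure levelling. */
        if ( vendor->c_init_levelling )
            vendor->c_init_levelling(c);

        if ( vendor->c_init )
            vendor->c_init(c);
    }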

> When context switching levelling MSRs, any non-PV guest uses
> cpumask_default.  This captures idle and HVM vcpus.
> 
> This is necessary because, at least at the time it was introduced,
> {pv,hvm}_cpuid() issued native CPUID instructions to then feed data back
> into guest context.  It's probably less relevant now that guest_cpuid()
> doesn't issue native instructions in the general case.
> 
> Either way, HVM gained the default, like idle, to cause the lazy
> switching logic to switch less often.
> 
> The problem we have after this patch is that the default differs from PV
> in the HYPERVISOR bit, which basically guarantees that we rewrite the
> leaf 1 levelling on each context switch.
> 
> However, having looked at the other feature bits which differ for PV,
> VME and PSE36 being hidden means we're always switching leaf 1 anyway,
> so this change for HYPERVISOR doesn't make the situation any worse.
> 
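
For reference, a standalone sketch of the lazy switching pattern
described above (the names and the MSR-write stub are invented for
illustration): each mask MSR is only rewritten when the incoming value
differs from what is currently loaded, so a leaf 1 difference between
the PV masks and the defaults means a write on every PV<->non-PV switch.

    #include <stdint.h>

    /* Leaf 1 (ECX/EDX) and leaf 0x80000001 (ECX/EDX) mask values. */
    struct leaf_masks_sketch {
        uint64_t _1cd;
        uint64_t e1cd;
    };

    /* What is currently programmed into this CPU's masking MSRs. */
    static struct leaf_masks_sketch this_cpu_masks_sketch;

    /* Placeholder for the vendor-specific masking MSR write. */
    static void wrmsr_mask_sketch(unsigned int which, uint64_t val)
    {
        (void)which; (void)val;
    }

    static void ctxt_switch_masking_sketch(const struct leaf_masks_sketch *next)
    {
        /*
         * Lazy switch: only touch an MSR when its value changes.  If the
         * PV masks and the defaults differ in leaf 1 (VME, PSE36, and now
         * HYPERVISOR), this write happens on every PV <-> idle/HVM switch.
         */
        if ( next->_1cd != this_cpu_masks_sketch._1cd )
        {
            wrmsr_mask_sketch(0 /* leaf 1 */, next->_1cd);
            this_cpu_masks_sketch._1cd = next->_1cd;
        }

        if ( next->e1cd != this_cpu_masks_sketch.e1cd )
        {
            wrmsr_mask_sketch(1 /* leaf 0x80000001 */, next->e1cd);
            this_cpu_masks_sketch.e1cd = next->e1cd;
        }
    }
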
>>> The other approach is to order things more carefully so levelling is
>>> configured after scanning for cpuid bits, but that has the downside that
>>> it is very easy to regress.
>>>
>>> Thoughts on which is the least-bad approach to take?  Having written this
>>> patch, I'm now erring on the side of doing it the other way.
>> Besides the need for me to understand the aspect above, I'm afraid that
>> to judge I'd need to have at least a sketch of what the alternative
>> would look like, in particular to figure out how fragile it really is.
> 
> It would be a small bit of careful reordering in cpu/amd.c
> 
> The tipping factor is that, even if we arrange for idle context not to
> have HYPERVISOR set (and therefore for cpu_has_hypervisor to end up clear
> when scanned), a regular CPUID instruction in PV context would still see
> HYPERVISOR, as a property of virtualising things sensibly for guests.
> 
> As we need to cope with HYPERVISOR being visible in some contexts, it's
> better to consider it uniformly visible and break any notional link
> between cpu_has_hypervisor and what CPUID would report for the bit.
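
A condensed sketch of that decoupling (the symbol names below are
invented; the leaf 1 ECX storage is just a stand-in): cpu_has_hypervisor
is derived purely from the bits scanned on the host at boot, while the
guest-facing view sets HYPERVISOR unconditionally, so the two are
allowed to disagree.

    #include <stdint.h>

    #define HYPERVISOR_BIT_SKETCH  (1u << 31)   /* leaf 1 ECX bit 31 */

    /* Leaf 1 ECX as scanned from native CPUID at boot (illustrative). */
    static uint32_t host_leaf1_ecx_sketch;

    /*
     * "Is Xen itself running virtualised?" -- tied to the scanned host
     * bits only, never to what guests are shown.
     */
    #define cpu_has_hypervisor_sketch \
        (!!(host_leaf1_ecx_sketch & HYPERVISOR_BIT_SKETCH))

    /* Guest-facing leaf 1 ECX: HYPERVISOR is always advertised. */
    static uint32_t guest_leaf1_ecx_sketch(void)
    {
        return host_leaf1_ecx_sketch | HYPERVISOR_BIT_SKETCH;
    }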

After having set up levelling, I think we shouldn't use CPUID anymore
for leaves which may be levelled. As a result, cpu_has_* / cpu_has()
would then be the only information source in this regard.
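
As a tiny illustration of that rule (standalone, with invented names;
VME is just the example bit since it is levelled out for PV above):
checks should go through the featureset captured before levelling was
configured, because a freshly executed CPUID may already return the
masked view.

    #include <stdbool.h>
    #include <stdint.h>
    #include <cpuid.h>   /* __get_cpuid(), GCC/clang builtin header */

    #define VME_BIT_SKETCH  (1u << 1)   /* leaf 1 EDX bit 1 */

    /* Leaf 1 EDX captured by the boot-time scan, before levelling. */
    static uint32_t scanned_leaf1_edx_sketch;

    /* Preferred once levelling may be active: consult the cached scan. */
    static bool has_vme_cached_sketch(void)
    {
        return scanned_leaf1_edx_sketch & VME_BIT_SKETCH;
    }

    /* Discouraged: a fresh CPUID may return the levelled view instead. */
    static bool has_vme_reexecuted_sketch(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if ( !__get_cpuid(1, &eax, &ebx, &ecx, &edx) )
            return false;

        return edx & VME_BIT_SKETCH;
    }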

Such a general re-arrangement would then also appear to address your
"easy to regress" concern.

Jan
