
Re: [Xen-devel] [PATCH RFC] x86/amd: Avoid cpu_has_hypervisor evaluating true on native hardware



On 28.01.2020 18:14, Andrew Cooper wrote:
> On 28/01/2020 13:59, Jan Beulich wrote:
>> On 27.01.2020 21:21, Andrew Cooper wrote:
>>> Without this fix, there is apparently a problem with Roger's "[PATCH v3 7/7]
>>> x86/tlb: use Xen L0 assisted TLB flush when available" on native AMD
>>> hardware.  I haven't investigated the issue with that patch specifically,
>>> because cpu_has_hypervisor being wrong is obviously a bug.
>>>
>>> This is one of two possible approaches, and both have their downsides.  This
>>> one takes an extra hit on context switches between PV vcpus and idle/hvm, as
>>> they will usually differ in the HYPERVISOR bit.
>> Why would they differ in the HYPERVISOR bit? Maybe for idle (albeit
>> off the top of my head I can't recall us special casing idle wrt
>> CPUID handling), but why for PV vs HVM? The idle case, if there is
>> an issue with this, could be taken care of by actually setting the
>> bit there, as no-one should care about what it's set to?
> 
> d->arch.pv.cpuidmasks is only allocated for PV domains (and starts by
> dup()ing the default).
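
As a rough standalone sketch of that arrangement (the struct layout and
function names below are made up for illustration, not the real Xen
definitions): PV domains get their own copy of the levelling masks,
dup()'d from the boot-time defaults, while everything else falls back to
the shared default set.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Illustrative stand-in for the per-leaf levelling masks. */
    struct cpuidmasks_sketch {
        uint64_t _1cd, e1cd, _7ab0, _6c;
    };

    /* Boot-time defaults, shared by idle and HVM vcpus. */
    static struct cpuidmasks_sketch cpumask_defaults_sketch;

    struct domain_sketch {
        bool is_pv;
        struct cpuidmasks_sketch *pv_cpuidmasks; /* NULL unless PV */
    };

    /* PV domain creation: allocate masks by dup()'ing the defaults. */
    static int pv_masks_init_sketch(struct domain_sketch *d)
    {
        d->pv_cpuidmasks = malloc(sizeof(*d->pv_cpuidmasks));
        if ( !d->pv_cpuidmasks )
            return -1;
        *d->pv_cpuidmasks = cpumask_defaults_sketch;
        return 0;
    }

    /* Mask set to load when scheduling a vcpu of this domain. */
    static const struct cpuidmasks_sketch *
    masks_for_sketch(const struct domain_sketch *d)
    {
        return (d->is_pv && d->pv_cpuidmasks) ? d->pv_cpuidmasks
                                              : &cpumask_defaults_sketch;
    }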

Ah, that's the piece I was missing. My next question then is: Why
do we do *_init_levelling() from early_init_*() in the first place?
It looks conceptually wrong to me to set up levelling before having
obtained the CPUID data. Wouldn't a separate hook in struct cpu_dev
be better, invoked e.g. from identify_cpu() after generic_identify()?
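
To make that concrete, a sketch of what such a hook could look like
(the hook and type names are hypothetical and pared down for
illustration; the real struct cpu_dev and identify_cpu() carry more
than this):

    /* Illustrative featureset container. */
    struct cpuinfo_sketch {
        unsigned int x86_capability[8];
    };

    /* Hypothetical, reduced form of the per-vendor hook table. */
    struct cpu_dev_sketch {
        void (*c_early_init)(struct cpuinfo_sketch *c);
        /* New hook: set up levelling only once CPUID data is known. */
        void (*c_init_levelling)(struct cpuinfo_sketch *c);
        void (*c_init)(struct cpuinfo_sketch *c);
    };

    /* Stand-in for generic_identify(): scan CPUID into x86_capability[]. */
    static void generic_identify_sketch(struct cpuinfo_sketch *c)
    {
        /* ... cpuid() calls filling c->x86_capability[] ... */
        (void)c;
    }

    static void identify_cpu_sketch(struct cpuinfo_sketch *c,
                                    const struct cpu_dev_sketch *vendor)
    {
        if ( vendor->c_early_init )
            vendor->c_early_init(c);   /* no levelling set up here */

        generic_identify_sketch(c);

        /* Only now, with the scanned bits in hand, configure levelling. */
        if ( vendor->c_init_levelling )
            vendor->c_init_levelling(c);

        if ( vendor->c_init )
            vendor->c_init(c);
    }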

> When context switching levelling MSRs, any non-PV guest uses
> cpumask_default.  This captures idle and HVM vcpus.
> 
> This is necessary because, at least at the time it was introduced,
> {pv,hvm}_cpuid() issued native CPUID instructions to then feed data back
> into guest context.  It's probably less relevant now that guest_cpuid()
> doesn't issue native instructions in the general case.
> 
> Either way, HVM gained the default, like idle, to cause the lazy
> switching logic to switch less often.
> 
> The problem we have after this patch is that the default differs from PV
> in the HYPERVISOR bit, which basically guarantees that we rewrite the
> leaf 1 levelling on each context switch.
> 
> However, having looked at the other feature bits which differ for PV,
> VME and PSE36 being hidden means we're always switching leaf 1 anyway,
> so this change for HYPERVISOR doesn't make the situation any worse.
> 
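
For reference, a standalone sketch of the lazy switching pattern
described above (the names and the MSR-write stub are invented for
illustration): each mask MSR is only rewritten when the incoming value
differs from what is currently loaded, so a leaf 1 difference between
the PV masks and the defaults means a write on every PV<->non-PV switch.

    #include <stdint.h>

    /* Leaf 1 (ECX/EDX) and leaf 0x80000001 (ECX/EDX) mask values. */
    struct leaf_masks_sketch {
        uint64_t _1cd;
        uint64_t e1cd;
    };

    /* What is currently programmed into this CPU's masking MSRs. */
    static struct leaf_masks_sketch this_cpu_masks_sketch;

    /* Placeholder for the vendor-specific masking MSR write. */
    static void wrmsr_mask_sketch(unsigned int which, uint64_t val)
    {
        (void)which; (void)val;
    }

    static void ctxt_switch_masking_sketch(const struct leaf_masks_sketch *next)
    {
        /*
         * Lazy switch: only touch an MSR when its value changes.  If the
         * PV masks and the defaults differ in leaf 1 (VME, PSE36, and now
         * HYPERVISOR), this write happens on every PV <-> idle/HVM switch.
         */
        if ( next->_1cd != this_cpu_masks_sketch._1cd )
        {
            wrmsr_mask_sketch(0 /* leaf 1 */, next->_1cd);
            this_cpu_masks_sketch._1cd = next->_1cd;
        }

        if ( next->e1cd != this_cpu_masks_sketch.e1cd )
        {
            wrmsr_mask_sketch(1 /* leaf 0x80000001 */, next->e1cd);
            this_cpu_masks_sketch.e1cd = next->e1cd;
        }
    }
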
>>> The other approach is to order things more carefully so levelling is
>>> configured after scanning for cpuid bits, but that has the downside that
>>> it is very easy to regress.
>>>
>>> Thoughts on which is the least-bad approach to take?  Having written this
>>> patch, I'm now erring on the side of doing it the other way.
>> Besides the need for me to understand the aspect above, I'm afraid that
>> to judge I'd need to have at least a sketch of what the alternative
>> would look like, in particular to figure out how fragile it really is.
> 
> It would be a small bit of careful reordering in cpu/amd.c
> 
> The tipping factor is that, even if we arrange for idle context not to
> have HYPERVISOR set (and therefore for cpu_has_hypervisor to end up clear
> when scanned), a regular CPUID instruction in PV context would still see
> HYPERVISOR, as a property of virtualising things sensibly for guests.
> 
> As we need to cope with HYPERVISOR being visible in some contexts, it's
> better to consider it uniformly visible and break any notional link
> between cpu_has_hypervisor and what CPUID would report for the bit.
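
A condensed sketch of that decoupling (the symbol names below are
invented; the leaf 1 ECX storage is just a stand-in): cpu_has_hypervisor
is derived purely from the bits scanned on the host at boot, while the
guest-facing view sets HYPERVISOR unconditionally, so the two are
allowed to disagree.

    #include <stdint.h>

    #define HYPERVISOR_BIT_SKETCH  (1u << 31)   /* leaf 1 ECX bit 31 */

    /* Leaf 1 ECX as scanned from native CPUID at boot (illustrative). */
    static uint32_t host_leaf1_ecx_sketch;

    /*
     * "Is Xen itself running virtualised?" -- tied to the scanned host
     * bits only, never to what guests are shown.
     */
    #define cpu_has_hypervisor_sketch \
        (!!(host_leaf1_ecx_sketch & HYPERVISOR_BIT_SKETCH))

    /* Guest-facing leaf 1 ECX: HYPERVISOR is always advertised. */
    static uint32_t guest_leaf1_ecx_sketch(void)
    {
        return host_leaf1_ecx_sketch | HYPERVISOR_BIT_SKETCH;
    }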

After having set up levelling, I think we shouldn't use CPUID anymore
for leaves which may be levelled. As a result, cpu_has_* / cpu_has()
would then be the only information source in this regard.
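
As a tiny illustration of that rule (standalone, with invented names;
VME is just the example bit since it is levelled out for PV above):
checks should go through the featureset captured before levelling was
configured, because a freshly executed CPUID may already return the
masked view.

    #include <stdbool.h>
    #include <stdint.h>
    #include <cpuid.h>   /* __get_cpuid(), GCC/clang builtin header */

    #define VME_BIT_SKETCH  (1u << 1)   /* leaf 1 EDX bit 1 */

    /* Leaf 1 EDX captured by the boot-time scan, before levelling. */
    static uint32_t scanned_leaf1_edx_sketch;

    /* Preferred once levelling may be active: consult the cached scan. */
    static bool has_vme_cached_sketch(void)
    {
        return scanned_leaf1_edx_sketch & VME_BIT_SKETCH;
    }

    /* Discouraged: a fresh CPUID may return the levelled view instead. */
    static bool has_vme_reexecuted_sketch(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if ( !__get_cpuid(1, &eax, &ebx, &ecx, &edx) )
            return false;

        return edx & VME_BIT_SKETCH;
    }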

Such a general re-arrangement would then also appear to address your
"easy to regress" concern.

Jan
