
Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest



On 27/07/2015 18:42, Dario Faggioli wrote:
> On Mon, 2015-07-27 at 17:33 +0100, Andrew Cooper wrote:
>> On 27/07/15 17:31, David Vrabel wrote:
>>>
>>>> Yeah, indeed. That's the downside of Juergen's "Linux scheduler
>>>> approach". But the issue is there, even without taking vNUMA into
>>>> account, and I think something like that would really help (only for
>>>> Dom0, and Linux guests, of course).
>>> I disagree. Whether we're using vNUMA or not, Xen should still ensure
>>> that the guest kernel and userspace see a consistent and correct
>>> topology using the native mechanisms.
>>
>> +1
>>
> +1 from me as well. In fact, a mechanism for making exactly such a
> thing happen was what I was after when starting the thread.
>
> Then it came up that CPUID needs to be used for at least two different
> and potentially conflicting purposes, that we want to support both, and
> that, whether and for whatever reason it's used, Linux configures its
> scheduler after it, potentially resulting in rather pathological setups.

I don't see what the problem is here. Fundamentally, "NUMA optimise" vs "comply with licence" is a user/admin decision at boot time, and we need not cater to both halves at the same time.

Supporting either, as chosen by the admin, is worthwhile.
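
That choice can already be expressed today, if somewhat crudely, via the
per-domain CPUID policy in the guest config. A minimal sketch, assuming the
xend-style syntax described in xl.cfg(5); "htt" is just an illustrative flag,
any name from libxl's feature table works the same way:

  # xl.cfg fragment: start from the host's CPUID values, but hide
  # hyperthreading so licence-counting software sees plain cores.
  cpuid = "host,htt=0"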

>
>
> It's at that point that some decoupling started to appear
> interesting... :-P
>
> Also, are we really being consistent? If my methodology is correct
> (which might not be, please, double check, and sorry for that), I'm
> seeing quite some inconsistency around:
>
> HOST:
>  root@Zhaman:~# xl info -n
>  ...
>  cpu_topology           :
>  cpu:    core    socket     node
>     0:       0        1        0
>     1:       0        1        0
>     2:       1        1        0
>     3:       1        1        0
>     4:       9        1        0
>     5:       9        1        0
>     6:      10        1        0
>     7:      10        1        0
>     8:       0        0        1
>     9:       0        0        1
>    10:       1        0        1
>    11:       1        0        1
>    12:       9        0        1
>    13:       9        0        1
>    14:      10        0        1
>    15:      10        0        1

o_O

What kind of system results in this layout? Can you dump the ACPI tables and make them available?
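
For collecting them, something like this should do, assuming acpica-tools is
installed (the raw tables can also be copied straight out of
/sys/firmware/acpi/tables/):

  root@Zhaman:~# acpidump > acpi.dat
  root@Zhaman:~# acpixtract -a acpi.dat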

>
>  ...
>  root@Zhaman:~# xl vcpu-list test
>  Name                 ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
>  test                  2     0     0   r--       1.5  0 / all
>  test                  2     1     1   r--       0.2  1 / all
>  test                  2     2     8   -b-       2.2  8 / all
>  test                  2     3     9   -b-       2.0  9 / all
>
> GUEST (HVM, 4 vcpus):
>  root@test:~# cpuid|grep CORE_ID
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=1
>    (APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=0
>    (APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=1
>
> HOST:
>  root@Zhaman:~# xl vcpu-pin 2 all 0
>  root@Zhaman:~# xl vcpu-list 2
>  Name                 ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
>  test                  2     0     0   -b-      43.7  0 / all
>  test                  2     1     0   -b-      38.4  0 / all
>  test                  2     2     0   -b-      36.9  0 / all
>  test                  2     3     0   -b-      38.8  0 / all
>
> GUEST:
>  root@test:~# cpuid|grep CORE_ID
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>    (APIC synth): PKG_ID=0 CORE_ID=16 SMT_ID=0
>
> HOST:
>  root@Zhaman:~# xl vcpu-pin 2 0 7
>  root@Zhaman:~# xl vcpu-pin 2 1 7
>  root@Zhaman:~# xl vcpu-pin 2 2 15
>  root@Zhaman:~# xl vcpu-pin 2 3 15
>  root@Zhaman:~# xl vcpu-list 2
>  Name                 ID  VCPU   CPU State   Time(s) Affinity (Hard / Soft)
>  test                  2     0     7   -b-      44.3  7 / all
>  test                  2     1     7   -b-      38.9  7 / all
>  test                  2     2    15   -b-      37.3  15 / all
>  test                  2     3    15   -b-      39.2  15 / all
>
> GUEST:
>  root@test:~# cpuid|grep CORE_ID
>    (APIC synth): PKG_ID=0 CORE_ID=26 SMT_ID=1
>    (APIC synth): PKG_ID=0 CORE_ID=26 SMT_ID=1
>    (APIC synth): PKG_ID=0 CORE_ID=10 SMT_ID=1
>    (APIC synth): PKG_ID=0 CORE_ID=10 SMT_ID=1
>
> So, it looks to me that:
>  1) any application using CPUID for either licensing or
>     placement/performance optimization will get (potentially) random
>     results;
>  2) whatever set of values the kernel used, during guest boot, to build
>     up its internal scheduling data structures, has no guarantee of
>     being related to any value returned by CPUID at a later point.
>
> Hence, I think I'm seeing inconsistency between kernel and userspace
> (and between userspace and itself, over time) already... Am I
> overlooking something?

All current CPUID values presented to guests are about as reliable as being picked from /dev/urandom. (This isn't strictly true - the feature flags will be in the right ballpark if the VM has not migrated yet).
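
To make the mechanism concrete: the IDs on cpuid's "APIC synth" lines are
decoded from the initial APIC ID, and an unprivileged CPUID reads that from
whichever pCPU the vCPU happens to be executing on at that instant. A minimal
sketch of the decoding, assuming the x2APIC topology leaf (0xB) is visible to
the guest (the cpuid tool falls back to older leaves when it is not):

  /* topo.c - decode PKG/CORE/SMT IDs from the initial x2APIC ID,
   * roughly as cpuid(1) synthesises them.  Build: gcc -o topo topo.c */
  #include <stdio.h>
  #include <stdint.h>
  #include <cpuid.h>  /* GCC helper for the CPUID instruction */

  int main(void)
  {
      unsigned int eax, ebx, ecx, edx;
      unsigned int smt_shift, core_shift;
      uint32_t apic_id;

      /* Leaf 0xB, subleaf 0: EAX[4:0] = APIC-ID bits spanned by the
       * SMT level; EDX = this logical CPU's x2APIC ID. */
      __cpuid_count(0x0b, 0, eax, ebx, ecx, edx);
      smt_shift = eax & 0x1f;
      apic_id = edx;

      /* Subleaf 1: EAX[4:0] = bits spanned by SMT + core levels. */
      __cpuid_count(0x0b, 1, eax, ebx, ecx, edx);
      core_shift = eax & 0x1f;

      printf("PKG_ID=%u CORE_ID=%u SMT_ID=%u\n",
             apic_id >> core_shift,
             (apic_id >> smt_shift) & ((1u << (core_shift - smt_shift)) - 1),
             apic_id & ((1u << smt_shift) - 1));
      return 0;
  }

Run it, repin the vCPU with xl vcpu-pin, and run it again: the printed IDs
follow the pinning, which is exactly the instability the transcript above
demonstrates.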

Fixing this (as described in my feature levelling design document) is sufficiently non-trivial that it has been deferred until after the feature-levelling work.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


