
Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest

On 07/20/2015 10:43 AM, Boris Ostrovsky wrote:
On 07/20/2015 10:09 AM, Dario Faggioli wrote:
On Fri, 2015-07-17 at 14:17 -0400, Boris Ostrovsky wrote:
On 07/17/2015 03:27 AM, Dario Faggioli wrote:
In the meanwhile, what should we do? Document this? How? "don't use
vNUMA with PV guest in SMT enabled systems" seems a bit harsh... Is
there a workaround we can put in place/suggest?
I haven't been able to reproduce this on my Intel box because I think I
have different core enumeration.

Yes, most likely; that's highly topology dependent. :-(

Can you try adding
to your config file?

Done (sorry for the delay, the testbox was busy doing other stuff).

Still no joy (.101 is the IP address of the guest, domain id 3):

root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# xl vcpu-list 3
Name                                ID  VCPU   CPU State   Time(s)  Affinity (Hard / Soft)
test                                 3     0     4   r--      23.6  all / 0-7
test                                 3     1     9   r--      19.8  all / 0-7
test                                 3     2     8   -b-       0.4  all / 8-15
test                                 3     3     4   -b-       0.2  all / 8-15

*HOWEVER* it seems to have an effect. In fact, now, topology as it is
shown in /sys/... is different:

root@test:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
(it was 0-1)

This, OTOH, is still the same:
root@test:~# cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list

Also, I now see this:

[    0.150560] ------------[ cut here ]------------
[    0.150560] WARNING: CPU: 2 PID: 0 at ../arch/x86/kernel/smpboot.c:317 topology_sane.isra.2+0x74/0x88()
[    0.150560] sched: CPU #2's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.150560] Modules linked in:
[    0.150560] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.19.0+ #1
[    0.150560]  0000000000000009 ffff88001ee2fdd0 ffffffff81657c7b ffffffff810bbd2c
[    0.150560]  ffff88001ee2fe20 ffff88001ee2fe10 ffffffff81081510 ffff88001ee2fea0
[    0.150560]  ffffffff8103aa02 ffff88003ea0a001 0000000000000000 ffff88001f20a040
[    0.150560] Call Trace:
[    0.150560]  [<ffffffff81657c7b>] dump_stack+0x4f/0x7b
[    0.150560]  [<ffffffff810bbd2c>] ? up+0x39/0x3e
[    0.150560]  [<ffffffff81081510>] warn_slowpath_common+0xa1/0xbb
[    0.150560]  [<ffffffff8103aa02>] ? topology_sane.isra.2+0x74/0x88
[    0.150560]  [<ffffffff81081570>] warn_slowpath_fmt+0x46/0x48
[    0.150560]  [<ffffffff8101eeb1>] ? __cpuid.constprop.0+0x15/0x19
[    0.150560]  [<ffffffff8103aa02>] topology_sane.isra.2+0x74/0x88
[    0.150560]  [<ffffffff8103acd0>] set_cpu_sibling_map+0x27a/0x444
[    0.150560]  [<ffffffff81056ac3>] ? numa_add_cpu+0x98/0x9f
[    0.150560]  [<ffffffff8100b8f2>] cpu_bringup+0x63/0xa8
[    0.150560]  [<ffffffff8100b945>] cpu_bringup_and_idle+0xe/0x1a
[    0.150560] ---[ end trace 63d204896cce9f68 ]---

Notice that it now says 'llc-sibling', while, before, it was saying

Exactly. You are now passing the first topology test, which was to see that threads are on the same node. And since each processor has only one thread (as evidenced by thread_siblings_list), we are good.

The second test checks that cores (i.e., things that share the last-level cache) are on the same node. And they are not.

On AMD, BTW, we fail a different test so some other bits probably need
to be tweaked. You may fail it too (the LLC sanity check).

Yep, that's the one I guess. Should I try something more/else?

I'll need to see how LLC IDs are calculated, probably also from some CPUID bits.

No, can't do this: the LLC is calculated from CPUID leaf 4 (on Intel), which uses indexes in the ECX register, and xl syntax doesn't allow you to override CPUIDs for such leaves.


The question, though, will be: what do we do about how cache sizes (and TLB sizes, for that matter) are presented to the guests? Do we scale them down per thread?


Xen-devel mailing list