
Re: [Xen-devel] PV-vNUMA issue: topology is misinterpreted by the guest

To: Dario Faggioli <dario.faggioli@xxxxxxxxxx>
From: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
Date: Tue, 21 Jul 2015 16:00:29 -0400
Cc: Elena Ufimtseva <elena.ufimtseva@xxxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, David Vrabel <david.vrabel@xxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
Delivery-date: Tue, 21 Jul 2015 20:01:32 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 07/20/2015 10:43 AM, Boris Ostrovsky wrote:

On 07/20/2015 10:09 AM, Dario Faggioli wrote:

On Fri, 2015-07-17 at 14:17 -0400, Boris Ostrovsky wrote:

On 07/17/2015 03:27 AM, Dario Faggioli wrote:

In the meanwhile, what should we do? Document this? How? "don't use
vNUMA with PV guest in SMT enabled systems" seems a bit harsh... Is
there a workaround we can put in place/suggest?

I haven't been able to reproduce this on my Intel box because I think I
have different core enumeration.

Yes, most likely, that's highly topology dependent. :-(

Can you try adding
    cpuid=['0x1:ebx=xxxxxxxx00000001xxxxxxxxxxxxxxxx']
to your config file?
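
For readers less familiar with the xl cpuid mask syntax: the 32-character string is read most significant bit first, with 'x' leaving a bit at its default and '0'/'1' forcing it. The sketch below is only an illustration (the helper names and the sample host EBX value are assumptions, not anything taken from xl or from this thread). It shows that the mask above forces bits 23:16 of leaf 1 EBX, the logical processor count, to 1.

```python
def apply_cpuid_mask(value: int, mask: str) -> int:
    """Apply an xl-style 32-character bit mask (MSB first) to a register value.

    Only 'x' (keep the bit) and '0'/'1' (force the bit) are handled here;
    any other mask character is rejected by this sketch.
    """
    if len(mask) != 32:
        raise ValueError("mask must be exactly 32 characters")
    result = value
    for i, ch in enumerate(mask):
        bit = 31 - i  # the first character describes bit 31
        if ch == "1":
            result |= 1 << bit
        elif ch == "0":
            result &= ~(1 << bit)
        elif ch != "x":
            raise ValueError(f"unsupported mask character: {ch!r}")
    return result & 0xFFFFFFFF


def logical_processor_count(ebx: int) -> int:
    """Extract CPUID leaf 1, EBX bits 23:16."""
    return (ebx >> 16) & 0xFF


# Hypothetical host value: 16 logical processors reported in bits 23:16.
host_ebx = 0x00100800
guest_ebx = apply_cpuid_mask(host_ebx, "xxxxxxxx00000001xxxxxxxxxxxxxxxx")
print(hex(guest_ebx), logical_processor_count(guest_ebx))
```

With a count of 1 the guest kernel has no reason to treat any two vCPUs as hyperthreads of the same core.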

Done (sorry for the delay, the testbox was busy doing other stuff).

Still no joy (.101 is the IP address of the guest, domain id 3):

root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# ssh root@xxxxxxxxxxxxx "yes > /dev/null 2>&1 &"
root@Zhaman:~# xl vcpu-list 3

Name                                ID  VCPU   CPU State Time(s) Affinity (Hard / Soft)

test                                 3     0    4   r-- 23.6  all / 0-7
test                                 3     1    9   r-- 19.8  all / 0-7
test                                 3     2    8   -b- 0.4  all / 8-15
test                                 3     3    4   -b- 0.2  all / 8-15

*HOWEVER* it seems to have an effect. In fact, now, topology as it is
shown in /sys/... is different:

root@test:~# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

0
(it was 0-1)

This, OTOH, is still the same:

root@test:~# cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list

0-3

Also, I now see this:

[    0.150560] ------------[ cut here ]------------

[    0.150560] WARNING: CPU: 2 PID: 0 at ../arch/x86/kernel/smpboot.c:317 topology_sane.isra.2+0x74/0x88()
[    0.150560] sched: CPU #2's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.

[    0.150560] Modules linked in:
[    0.150560] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.19.0+ #1

[    0.150560]  0000000000000009 ffff88001ee2fdd0 ffffffff81657c7b ffffffff810bbd2c
[    0.150560]  ffff88001ee2fe20 ffff88001ee2fe10 ffffffff81081510 ffff88001ee2fea0
[    0.150560]  ffffffff8103aa02 ffff88003ea0a001 0000000000000000 ffff88001f20a040

[    0.150560] Call Trace:
[    0.150560]  [<ffffffff81657c7b>] dump_stack+0x4f/0x7b
[    0.150560]  [<ffffffff810bbd2c>] ? up+0x39/0x3e
[    0.150560]  [<ffffffff81081510>] warn_slowpath_common+0xa1/0xbb
[    0.150560]  [<ffffffff8103aa02>] ? topology_sane.isra.2+0x74/0x88
[    0.150560]  [<ffffffff81081570>] warn_slowpath_fmt+0x46/0x48
[    0.150560]  [<ffffffff8101eeb1>] ? __cpuid.constprop.0+0x15/0x19
[    0.150560]  [<ffffffff8103aa02>] topology_sane.isra.2+0x74/0x88
[    0.150560]  [<ffffffff8103acd0>] set_cpu_sibling_map+0x27a/0x444
[    0.150560]  [<ffffffff81056ac3>] ? numa_add_cpu+0x98/0x9f
[    0.150560]  [<ffffffff8100b8f2>] cpu_bringup+0x63/0xa8
[    0.150560]  [<ffffffff8100b945>] cpu_bringup_and_idle+0xe/0x1a
[    0.150560] ---[ end trace 63d204896cce9f68 ]---

Notice that it now says 'llc-sibling', while, before, it was saying
'smt-sibling'.

Exactly. You are now passing the first topology test, which was to see
that threads are on the same node. And since each processor has only
one thread (as evidenced by thread_siblings_list) we are good.

The second test checks that cores (i.e. things that share last level
cache) are on the same node. And they are not.
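
The two tests can be modelled roughly as below. This is a simplified sketch, not the kernel's code: the dictionary keys and the function name are made up for illustration, and the per-CPU values are an assumption loosely based on the output in this message (one thread per core, all four vCPUs sharing an LLC, CPU 0 on node 0 and CPU 2 on node 1; the nodes of CPUs 1 and 3 are guessed).

```python
from itertools import combinations


def find_topology_conflicts(cpus):
    """Return (kind, cpu_a, cpu_b) for sibling pairs that sit on different nodes.

    `cpus` maps a CPU number to a dict with the hypothetical keys
    'node', 'core_id' (shared by SMT threads) and 'llc_id' (shared by
    CPUs behind the same last level cache).
    """
    conflicts = []
    for a, b in combinations(sorted(cpus), 2):
        ca, cb = cpus[a], cpus[b]
        if ca["node"] == cb["node"]:
            continue  # same node, nothing to complain about
        if ca["core_id"] == cb["core_id"]:
            conflicts.append(("smt-sibling", a, b))  # first test
        elif ca["llc_id"] == cb["llc_id"]:
            conflicts.append(("llc-sibling", a, b))  # second test
    return conflicts


# Assumed guest view after the cpuid override.
guest = {
    0: {"node": 0, "core_id": 0, "llc_id": 0},
    1: {"node": 0, "core_id": 1, "llc_id": 0},
    2: {"node": 1, "core_id": 2, "llc_id": 0},
    3: {"node": 1, "core_id": 3, "llc_id": 0},
}
conflicts = find_topology_conflicts(guest)
for kind, a, b in conflicts:
    print(f"CPU #{b}'s {kind} CPU #{a} is not on the same node")
```

In this model no pair trips the SMT test any more, but every cross-node pair still shares an llc_id, which matches the change from 'smt-sibling' to 'llc-sibling' in the warning.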

On AMD, BTW, we fail a different test so some other bits probably need
to be tweaked. You may fail it too (the LLC sanity check).

Yep, that's the one I guess. Should I try something more/else?

I'll need to see how LLC IDs are calculated, probably also from some
CPUID bits.

No, can't do this: LLC is calculated from CPUID leaf 4 (on Intel), which
uses an index in the ECX register, and xl syntax doesn't allow you to
override CPUIDs for such leaves.
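
To make the leaf 4 dependency concrete, here is a rough sketch of how an LLC ID can be derived from one leaf 4 subleaf. The EAX field layout follows the Intel documentation for leaf 4; the derivation of the ID from the APIC ID is a simplified model of what the guest kernel does, and the sample EAX value and APIC IDs are hypothetical.

```python
def decode_leaf4_eax(eax: int) -> dict:
    """Decode the EAX fields of one CPUID leaf 4 subleaf (selected via ECX)."""
    return {
        "cache_type": eax & 0x1F,          # 0 means no more caches
        "cache_level": (eax >> 5) & 0x7,
        # The field stores the maximum number of sharing threads minus 1.
        "threads_sharing": ((eax >> 14) & 0xFFF) + 1,
    }


def llc_id(apic_id: int, threads_sharing: int) -> int:
    """Mask off the APIC ID bits that distinguish threads sharing the cache.

    The thread count is rounded up to a power of two before masking.
    """
    shift = (threads_sharing - 1).bit_length()
    return apic_id & ~((1 << shift) - 1)


# Hypothetical subleaf: unified (type 3) level 3 cache shared by 16 threads.
sample_eax = 0x3C063
info = decode_leaf4_eax(sample_eax)
print(info, llc_id(5, info["threads_sharing"]), llc_id(17, info["threads_sharing"]))
```

Because the sharing count comes from a subleaf selected through ECX, changing it for the guest would need a per-subleaf override, which is what the xl syntax lacks.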


-boris

The question, though, will be: what do we do about how cache sizes
(and TLB sizes, for that matter) are presented to the guests? Do we
scale them down per thread?
-boris



