
Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split

Juergen Gross wrote:
Andre, Stephan,

could you give the attached patch a try?
It moves the cpu assigning/unassigning into a tasklet always executed on the
cpu to be moved. This should avoid critical races.
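
A rough user-space analogy of the idea described above, not the attached patch and not Xen code: every change to a cpu's pool membership is queued to a worker that stands in for that cpu, so only one thread ever touches that cpu's state and concurrent requests cannot interleave. All names here (`CpuWorker`, `run_on`, `move_cpu`, the pool names) are hypothetical.

```python
import queue
import threading


class CpuWorker:
    """Hypothetical stand-in for one cpu that runs deferred jobs serially."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.pool = None  # pool this cpu currently belongs to
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Only this thread ever modifies self.pool.
        while True:
            job, done = self._jobs.get()
            try:
                if job is None:
                    return
                job(self)
            finally:
                done.set()

    def run_on(self, job):
        """Queue a job for this cpu's own thread and wait until it has run."""
        done = threading.Event()
        self._jobs.put((job, done))
        done.wait()

    def stop(self):
        self.run_on(None)


def move_cpu(worker, new_pool):
    """Request a pool change; the change itself runs on the target worker."""
    def job(cpu):
        cpu.pool = new_pool
    worker.run_on(job)


workers = [CpuWorker(i) for i in range(4)]

# Many callers request moves at the same time; each worker serializes its own.
callers = [
    threading.Thread(target=move_cpu,
                     args=(workers[i % 4], "Pool-node%d" % (i % 2)))
    for i in range(100)
]
for t in callers:
    t.start()
for t in callers:
    t.join()

pools = [w.pool for w in workers]
for w in workers:
    w.stop()
```

In this sketch callers never write `pool` directly, which is the property the tasklet approach is meant to provide; whether that is sufficient in the hypervisor is exactly what the test below the quote checks.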

Done. I checked it twice, but sadly it does not fix the issue. It still BUGs:
(XEN) Xen BUG at sched_credit.c:990
(XEN) ----[ Xen-4.1.0-rc3-pre  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c480118208>] csched_acct+0x11f/0x419
(XEN) RFLAGS: 0000000000010006   CONTEXT: hypervisor
(XEN) rax: 0000000000000010   rbx: 0000000000000f00   rcx: 0000000000000100
(XEN) rdx: 0000000000001000   rsi: ffff830437ffa600   rdi: 0000000000000010
(XEN) rbp: ffff82c480297e10   rsp: ffff82c480297d80   r8:  0000000000000100
(XEN) r9:  0000000000000006   r10: ffff82c4802d4100   r11: 0000017322fea49a
(XEN) r12: ffff830437ffa5e0   r13: ffff82c4801180e9   r14: ffff83043399f018
(XEN) r15: ffff830434321ec0   cr0: 000000008005003b   cr4: 00000000000006f0
(XEN) cr3: 00000000c7c9c000   cr2: 0000000001ec8048
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c480297d80:
(XEN)    ffff82c480297f18 fffffed4c7cd6000 ffff830000000eff ffff830437ffa5e0
(XEN)    ffff830437ffa5e8 ffff82c480297df8 ffff830437ffa5e0 0000000000000282
(XEN)    ffff830437ffa5e8 00001c200000000f 00000f0000000f00 0000000000000000
(XEN)    ffff82c400000000 ffff82c4802d3f80 ffff830437ffa5e0 ffff82c4801180e9
(XEN)    ffff83043399f018 ffff83043399f010 ffff82c480297e40 ffff82c480126044
(XEN)    0000000000000002 ffff830437ffa600 ffff82c4802d3f80 00000173010849b7
(XEN)    ffff82c480297e90 ffff82c480126369 ffff82c48024aea0 ffff82c4802d3f80
(XEN)    ffff83043399f010 0000000000000000 0000000000000000 ffff82c4802b0880
(XEN)    ffff82c480297f18 ffffffffffffffff ffff82c480297ed0 ffff82c480123437
(XEN)    ffff8300c7e1e0f8 ffff82c480297f18 ffff82c48024aea0 ffff82c480297f18
(XEN)    0000017301008665 ffff82c4802d3ec0 ffff82c480297ee0 ffff82c4801234b2
(XEN)    ffff82c480297f10 ffff82c4801564f5 0000000000000000 ffff8300c7cd6000
(XEN)    0000000000000000 ffff8300c7e1e000 ffff82c480297d48 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffffffff81a69060 ffff8817a8553f10
(XEN)    ffff8817a8553fd8 0000000000000246 ffff8817a8553e80 ffff880000000001
(XEN)    0000000000000000 0000000000000000 ffffffff810093aa 000000000000e030
(XEN)    00000000deadbeef 00000000deadbeef 0000010000000000 ffffffff810093aa
(XEN)    000000000000e033 0000000000000246 ffff8817a8553ef8 000000000000e02b
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffff8300c7cd6000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c480118208>] csched_acct+0x11f/0x419
(XEN)    [<ffff82c480126044>] execute_timer+0x4e/0x6c
(XEN)    [<ffff82c480126369>] timer_softirq_action+0xf2/0x245
(XEN)    [<ffff82c480123437>] __do_softirq+0x88/0x99
(XEN)    [<ffff82c4801234b2>] do_softirq+0x6a/0x7a
(XEN)    [<ffff82c4801564f5>] idle_loop+0x6a/0x6f
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at sched_credit.c:990
(XEN) ****************************************
(XEN) Reboot in five seconds...
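
For readers of the trace above: an entry such as `csched_acct+0x11f/0x419` gives the offset of the address into the function, followed by the function's size. A small helper for splitting such lines might look like the sketch below; `parse_trace_line` is a hypothetical name and not part of any Xen tooling.

```python
import re

# Matches e.g. "(XEN)    [<ffff82c480118208>] csched_acct+0x11f/0x419"
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_trace_line(line):
    """Return (address, symbol, offset, size, function start) or None."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    off = int(m.group("off"), 16)
    size = int(m.group("size"), 16)
    # The function start is the reported address minus the offset.
    return addr, m.group("sym"), off, size, addr - off


entry = parse_trace_line(
    "(XEN)    [<ffff82c480118208>] csched_acct+0x11f/0x419"
)
start_hex = format(entry[4], "x")
```

Here `start_hex` comes out as `ffff82c4801180e9`, the same value that appears in r13 and on the stack in the dump above.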

Stephan has created more printk debug patches; we will summarize the results soon.


Regarding Stephan's rant:
You should be aware that the main critical sections are only in the tasklets.
The locking in the main routines is needed only to prevent the cpupool from
being destroyed in between.

I'm not sure whether the master_ticker patch is still needed. It seems to
break something, as my machine hung after several hundred cpu moves (without
the new patch). I'm still investigating this problem.


Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
