Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 02/17/11 08:05, Juergen Gross wrote:
On 02/16/11 14:54, George Dunlap wrote:

Andre (and Juergen), can you try again with the attached patch?

What the patch basically does is try to make "cpu_disable_scheduler()" do what it seems to say it does. :-) Namely, the various scheduler-related interrupts (both per-cpu ticks and the master tick) are part of the scheduler, so disable them before doing anything, and don't enable them until the cpu is really ready to go again.

To be precise:
* cpu_disable_scheduler() disables ticks
* schedule_cpu_switch() only enables ticks if adding a cpu to a pool, and does it after inserting the idle vcpu
* Modify semantics, s.t., {alloc,free}_pdata() don't actually start or stop tickers
  + Call tick_{resume,suspend} in cpu_{up,down}, respectively
* Modify credit1's tick_{suspend,resume} to handle the master ticker as well.

With this patch (if dom0 doesn't get wedged due to all 8 vcpus being on one pcpu), I can perform thousands of operations successfully.

(NB this is not ready for application yet, I just wanted to check to see if it fixes Andre's problem)

Tried again, this time with the following patch:

diff -r 72470de157ce xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Wed Feb 16 09:49:33 2011 +0000
+++ b/xen/common/sched_credit.c Wed Feb 16 15:09:54 2011 +0100
@@ -1268,7 +1268,8 @@ csched_load_balance(struct csched_privat
         /*
          * Any work over there to steal?
          */
-        speer = csched_runq_steal(peer_cpu, cpu, snext->pri);
+        speer = cpu_isset(peer_cpu, *online) ?
+            csched_runq_steal(peer_cpu, cpu, snext->pri) : NULL;
         pcpu_schedule_unlock(peer_cpu);
         if ( speer != NULL )
         {

Worked without any flaw for 30000 iterations.

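For readers following along, the guard added by the patch above can be sketched in isolation. This is only a toy model, not Xen code: every toy_* name, the 32-bit mask type, and the one-entry-per-CPU run queue are assumptions made up for illustration. The only thing taken from the patch is the idea of checking that the peer CPU is set in the online mask before stealing from its run queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS 16u  /* hypothetical CPU count for this sketch */

/* Toy stand-in for a cpumask: bit n set means CPU n is online in the pool. */
typedef uint32_t toy_cpumask_t;

/* Toy stand-in for a runnable vcpu waiting on a per-CPU run queue. */
struct toy_vcpu {
    int id;
};

/* At most one queued vcpu per CPU; NULL means the queue is empty. */
static struct toy_vcpu *toy_runq[TOY_NR_CPUS];

/* Test whether a CPU is present in the given mask. */
static bool toy_cpu_isset(unsigned int cpu, toy_cpumask_t mask)
{
    assert(cpu < TOY_NR_CPUS);
    return (mask >> cpu) & 1u;
}

/* Unguarded steal: take whatever is queued on peer_cpu. */
static struct toy_vcpu *toy_runq_steal(unsigned int peer_cpu)
{
    struct toy_vcpu *v = toy_runq[peer_cpu];

    toy_runq[peer_cpu] = NULL;
    return v;
}

/*
 * Guarded steal, same shape as the patched line: only look at the
 * peer's queue when the peer is still in the online mask.
 */
static struct toy_vcpu *toy_steal_if_online(unsigned int peer_cpu,
                                            toy_cpumask_t online)
{
    return toy_cpu_isset(peer_cpu, online) ? toy_runq_steal(peer_cpu) : NULL;
}
```

In this model, calling toy_steal_if_online(9, online) with bit 9 cleared returns NULL and leaves whatever is queued on CPU 9 untouched, which is the behaviour the patched line asks for when a CPU has just been removed from the pool.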
Juergen

After some thousand iterations the machine hung, and after dumping the Dom0 registers to the console it continued running and crashed about a second later:

(XEN) cpupool_unassign_cpu(pool=0,cpu=9)
(XEN) cpupool_unassign_cpu(pool=0,cpu=9) ffff83083fff74c0
(XEN) cpupool_unassign_cpu ret=0
(XEN) cpupool_unassign_cpu(pool=0,cpu=4)
(XEN) cpupool_unassign_cpu(pool=0,cpu=4) ffff83083fff74c0
(XEN) cpupool_unassign_cpu ret=0
(XEN) cpupool_assign_cpu(pool=1,cpu=9)
(XEN) cpupool_assign_cpu(pool=1,cpu=9) ffff83083002de40
(XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
(XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]----
(XEN) CPU: 9
(XEN) RIP: e008:[<ffff82c480126100>] active_timer+0xc/0x37
(XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: 0000000000000000 rcx: 0000000000000000
(XEN) rdx: ffff830839d8ff18 rsi: 0000010dbb628a80 rdi: ffff83083ffbcf98
(XEN) rbp: ffff830839d8fd50 rsp: ffff830839d8fd50 r8: ffff83083ffbcf90
(XEN) r9: ffff82c480213680 r10: 00000000ffffffff r11: 0000000000000010
(XEN) r12: ffff82c4802d3f80 r13: ffff82c4802d3f80 r14: ffff83083ffbcf98
(XEN) r15: ffff83083ffbcfc0 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 000000007809c000 cr2: 0000000000620048
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff830839d8fd50:
(XEN) ffff830839d8fda0 ffff82c480126ef9 0000000000000000 0000010dbb628a80
(XEN) 0000000000000086 0000000000000009 ffff83083002de40 ffff83083002dd50
(XEN) 0000000000000009 0000000000000009 ffff830839d8fdc0 ffff82c480117906
(XEN) ffff83083ffa3b40 ffff83083ffa5d70 ffff830839d8fe30 ffff82c4801214fa
(XEN) ffff83083002dd00 0000000900000100 0000000000000286 ffff8300780da000
(XEN) ffff83083ffbcf80 ffff83083ffbcf90 ffff82c480247e00 0000000000000009
(XEN) 00000000fffffff0 ffff83083002dd00 0000000000000000 ffff8300781cc198
(XEN) ffff830839d8fe60 ffff82c4801019ff 0000000000000009 0000000000000009
(XEN) ffff8300781cc198 ffff830839d990d0 ffff830839d8fe80 ffff82c480101bd9
(XEN) ffff83107e80c5b0 ffff8300781cc000 ffff830839d8fea0 ffff82c480104f21
(XEN) 0000000000000009 ffff830839d990e0 ffff830839d8fee0 ffff82c480125b6c
(XEN) ffff82c48024a020 ffff830839d8ff18 ffff82c48024a020 ffff830839d8ff18
(XEN) ffff830839d99060 ffff830839d99040 ffff830839d8ff10 ffff82c48015645a
(XEN) 0000000000000000 ffff8300780da000 ffff8300780da000 ffffffffffffffff
(XEN) ffff830839d8fe00 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 ffffffff8062bda0 ffff880fbb1e5fd8 0000000000000246
(XEN) 0000000000000000 000000010003347d 0000000000000000 0000000000000000
(XEN) ffffffff800033aa 00000000deadbeef 00000000deadbeef 00000000deadbeef
(XEN) 0000010000000000 ffffffff800033aa 000000000000e033 0000000000000246
(XEN) ffff880fbb1e5f08 000000000000e02b 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN) [<ffff82c480126100>] active_timer+0xc/0x37
(XEN) [<ffff82c480126ef9>] set_timer+0x102/0x218
(XEN) [<ffff82c480117906>] csched_tick_resume+0x53/0x75
(XEN) [<ffff82c4801214fa>] schedule_cpu_switch+0x1f1/0x25c
(XEN) [<ffff82c4801019ff>] cpupool_assign_cpu_locked+0x61/0xd6
(XEN) [<ffff82c480101bd9>] cpupool_assign_cpu_helper+0x9f/0xcd
(XEN) [<ffff82c480104f21>] continue_hypercall_tasklet_handler+0x51/0xc3
(XEN) [<ffff82c480125b6c>] do_tasklet+0xe1/0x155
(XEN) [<ffff82c48015645a>] idle_loop+0x5f/0x67
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 9:
(XEN) Assertion 'timer->status >= TIMER_STATUS_inactive' failed at timer.c:279
(XEN) ****************************************

Juergen

-- 
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
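
The call trace in this message shows the assertion 'timer->status >= TIMER_STATUS_inactive' firing when csched_tick_resume() calls set_timer(). As a rough illustration of what a status check of that shape guards against, here is a toy model of a timer that refuses to be armed before it has been initialised. It is not Xen's timer code and makes no claim about the root cause of this crash: the toy_* names, the enum values, and the choice to return false instead of asserting are all assumptions for the sketch. Only the ordering of the states matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status values; only their relative order matters here. */
enum toy_timer_status {
    TOY_TIMER_invalid = 0,   /* zeroed memory or never initialised */
    TOY_TIMER_inactive,      /* initialised, not queued */
    TOY_TIMER_queued         /* initialised and queued to fire */
};

struct toy_timer {
    enum toy_timer_status status;
    long expires;
};

/* The check under discussion: anything below "inactive" is unusable. */
static bool toy_timer_is_usable(const struct toy_timer *t)
{
    return t->status >= TOY_TIMER_inactive;
}

/* Bring a timer into a known, usable state. */
static void toy_init_timer(struct toy_timer *t)
{
    assert(t != NULL);
    memset(t, 0, sizeof(*t));
    t->status = TOY_TIMER_inactive;
}

/*
 * Arm the timer. A debug build of a hypervisor would assert here;
 * this sketch reports the problem to the caller instead.
 */
static bool toy_set_timer(struct toy_timer *t, long expires)
{
    if ( !toy_timer_is_usable(t) )
        return false;

    t->expires = expires;
    t->status = TOY_TIMER_queued;
    return true;
}
```

In this model, arming a zero-filled struct toy_timer fails, while arming it after toy_init_timer() succeeds. A resume path that can run before the matching initialisation, or after the timer's memory has been freed and reused, would trip a check like this.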