Re: [Xen-devel] Hypervisor crash(!) on xl cpupool-numa-split
On 16.02.2011 15:11, Juergen Gross wrote:

On 02/16/11 14:54, George Dunlap wrote:

Andre (and Juergen), can you try again with the attached patch?

George, Juergen, thanks for all your work on this! I will try the patch as soon as I am back in the office this afternoon.

Regards,
Andre.

What the patch basically does is try to make "cpu_disable_scheduler()" do what it seems to say it does. :-) Namely, the various scheduler-related interrupts (both per-cpu ticks and the master tick) are part of the scheduler, so disable them before doing anything, and don't enable them until the cpu is really ready to go again.

To be precise:
* cpu_disable_scheduler() disables ticks
* schedule_cpu_switch() only enables ticks if adding a cpu to a pool, and does it after inserting the idle vcpu
* Modify semantics, s.t., {alloc,free}_pdata() don't actually start or stop tickers
  + Call tick_{resume,suspend} in cpu_{up,down}, respectively

I tried this before :-) It didn't work for Andre, but maybe there were some bits missing.

* Modify credit1's tick_{suspend,resume} to handle the master ticker as well.

With this patch (if dom0 doesn't get wedged due to all 8 vcpus being on one pcpu), I can perform thousands of operations successfully.

Nice. I'll try later. At the moment I'm testing another patch (attached for review, if you like). I think I've identified two possible races.

Juergen

(NB this is not ready for application yet, I just wanted to check to see if it fixes Andre's problem)

-George

On Wed, Feb 16, 2011 at 9:47 AM, Juergen Gross <juergen.gross@xxxxxxxxxxxxxx> wrote:

Okay, I have some more data. I activated cpupool_dprintk() and included checks in sched_credit.c to test for weight inconsistencies. To reduce race possibilities I've added my patch to always execute cpu assigning/unassigning in a tasklet on the cpu to be moved.
Here is the result:

(XEN) cpupool_unassign_cpu(pool=0,cpu=6)
(XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16
(XEN) cpupool_unassign_cpu(pool=0,cpu=6)
(XEN) cpupool_unassign_cpu(pool=0,cpu=6) ret -16
(XEN) cpupool_assign_cpu(pool=0,cpu=1)
(XEN) cpupool_assign_cpu(pool=0,cpu=1) ffff83083fff74c0
(XEN) cpupool_assign_cpu(cpu=1) ret 0
(XEN) cpupool_assign_cpu(pool=1,cpu=4)
(XEN) cpupool_assign_cpu(pool=1,cpu=4) ffff831002ad5e40
(XEN) cpupool_assign_cpu(cpu=4) ret 0
(XEN) cpu 4, weight 0,prv ffff831002ad5e40, dom 0:
(XEN) sdom->weight: 256, sdom->active_vcpu_count: 1
(XEN) Xen BUG at sched_credit.c:570
(XEN) ----[ Xen-4.1.0-rc5-pre x86_64 debug=y Tainted: C ]----
(XEN) CPU: 4
(XEN) RIP: e008:[<ffff82c4801197d7>] csched_tick+0x186/0x37f
(XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830839d3ec30 rcx: 0000000000000000
(XEN) rdx: ffff830839dcff18 rsi: 000000000000000a rdi: ffff82c4802542e8
(XEN) rbp: ffff830839dcfe38 rsp: ffff830839dcfde8 r8: 0000000000000004
(XEN) r9: ffff82c480213520 r10: 00000000fffffffc r11: 0000000000000001
(XEN) r12: 0000000000000004 r13: ffff830839d3ec40 r14: ffff831002ad5e40
(XEN) r15: ffff830839d66f90 cr0: 000000008005003b cr4: 00000000000026f0
(XEN) cr3: 0000001020a98000 cr2: 00007fc5e9b79d98
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff830839dcfde8:
(XEN) ffff83083ffa3ba0 ffff831002ad5e40 0000000000000246 ffff830839d6c000
(XEN) 0000000000000000 ffff830839dd1100 0000000000000004 ffff82c480119651
(XEN) ffff831002b28018 ffff831002b28010 ffff830839dcfe68 ffff82c480126204
(XEN) 0000000000000002 ffff83083ffa3bb8 ffff830839dd1100 000000cae439ea7e
(XEN) ffff830839dcfeb8 ffff82c480126539 00007fc5e9fa5b20 ffff830839dd1100
(XEN) ffff831002b28010 0000000000000004 0000000000000004 ffff82c4802b0880
(XEN) ffff830839dcff18 ffffffffffffffff ffff830839dcfef8 ffff82c480123647
(XEN) ffff830839dcfed8 ffff830077eee000 00007fc5e9b79d98 00007fc5e9fa5b20
(XEN) 0000000000000002 00007fff46826f20 ffff830839dcff08 ffff82c4801236c2
(XEN) 00007cf7c62300c7 ffff82c480206ad6 00007fff46826f20 0000000000000002
(XEN) 00007fc5e9fa5b20 00007fc5e9b79d98 00007fff46827260 00007fff46826f50
(XEN) 0000000000000246 0000000000000032 0000000000000000 00000000ffffffff
(XEN) 0000000000000009 00007fc5e9d9de1a 0000000000000003 0000000000004848
(XEN) 00007fc5e9b7a000 0000010000000000 ffffffff800073f0 000000000000e033
(XEN) 0000000000000246 ffff880f97b51fc8 000000000000e02b 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000004
(XEN) ffff830077eee000 00000043b9afd180 0000000000000000
(XEN) Xen call trace:
(XEN) [<ffff82c4801197d7>] csched_tick+0x186/0x37f
(XEN) [<ffff82c480126204>] execute_timer+0x4e/0x6c
(XEN) [<ffff82c480126539>] timer_softirq_action+0xf6/0x239
(XEN) [<ffff82c480123647>] __do_softirq+0x88/0x99
(XEN) [<ffff82c4801236c2>] do_softirq+0x6a/0x7a
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 4:
(XEN) Xen BUG at sched_credit.c:570
(XEN) ****************************************

As you can see, a Dom0 vcpu is becoming active on a pool 1 cpu. The BUG_ON triggered in csched_acct() is a logical result of this.

How this can happen I don't know yet. Does anyone have an idea? I'll keep searching...

Juergen

On 02/15/11 08:22, Juergen Gross wrote:

On 02/14/11 18:57, George Dunlap wrote:

The good news is, I've managed to reproduce this on my local test hardware with 1x4x2 (1 socket, 4 cores, 2 threads per core) using the attached script. It's time to go home now, but I should be able to dig something up tomorrow.

To use the script:
* Rename cpupool0 to "p0", and create an empty second pool, "p1"
* You can modify elements by adding "arg=val" as arguments.
* Arguments are:
  + dryrun={true,false} Do the work, but don't actually execute any xl commands. Default false.
  + left: Number of commands to execute. Default 10.
  + maxcpus: highest numerical value for a cpu. Default 7 (i.e., 0-7 is 8 cpus).
  + verbose={true,false} Print what you're doing. Default is true.

The script sometimes attempts to remove the last cpu from cpupool0; in this case, libxl will print an error. If the script gets an error under that condition, it will ignore it; under any other condition, it will print diagnostic information.

What finally crashed it for me was this command:

# ./cpupool-test.sh verbose=false left=1000

Nice! With your script I finally managed to get the error, too. On my box (2 sockets with 6 cores each) I had to use

./cpupool-test.sh verbose=false left=10000 maxcpus=11

to trigger it. Looking for more data now...

Juergen

-George

On Fri, Feb 11, 2011 at 7:39 AM, Andre Przywara <andre.przywara@xxxxxxx> wrote:

Juergen Gross wrote:

On 02/10/11 15:18, Andre Przywara wrote:

Andre Przywara wrote:

On 02/10/2011 07:42 AM, Juergen Gross wrote:

On 02/09/11 15:21, Juergen Gross wrote:

Andre, George,

What seems to be interesting: I think the problem always occurred when a new cpupool was created and the first cpu was moved to it.

I think my previous assumption regarding the master_ticker was not too bad. I think somehow the master_ticker of the new cpupool is becoming active before the scheduler is really initialized properly. This could happen if enough time passes between alloc_pdata for the cpu to be moved and the critical section in schedule_cpu_switch().

The solution should be to activate the timers only if the scheduler is ready for them.

George, do you think the master_ticker should be stopped in suspend_ticker as well? I still see potential problems for entering deep C-States. I think I'll prepare a patch which will keep the master_ticker active for the C-State case and migrate it for the schedule_cpu_switch() case.

Okay, here is a patch for this. It ran on my 4-core machine without any problems. Andre, could you give it a try?

Did, but unfortunately it crashed as always. Tried twice and made sure I booted the right kernel. Sorry.
The idea of a race between the timer and the state change sounded very appealing; actually, that looked suspicious to me from the beginning.

I will add some code to dump the state of all cpupools to the BUG_ON to see in which situation we are when the bug triggers.

OK, here is a first try of this. The patch iterates over all CPU pools and outputs some data if the BUG_ON((sdom->weight * sdom->active_vcpu_count) > weight_left) condition triggers:

(XEN) CPU pool #0: 1 domains (SMP Credit Scheduler), mask: fffffffc003f
(XEN) CPU pool #1: 0 domains (SMP Credit Scheduler), mask: fc0
(XEN) CPU pool #2: 0 domains (SMP Credit Scheduler), mask: 1000
(XEN) Xen BUG at sched_credit.c:1010
....

The masks look proper (6 cores per node); the bug triggers when the first CPU is about to be(?) inserted.

Sure? I'm missing the cpu with mask 2000.

I'll try to reproduce the problem on a larger machine here (24 cores, 4 numa nodes). Andre, can you give me your xen boot parameters? Which xen changeset are you running, and do you have any additional patches in use?

The grub lines:

kernel (hd1,0)/boot/xen-22858_debug_04.gz console=com1,vga com1=115200
module (hd1,0)/boot/vmlinuz-2.6.32.27_pvops console=tty0 console=ttyS0,115200 ro root=/dev/sdb1 xencons=hvc0

All of my experiments use c/s 22858 as a base. If you use an AMD Magny-Cours box for your experiments (socket C32 or G34), you should add the following patch (removing the line):

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -803,7 +803,6 @@ static void pv_cpuid(struct cpu_user_regs *regs)
         __clear_bit(X86_FEATURE_SKINIT % 32,&c);
         __clear_bit(X86_FEATURE_WDT % 32,&c);
         __clear_bit(X86_FEATURE_LWP % 32,&c);
-        __clear_bit(X86_FEATURE_NODEID_MSR % 32,&c);
         __clear_bit(X86_FEATURE_TOPOEXT % 32,&c);
         break;
     case 5: /* MONITOR/MWAIT */

This is not necessary (in fact it reverts my patch c/s 22815), but it raises the probability of triggering the bug, probably because it increases the pressure on the Dom0 scheduler.
If you cannot trigger it with Dom0, try to create a guest with many VCPUs and squeeze it into a small CPU-pool.

Good luck ;-)
Andre.

--
Andre Przywara
AMD-OSRC (Dresden)
Tel: x29712

--
Juergen Gross, Principal Developer Operating Systems
TSP ES&S SWE OS6, Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions, e-mail: juergen.gross@xxxxxxxxxxxxxx
Domagkstr. 28, Internet: ts.fujitsu.com
D-80807 Muenchen, Company details: ts.fujitsu.com/imprint.html

--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel