RE: soft lockup was (Re: [Xen-users] Kernel error)
This soft-lockup problem seems to occur when I perform a large MySQL query that takes several seconds to complete on a DomU. At this point, the soft lockup message appears and the Xen box seems to stall for about 5-10 seconds. After that, everything continues normally again.

The box is an Abit-LG81 motherboard (Skt775, ICH7) with an Intel Celeron 2.7GHz processor and 2 GB of RAM. I am running software RAID-5 across the 4 SATA drives in Dom0 and providing the disks to the DomUs using LVM. The basic installation was Kubuntu Dapper Drake 6.06 and I installed the Xen kernel from the 3.0.2-2 binaries on the Xen site.

A capture of the relevant information from syslog is below. This is what I get for most of the errors:

Sep 3 15:48:20 hydra kernel: Pid: 0, comm: swapper
Sep 3 15:48:20 hydra kernel: EIP: 0061:[hypercall_page+935/4096] CPU: 0
Sep 3 15:48:20 hydra kernel: EIP is at 0xc01013a7
Sep 3 15:48:20 hydra kernel: EFLAGS: 00000296 Tainted: GF (2.6.16-xen #1)
Sep 3 15:48:20 hydra kernel: EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00001f8e
Sep 3 15:48:20 hydra kernel: ESI: 00000000 EDI: 00000001 EBP: c03da000 DS: 007b ES: 007b
Sep 3 15:48:20 hydra kernel: CR0: 8005003b CR2: b7e34000 CR3: 00df6000 CR4: 00000640
Sep 3 15:48:20 hydra kernel: [xen_idle+83/176] xen_idle+0x53/0xb0
Sep 3 15:48:20 hydra kernel: [cpu_idle+122/224] cpu_idle+0x7a/0xe0
Sep 3 15:48:20 hydra kernel: [start_kernel+439/512] start_kernel+0x1b7/0x200
Sep 3 15:48:20 hydra kernel: [unknown_bootoption+0/464] unknown_bootoption+0x0/0x1d0

Sometimes I get a longer trace like the one below. The exact trace varies a bit, but the starting function is always "notify_remote_via_irq":

Sep 3 16:09:30 hydra kernel: BUG: soft lockup detected on CPU#0!
Sep 3 16:09:30 hydra kernel:
Sep 3 16:09:30 hydra kernel: Pid: 0, comm: swapper
Sep 3 16:09:30 hydra kernel: EIP: 0061:[hypercall_page+519/4096] CPU: 0
Sep 3 16:09:30 hydra kernel: EIP is at 0xc0101207
Sep 3 16:09:30 hydra kernel: EFLAGS: 00000202 Tainted: GF (2.6.16-xen #1)
Sep 3 16:09:30 hydra kernel: EAX: 00000000 EBX: c03dbc98 ECX: c114ed40 EDX: c03dbef4
Sep 3 16:09:30 hydra kernel: ESI: 00000000 EDI: 00000112 EBP: c0432fc0 DS: 007b ES: 007b
Sep 3 16:09:30 hydra kernel: CR0: 8005003b CR2: b7e2c4b0 CR3: 00df6000 CR4: 00000640
Sep 3 16:09:30 hydra kernel: [notify_remote_via_irq+41/64] notify_remote_via_irq+0x29/0x40
Sep 3 16:09:30 hydra kernel: [kfree_skbmem+94/144] kfree_skbmem+0x5e/0x90
Sep 3 16:09:30 hydra kernel: [net_rx_action+1123/1280] net_rx_action+0x463/0x500
Sep 3 16:09:30 hydra kernel: [fib_lookup+209/320] fib_lookup+0xd1/0x140
Sep 3 16:09:30 hydra kernel: [ip_route_input_slow+440/2528] ip_route_input_slow+0x1b8/0x9e0
Sep 3 16:09:30 hydra kernel: [try_to_wake_up+768/880] try_to_wake_up+0x300/0x370
Sep 3 16:09:30 hydra kernel: [<e13c5000>] br_forward_finish+0x0/0x70 [bridge]
Sep 3 16:09:30 hydra kernel: [neigh_lookup+136/208] neigh_lookup+0x88/0xd0
Sep 3 16:09:30 hydra kernel: [kfree_skbmem+94/144] kfree_skbmem+0x5e/0x90
Sep 3 16:09:30 hydra kernel: [arp_process+142/1456] arp_process+0x8e/0x5b0
Sep 3 16:09:30 hydra kernel: [ip_local_deliver+280/688] ip_local_deliver+0x118/0x2b0
Sep 3 16:09:30 hydra kernel: [arp_rcv+221/400] arp_rcv+0xdd/0x190
Sep 3 16:09:30 hydra kernel: [packet_rcv_spkt+359/672] packet_rcv_spkt+0x167/0x2a0
Sep 3 16:09:30 hydra kernel: [netif_receive_skb+650/816] netif_receive_skb+0x28a/0x330
Sep 3 16:09:30 hydra kernel: [process_backlog+215/400] process_backlog+0xd7/0x190
Sep 3 16:09:30 hydra kernel: [tasklet_action+157/320] tasklet_action+0x9d/0x140
Sep 3 16:09:30 hydra kernel: [__do_softirq+245/288] __do_softirq+0xf5/0x120
Sep 3 16:09:30 hydra kernel: [do_softirq+149/160] do_softirq+0x95/0xa0
Sep 3 16:09:30 hydra kernel: [do_IRQ+31/48] do_IRQ+0x1f/0x30
Sep 3 16:09:30 hydra kernel: [evtchn_do_upcall+168/240] evtchn_do_upcall+0xa8/0xf0
Sep 3 16:09:30 hydra kernel: [hypervisor_callback+44/52] hypervisor_callback+0x2c/0x34
Sep 3 16:09:30 hydra kernel: [xen_idle+83/176] xen_idle+0x53/0xb0
Sep 3 16:09:30 hydra kernel: [cpu_idle+122/224] cpu_idle+0x7a/0xe0
Sep 3 16:09:30 hydra kernel: [start_kernel+439/512] start_kernel+0x1b7/0x200

Just once, I got the following error:

Sep 3 18:40:35 hydra kernel: Pid: 2268, comm: md0_raid5
Sep 3 18:40:35 hydra kernel: EIP: 0061:[hypercall_page+551/4096] CPU: 0
Sep 3 18:40:35 hydra kernel: EIP is at 0xc0101227
Sep 3 18:40:35 hydra kernel: EFLAGS: 00200246 Tainted: GF (2.6.16-xen #1)
Sep 3 18:40:35 hydra kernel: EAX: 00030000 EBX: 00000000 ECX: 00000000 EDX: c0619c2c
Sep 3 18:40:35 hydra kernel: ESI: c0619b30 EDI: c0619b40 EBP: 00000001 DS: 007b ES: 007b
Sep 3 18:40:35 hydra kernel: CR0: 8005003b CR2: b7e34000 CR3: 003f2000 CR4: 00000640
Sep 3 18:40:35 hydra kernel: [force_evtchn_callback+10/16] force_evtchn_callback+0xa/0x10
Sep 3 18:40:35 hydra kernel: [get_request+727/800] get_request+0x2d7/0x320
Sep 3 18:40:35 hydra kernel: [lock_timer_base+36/80] lock_timer_base+0x24/0x50
Sep 3 18:40:35 hydra kernel: [get_request_wait+44/368] get_request_wait+0x2c/0x170
Sep 3 18:40:35 hydra kernel: [blk_plug_device+99/160] blk_plug_device+0x63/0xa0
Sep 3 18:40:35 hydra kernel: [kobject_put+31/48] kobject_put+0x1f/0x30
Sep 3 18:40:35 hydra kernel: [kobject_release+0/16] kobject_release+0x0/0x10
Sep 3 18:40:35 hydra kernel: [<e105e3f1>] scsi_request_fn+0x261/0x400 [scsi_mod]
Sep 3 18:40:35 hydra kernel: [__make_request+170/1184] __make_request+0xaa/0x4a0
Sep 3 18:40:35 hydra kernel: [schedule+1013/1840] schedule+0x3f5/0x730
Sep 3 18:40:35 hydra kernel: [generic_make_request+240/352] generic_make_request+0xf0/0x160
Sep 3 18:40:35 hydra kernel: [__bio_clone+166/176] __bio_clone+0xa6/0xb0
Sep 3 18:40:35 hydra kernel: [submit_bio+98/256] submit_bio+0x62/0x100
Sep 3 18:40:35 hydra kernel: [<e108f728>] md_super_write+0xa8/0xe0 [md_mod]
Sep 3 18:40:35 hydra kernel: [<e10919a6>] md_update_sb+0x1b6/0x230 [md_mod]
Sep 3 18:40:35 hydra kernel: [<e1097793>] md_check_recovery+0x463/0x4d0 [md_mod]
Sep 3 18:40:35 hydra kernel: [schedule_timeout+169/176] schedule_timeout+0xa9/0xb0
Sep 3 18:40:35 hydra kernel: [<e1085bb6>] raid5d+0x16/0x190 [raid5]
Sep 3 18:40:35 hydra kernel: [prepare_to_wait+32/112] prepare_to_wait+0x20/0x70
Sep 3 18:40:35 hydra kernel: [<e109577f>] md_thread+0x5f/0x130 [md_mod]
Sep 3 18:40:35 hydra kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Sep 3 18:40:35 hydra kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Sep 3 18:40:35 hydra kernel: [<e1095720>] md_thread+0x0/0x130 [md_mod]
Sep 3 18:40:35 hydra kernel: [kthread+186/192] kthread+0xba/0xc0
Sep 3 18:40:35 hydra kernel: [kthread+0/192] kthread+0x0/0xc0
Sep 3 18:40:35 hydra kernel: [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10

Does anyone on this list know what is going on and why this would occur? I have a work-around that breaks the SQL query into a set of smaller queries which don't then cause this problem, but I would like to get to the root cause and fix the problem properly.

Thanks in advance for any help anyone can give on this.

> -----Original Message-----
> From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Roger Lucas
> Sent: 03 September 2006 14:47
> To: xen-users@xxxxxxxxxxxxxxxxxxx
> Subject: RE: soft lockup was (Re: [Xen-users] Kernel error)
>
> I have suddenly got these same errors occurring on my Xen-3.0.2-2 system. I have four DomUs with 256MB ram each and 512MB on the Dom0 running on an Intel Celeron system.
>
> Is the only solution to upgrade to Unstable, or is there a patch/upgrade available for the 3.0.2-2 release?
>
> Thanks, Roger.
>
> > -----Original Message-----
> > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Steve Traugott
> > Sent: 02 August 2006 23:22
> > To: Jones, Chris
> > Cc: rbp@xxxxxxxxxxxxx; xen-users@xxxxxxxxxxxxxxxxxxx
> > Subject: soft lockup was (Re: [Xen-users] Kernel error)
> >
> > Hi Chris,
> >
> > Did you ever reach any sort of conclusion about the current state of the soft lockup bug? Do you have a stable build now? What changeset is it?
> >
> > Thanks,
> >
> > Steve
> >
> > On Fri, Jul 07, 2006 at 07:37:33AM -0500, Jones, Chris wrote:
> > > I am getting the same errors in the stable 3.0.2 but I am not getting the errors on unstable, so it looks like you are right. I am downloading the testing tree in an attempt to test it there. I will holler when I find something out.
> > >
> > > -----Original Message-----
> > > From: Rodrigo Borges Pereira [mailto:rbp@xxxxxxxxxxxxx]
> > > Sent: Friday, July 07, 2006 7:23 AM
> > > To: Jones, Chris; xen-users@xxxxxxxxxxxxxxxxxxx
> > > Subject: RE: [Xen-users] Kernel error
> > >
> > > I believe that thread states that the fix is already in 3.0.2. And I am running 3.0.2.
> > > Did I get it wrong?
> > >
> > > tks
> > >
> > > > -----Original Message-----
> > > > From: Jones, Chris [mailto:chris.jones@xxxxxxxxxxxxxxx]
> > > > Sent: Friday, 7 July 2006 13:18
> > > > To: rbp@xxxxxxxxxxxxx; xen-users@xxxxxxxxxxxxxxxxxxx
> > > > Subject: RE: [Xen-users] Kernel error
> > > >
> > > > There is a fix for this issue.
> > > > http://lists.xensource.com/archives/html/xen-devel/2006-04/msg00193.html
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Rodrigo Borges Pereira
> > > > Sent: Friday, July 07, 2006 7:06 AM
> > > > To: xen-users@xxxxxxxxxxxxxxxxxxx
> > > > Subject: [Xen-users] Kernel error
> > > >
> > > > Hi,
> > > >
> > > > I got this on the console of one DomU:
> > > >
> > > > --> BUG: soft lockup detected on CPU#0!
> > > >
> > > > Pid: 0, comm: swapper
> > > > EIP: 0061:[<c01013a7>] CPU: 0
> > > > EIP is at 0xc01013a7
> > > > EFLAGS: 00000246 Tainted: GF (2.6.16-xen3_86.1_rhel4.1 #1)
> > > > EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00004eaf
> > > > ESI: 00000000 EDI: 00000001 EBP: c03e4000 DS: 007b ES: 007b
> > > > CR0: 8005003b CR2: 8005230c CR3: 004ec000 CR4: 00000640
> > > > [<c0102b53>] xen_idle+0x53/0xb0
> > > > [<c0102c1f>] cpu_idle+0x6f/0xe0
> > > > [<c03e69da>] start_kernel+0x1da/0x230
> > > > [<c03e6320>] unknown_bootoption+0x0/0x1f0
> > > >
> > > > It didn't seem to affect the operation of either DomU or Dom0.
> > > >
> > > > Should I worry?
> > > >
> > > > Best regards,
> > > > r
> > > >
> > > > _______________________________________________
> > > > Xen-users mailing list
> > > > Xen-users@xxxxxxxxxxxxxxxxxxx
> > > > http://lists.xensource.com/xen-users
> >
> > --
> > Stephen G. Traugott (KG6HDQ)
> > UNIX/Linux Infrastructure Architect, TerraLuna LLC
> > stevegt@xxxxxxxxxxxxx
> > http://www.stevegt.com -- http://Infrastructures.Org
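For anyone reading the hydra traces above: each symbolized frame is printed twice, for example [xen_idle+83/176] xen_idle+0x53/0xb0. The bracketed part appears to be the decimal offset/size added on the syslog side (klogd's symbol lookup), and the trailing part repeats the same values in hex (83 = 0x53, 176 = 0xb0). The short Python sketch below is only an illustration that assumes exactly this line layout; the helpers parse_frame and decode_eip are hypothetical and not part of Xen or any syslog tool. It checks that both renderings agree and, from the two EIP lines of a report, subtracts the offset from the "EIP is at" address to get the start address of the named symbol.

```python
import re

# Assumption: line layout copied from the hydra syslog in this thread, e.g.
#   "[xen_idle+83/176] xen_idle+0x53/0xb0"
# The bracketed part is decimal offset/size; the trailing part repeats it in hex.
FRAME_RE = re.compile(
    r"\[(?P<name>\w+)\+(?P<doff>\d+)/(?P<dsize>\d+)\]\s+"
    r"(?P=name)\+0x(?P<hoff>[0-9a-f]+)/0x(?P<hsize>[0-9a-f]+)"
)

# The two EIP lines of a soft-lockup report, e.g.
#   "EIP: 0061:[hypercall_page+935/4096] CPU: 0" and "EIP is at 0xc01013a7"
EIP_SYM_RE = re.compile(
    r"EIP: [0-9a-f]{4}:\[(?P<name>\w+)\+(?P<off>\d+)/(?P<size>\d+)\]"
)
EIP_AT_RE = re.compile(r"EIP is at 0x(?P<addr>[0-9a-f]+)")


def parse_frame(line):
    """Return (function, offset, size) for a dual-format frame line, else None.

    Frames printed only as a raw address, such as module frames like
    "[<e13c5000>] br_forward_finish+0x0/0x70 [bridge]", return None.
    """
    m = FRAME_RE.search(line)
    if m is None:
        return None
    dec = (int(m["doff"]), int(m["dsize"]))
    hexv = (int(m["hoff"], 16), int(m["hsize"], 16))
    # Sanity check: both renderings should describe the same offset and size.
    if dec != hexv:
        raise ValueError(f"decimal and hex forms disagree: {line!r}")
    return m["name"], dec[0], dec[1]


def decode_eip(sym_line, at_line):
    """Return (symbol, symbol_start_address, symbol_size) from the two EIP lines."""
    s = EIP_SYM_RE.search(sym_line)
    a = EIP_AT_RE.search(at_line)
    if s is None or a is None:
        return None
    eip = int(a["addr"], 16)
    # The symbol starts 'off' bytes before the reported instruction pointer.
    return s["name"], eip - int(s["off"]), int(s["size"])


if __name__ == "__main__":
    # Prints: ('notify_remote_via_irq', 41, 64)
    print(parse_frame("[notify_remote_via_irq+41/64] notify_remote_via_irq+0x29/0x40"))
    name, start, size = decode_eip("EIP: 0061:[hypercall_page+935/4096] CPU: 0",
                                   "EIP is at 0xc01013a7")
    # Prints: hypercall_page 0xc0101000 4096
    print(name, hex(start), size)
```

Run on the three hydra reports, 0xc01013a7 - 935, 0xc0101207 - 519 and 0xc0101227 - 551 all give 0xc0101000, so each lockup was reported with EIP inside the same 4096-byte hypercall_page; Rodrigo's DomU report quoted above shows the identical address, 0xc01013a7. Frames printed only as a raw address, such as the [bridge], [scsi_mod], [md_mod] and [raid5] module frames or Rodrigo's [<c0102b53>] xen_idle+0x53/0xb0 style, are not matched and return None.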