
RE: soft lockup was (Re: [Xen-users] Kernel error)


  • To: <xen-users@xxxxxxxxxxxxxxxxxxx>
  • From: "Roger Lucas" <roger@xxxxxxxxxxxxx>
  • Date: Mon, 4 Sep 2006 09:48:16 +0100
  • Delivery-date: Mon, 04 Sep 2006 01:51:15 -0700
  • List-id: Xen user discussion <xen-users.lists.xensource.com>
  • Thread-index: Aca2gljkdjaQ5026RmmoycpiT/cp2QY3FdJwACenwGA=

This soft-lockup problem seems to occur when I perform a large MySQL query
that takes several seconds to complete on a DomU.  At this point, the soft
lockup message appears and the Xen box seems to stall for about 5-10
seconds.  After that, everything continues normally again.

The box is an Abit-LG81 motherboard (Skt775, ICH7) with an Intel Celeron
2.7GHz processor and 2 GB of RAM.  I am running software RAID-5 across the 4
SATA drives in Dom0 and providing the disks to the DomUs using LVM.  The
basic installation was Kubuntu Dapper Drake 6.06 and I installed the Xen
kernel from the 3.0.2-2 binaries on the Xen site.

A capture of the relevant information from syslog is below.  This is what I
get for most of the errors:

Sep  3 15:48:20 hydra kernel: Pid: 0, comm:              swapper
Sep  3 15:48:20 hydra kernel: EIP: 0061:[hypercall_page+935/4096] CPU: 0
Sep  3 15:48:20 hydra kernel: EIP is at 0xc01013a7
Sep  3 15:48:20 hydra kernel:  EFLAGS: 00000296    Tainted: GF      (2.6.16-xen #1)
Sep  3 15:48:20 hydra kernel: EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00001f8e
Sep  3 15:48:20 hydra kernel: ESI: 00000000 EDI: 00000001 EBP: c03da000 DS: 007b ES: 007b
Sep  3 15:48:20 hydra kernel: CR0: 8005003b CR2: b7e34000 CR3: 00df6000 CR4: 00000640
Sep  3 15:48:20 hydra kernel:  [xen_idle+83/176] xen_idle+0x53/0xb0
Sep  3 15:48:20 hydra kernel:  [cpu_idle+122/224] cpu_idle+0x7a/0xe0
Sep  3 15:48:20 hydra kernel:  [start_kernel+439/512] start_kernel+0x1b7/0x200
Sep  3 15:48:20 hydra kernel:  [unknown_bootoption+0/464] unknown_bootoption+0x0/0x1d0

Sometimes I get a longer trace like the one below.  The exact trace varies a
bit, but the first function listed is always "notify_remote_via_irq":

Sep  3 16:09:30 hydra kernel: BUG: soft lockup detected on CPU#0!
Sep  3 16:09:30 hydra kernel:
Sep  3 16:09:30 hydra kernel: Pid: 0, comm:              swapper
Sep  3 16:09:30 hydra kernel: EIP: 0061:[hypercall_page+519/4096] CPU: 0
Sep  3 16:09:30 hydra kernel: EIP is at 0xc0101207
Sep  3 16:09:30 hydra kernel:  EFLAGS: 00000202    Tainted: GF      (2.6.16-xen #1)
Sep  3 16:09:30 hydra kernel: EAX: 00000000 EBX: c03dbc98 ECX: c114ed40 EDX: c03dbef4
Sep  3 16:09:30 hydra kernel: ESI: 00000000 EDI: 00000112 EBP: c0432fc0 DS: 007b ES: 007b
Sep  3 16:09:30 hydra kernel: CR0: 8005003b CR2: b7e2c4b0 CR3: 00df6000 CR4: 00000640
Sep  3 16:09:30 hydra kernel:  [notify_remote_via_irq+41/64] notify_remote_via_irq+0x29/0x40
Sep  3 16:09:30 hydra kernel:  [kfree_skbmem+94/144] kfree_skbmem+0x5e/0x90
Sep  3 16:09:30 hydra kernel:  [net_rx_action+1123/1280] net_rx_action+0x463/0x500
Sep  3 16:09:30 hydra kernel:  [fib_lookup+209/320] fib_lookup+0xd1/0x140
Sep  3 16:09:30 hydra kernel:  [ip_route_input_slow+440/2528] ip_route_input_slow+0x1b8/0x9e0
Sep  3 16:09:30 hydra kernel:  [try_to_wake_up+768/880] try_to_wake_up+0x300/0x370
Sep  3 16:09:30 hydra kernel:  [<e13c5000>] br_forward_finish+0x0/0x70 [bridge]
Sep  3 16:09:30 hydra kernel:  [neigh_lookup+136/208] neigh_lookup+0x88/0xd0
Sep  3 16:09:30 hydra kernel:  [kfree_skbmem+94/144] kfree_skbmem+0x5e/0x90
Sep  3 16:09:30 hydra kernel:  [arp_process+142/1456] arp_process+0x8e/0x5b0
Sep  3 16:09:30 hydra kernel:  [ip_local_deliver+280/688] ip_local_deliver+0x118/0x2b0
Sep  3 16:09:30 hydra kernel:  [arp_rcv+221/400] arp_rcv+0xdd/0x190
Sep  3 16:09:30 hydra kernel:  [packet_rcv_spkt+359/672] packet_rcv_spkt+0x167/0x2a0
Sep  3 16:09:30 hydra kernel:  [netif_receive_skb+650/816] netif_receive_skb+0x28a/0x330
Sep  3 16:09:30 hydra kernel:  [process_backlog+215/400] process_backlog+0xd7/0x190
Sep  3 16:09:30 hydra kernel:  [tasklet_action+157/320] tasklet_action+0x9d/0x140
Sep  3 16:09:30 hydra kernel:  [__do_softirq+245/288] __do_softirq+0xf5/0x120
Sep  3 16:09:30 hydra kernel:  [do_softirq+149/160] do_softirq+0x95/0xa0
Sep  3 16:09:30 hydra kernel:  [do_IRQ+31/48] do_IRQ+0x1f/0x30
Sep  3 16:09:30 hydra kernel:  [evtchn_do_upcall+168/240] evtchn_do_upcall+0xa8/0xf0
Sep  3 16:09:30 hydra kernel:  [hypervisor_callback+44/52] hypervisor_callback+0x2c/0x34
Sep  3 16:09:30 hydra kernel:  [xen_idle+83/176] xen_idle+0x53/0xb0
Sep  3 16:09:30 hydra kernel:  [cpu_idle+122/224] cpu_idle+0x7a/0xe0
Sep  3 16:09:30 hydra kernel:  [start_kernel+439/512] start_kernel+0x1b7/0x200
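To check across a whole syslog whether the longer traces really always start in notify_remote_via_irq, the first frame printed in each report can be tallied. This is a minimal Python sketch. `first_frames` and `summarise` are hypothetical helper names. It assumes that each syslog record sits on a single line, which is how syslog writes it; only the list archive wraps the lines. It also assumes that every report begins with a "Pid:" line, because the first sample above has no "BUG:" header.

```python
import re
from collections import Counter

# A symbolised stack frame as printed by this 2.6.16-xen kernel, e.g.
#   "kernel:  [xen_idle+83/176] xen_idle+0x53/0xb0"
#   "kernel:  [<e13c5000>] br_forward_finish+0x0/0x70 [bridge]"
# Group 1 captures the bare symbol name after the closing bracket.
# The EIP line ("[hypercall_page+935/4096] CPU: 0") does not match.
FRAME_RE = re.compile(r"\]\s+([A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def first_frames(lines):
    """Return the first frame printed after each "Pid:" register dump.

    Assumes one syslog record per line (not the archive's wrapped form)."""
    frames = []
    waiting = False
    for line in lines:
        if "kernel: Pid:" in line:
            waiting = True  # a new lockup report starts here
        elif waiting:
            match = FRAME_RE.search(line)
            if match:
                frames.append(match.group(1))
                waiting = False  # ignore the rest of this trace
    return frames


def summarise(path):
    """Hypothetical helper: count reports in a syslog file by first frame."""
    with open(path, errors="replace") as log:
        return Counter(first_frames(log))
```

Run against the Dom0 log, for example `summarise("/var/log/syslog").most_common()` (the Ubuntu default path, which may differ elsewhere), this returns (symbol, count) pairs. A report whose top frame is not notify_remote_via_irq would show up as a separate entry.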

Just once, I got the following error:

Sep  3 18:40:35 hydra kernel: Pid: 2268, comm:            md0_raid5
Sep  3 18:40:35 hydra kernel: EIP: 0061:[hypercall_page+551/4096] CPU: 0
Sep  3 18:40:35 hydra kernel: EIP is at 0xc0101227
Sep  3 18:40:35 hydra kernel:  EFLAGS: 00200246    Tainted: GF      (2.6.16-xen #1)
Sep  3 18:40:35 hydra kernel: EAX: 00030000 EBX: 00000000 ECX: 00000000 EDX: c0619c2c
Sep  3 18:40:35 hydra kernel: ESI: c0619b30 EDI: c0619b40 EBP: 00000001 DS: 007b ES: 007b
Sep  3 18:40:35 hydra kernel: CR0: 8005003b CR2: b7e34000 CR3: 003f2000 CR4: 00000640
Sep  3 18:40:35 hydra kernel:  [force_evtchn_callback+10/16] force_evtchn_callback+0xa/0x10
Sep  3 18:40:35 hydra kernel:  [get_request+727/800] get_request+0x2d7/0x320
Sep  3 18:40:35 hydra kernel:  [lock_timer_base+36/80] lock_timer_base+0x24/0x50
Sep  3 18:40:35 hydra kernel:  [get_request_wait+44/368] get_request_wait+0x2c/0x170
Sep  3 18:40:35 hydra kernel:  [blk_plug_device+99/160] blk_plug_device+0x63/0xa0
Sep  3 18:40:35 hydra kernel:  [kobject_put+31/48] kobject_put+0x1f/0x30
Sep  3 18:40:35 hydra kernel:  [kobject_release+0/16] kobject_release+0x0/0x10
Sep  3 18:40:35 hydra kernel:  [<e105e3f1>] scsi_request_fn+0x261/0x400 [scsi_mod]
Sep  3 18:40:35 hydra kernel:  [__make_request+170/1184] __make_request+0xaa/0x4a0
Sep  3 18:40:35 hydra kernel:  [schedule+1013/1840] schedule+0x3f5/0x730
Sep  3 18:40:35 hydra kernel:  [generic_make_request+240/352] generic_make_request+0xf0/0x160
Sep  3 18:40:35 hydra kernel:  [__bio_clone+166/176] __bio_clone+0xa6/0xb0
Sep  3 18:40:35 hydra kernel:  [submit_bio+98/256] submit_bio+0x62/0x100
Sep  3 18:40:35 hydra kernel:  [<e108f728>] md_super_write+0xa8/0xe0 [md_mod]
Sep  3 18:40:35 hydra kernel:  [<e10919a6>] md_update_sb+0x1b6/0x230 [md_mod]
Sep  3 18:40:35 hydra kernel:  [<e1097793>] md_check_recovery+0x463/0x4d0 [md_mod]
Sep  3 18:40:35 hydra kernel:  [schedule_timeout+169/176] schedule_timeout+0xa9/0xb0
Sep  3 18:40:35 hydra kernel:  [<e1085bb6>] raid5d+0x16/0x190 [raid5]
Sep  3 18:40:35 hydra kernel:  [prepare_to_wait+32/112] prepare_to_wait+0x20/0x70
Sep  3 18:40:35 hydra kernel:  [<e109577f>] md_thread+0x5f/0x130 [md_mod]
Sep  3 18:40:35 hydra kernel:  [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Sep  3 18:40:35 hydra kernel:  [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Sep  3 18:40:35 hydra kernel:  [<e1095720>] md_thread+0x0/0x130 [md_mod]
Sep  3 18:40:35 hydra kernel:  [kthread+186/192] kthread+0xba/0xc0
Sep  3 18:40:35 hydra kernel:  [kthread+0/192] kthread+0x0/0xc0
Sep  3 18:40:35 hydra kernel:  [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
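In all three reports, EIP lies inside hypercall_page. The three "EIP is at" addresses also agree that the page starts at 0xc0101000 (0xc01013a7 - 935, 0xc0101207 - 519, 0xc0101227 - 551). Assuming the usual layout for 32-bit x86 paravirtualised guests, each hypercall has a fixed 32-byte stub, so the offset can be split into a stub index and a byte position. The sketch below does that. The name table is an assumption based on the Xen 3.0-era public/xen.h numbering and should be checked against the headers for the running hypervisor. On these samples, it yields stubs 29, 16 and 17, each at byte 7. That would fit xen_idle blocking in sched_op, notify_remote_via_irq sending an event-channel notification, and force_evtchn_callback issuing a dummy xen_version call. It suggests, but does not prove, that the watchdog fired while the vCPU was inside, or returning from, the hypervisor.

```python
import re

# Assumption: on 32-bit x86 PV guests of this era each hypercall has a fixed
# 32-byte stub in hypercall_page, called as hypercall_page + nr * 32.
STUB_SIZE = 32

# Assumed names for the stub indices seen in these traces, from the Xen 3.0
# public headers (include/public/xen.h); verify against your own tree.
ASSUMED_NAMES = {16: "event_channel_op", 17: "xen_version", 29: "sched_op"}

# Matches the "[hypercall_page+935/4096]" part of an EIP line; the kernel
# prints offset/size in decimal here.
EIP_RE = re.compile(r"\[hypercall_page\+(\d+)/(\d+)\]")


def decode_eip(line):
    """Return (stub index, byte offset inside the stub, assumed name)."""
    match = EIP_RE.search(line)
    if match is None:
        raise ValueError("no hypercall_page offset in line")
    offset = int(match.group(1))
    index, within = divmod(offset, STUB_SIZE)
    return index, within, ASSUMED_NAMES.get(index, "unknown")


for eip in ("EIP: 0061:[hypercall_page+935/4096] CPU: 0",   # xen_idle trace
            "EIP: 0061:[hypercall_page+519/4096] CPU: 0",   # notify_remote_via_irq
            "EIP: 0061:[hypercall_page+551/4096] CPU: 0"):  # force_evtchn_callback
    print(decode_eip(eip))
```

The same byte offset in every stub is what one would expect if each EIP points at the same instruction inside otherwise identical stubs.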

Does anyone on this list know what is going on and why this would occur?

I have a work-around that breaks the SQL query into a set of smaller queries,
which do not trigger the problem, but I would like to get to the root cause
and fix the problem properly.
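The post does not show how the query was split. One common way to break a large read into short queries is keyset pagination: each query resumes after the last primary key already returned. This is a minimal sketch under stated assumptions. The table `records`, its integer primary key `id` and its column `payload` are hypothetical. Python's built-in sqlite3 stands in for the MySQL connection. It only spreads the load over several short queries and does not address the lockup itself.

```python
import sqlite3


def iter_batches(conn, batch_size=1000):
    """Yield rows of the hypothetical `records` table in primary-key order,
    running one short query per batch instead of one long query.

    Each query resumes after the last id already seen, so no query has to
    skip over earlier rows the way LIMIT ... OFFSET does."""
    last_id = None
    while True:
        if last_id is None:
            cur = conn.execute(
                "SELECT id, payload FROM records ORDER BY id LIMIT ?",
                (batch_size,))
        else:
            cur = conn.execute(
                "SELECT id, payload FROM records WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size))
        rows = cur.fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the highest id in this batch


# Demo against an in-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO records (payload) VALUES (?)",
                 [(f"row {i}",) for i in range(2500)])
print([len(batch) for batch in iter_batches(conn)])  # [1000, 1000, 500]
```

With a MySQL driver such as MySQLdb, the same statements would be run through a cursor, using `%s` placeholders instead of `?`.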

Thanks in advance for any help anyone can give on this.



> -----Original Message-----
> From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Roger Lucas
> Sent: 03 September 2006 14:47
> To: xen-users@xxxxxxxxxxxxxxxxxxx
> Subject: RE: soft lockup was (Re: [Xen-users] Kernel error)
> 
> I have suddenly started seeing these same errors on my Xen-3.0.2-2 system.
> I have four DomUs with 256MB RAM each and 512MB on the Dom0, running on an
> Intel Celeron system.
> 
> Is the only solution to upgrade to Unstable, or is there a patch/upgrade
> available for the 3.0.2-2 release?
> 
> Thanks, Roger.
> 
> > -----Original Message-----
> > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> > [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Steve Traugott
> > Sent: 02 August 2006 23:22
> > To: Jones, Chris
> > Cc: rbp@xxxxxxxxxxxxx; xen-users@xxxxxxxxxxxxxxxxxxx
> > Subject: soft lockup was (Re: [Xen-users] Kernel error)
> >
> > Hi Chris,
> >
> > Did you ever reach any sort of conclusion about the current state of
> > the soft lockup bug?  Do you have a stable build now?  What changeset
> > is it?
> >
> > Thanks,
> >
> > Steve
> >
> > On Fri, Jul 07, 2006 at 07:37:33AM -0500, Jones, Chris wrote:
> > > I am getting the same errors on the stable 3.0.2 release, but not on
> > > unstable, so it looks like you are right. I am downloading the testing
> > > tree in an attempt to test it there. I will holler when I find
> > > something out.
> > >
> > > -----Original Message-----
> > > From: Rodrigo Borges Pereira [mailto:rbp@xxxxxxxxxxxxx]
> > > Sent: Friday, July 07, 2006 7:23 AM
> > > To: Jones, Chris; xen-users@xxxxxxxxxxxxxxxxxxx
> > > Subject: RE: [Xen-users] Kernel error
> > >
> > > I believe that thread says the fix is already in 3.0.2, and I am
> > > running 3.0.2.
> > > Did I get it wrong?
> > >
> > > Thanks
> > >
> > > > -----Original Message-----
> > > > From: Jones, Chris [mailto:chris.jones@xxxxxxxxxxxxxxx]
> > > > Sent: Friday, 7 July 2006 13:18
> > > > To: rbp@xxxxxxxxxxxxx; xen-users@xxxxxxxxxxxxxxxxxxx
> > > > Subject: RE: [Xen-users] Kernel error
> > > >
> > > > There is a fix for this issue.
> > > > http://lists.xensource.com/archives/html/xen-devel/2006-04/msg00193.html
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: xen-users-bounces@xxxxxxxxxxxxxxxxxxx
> > > > [mailto:xen-users-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of
> > > > Rodrigo Borges Pereira
> > > > Sent: Friday, July 07, 2006 7:06 AM
> > > > To: xen-users@xxxxxxxxxxxxxxxxxxx
> > > > Subject: [Xen-users] Kernel error
> > > >
> > > > Hi,
> > > >
> > > > I got this on the console of one DomU:
> > > >
> > > > --> BUG: soft lockup detected on CPU#0!
> > > >
> > > > Pid: 0, comm:              swapper
> > > > EIP: 0061:[<c01013a7>] CPU: 0
> > > > EIP is at 0xc01013a7
> > > >  EFLAGS: 00000246    Tainted: GF      (2.6.16-xen3_86.1_rhel4.1 #1)
> > > > EAX: 00000000 EBX: 00000001 ECX: 00000000 EDX: 00004eaf
> > > > ESI: 00000000 EDI: 00000001 EBP: c03e4000 DS: 007b ES: 007b
> > > > CR0: 8005003b CR2: 8005230c CR3: 004ec000 CR4: 00000640
> > > > [<c0102b53>] xen_idle+0x53/0xb0  [<c0102c1f>]
> > > > cpu_idle+0x6f/0xe0  [<c03e69da>] start_kernel+0x1da/0x230
> > > > [<c03e6320>] unknown_bootoption+0x0/0x1f0
> > > >
> > > >
> > > > It didn't seem to affect the operation of either DomU or Dom0.
> > > >
> > > > Should I worry?
> > > >
> > > > Best regards,
> > > > r
> > > >
> > > >
> > > > _______________________________________________
> > > > Xen-users mailing list
> > > > Xen-users@xxxxxxxxxxxxxxxxxxx
> > > > http://lists.xensource.com/xen-users
> > > >
> > >
> > >
> >
> > --
> > Stephen G. Traugott  (KG6HDQ)
> > UNIX/Linux Infrastructure Architect, TerraLuna LLC
> > stevegt@xxxxxxxxxxxxx
> > http://www.stevegt.com -- http://Infrastructures.Org
> >
> 
> 

