
Re: [Xen-users] BUG: soft lockup


  • To: Dana Rawding <dana@xxxxxxxxxxx>
  • From: alex <alex.faq8@xxxxxxxxx>
  • Date: Tue, 2 Feb 2010 23:58:07 +0300
  • Cc: Xen List <xen-users@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Tue, 02 Feb 2010 12:59:32 -0800
  • List-id: Xen user discussion <xen-users.lists.xensource.com>

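I have this problem too.
Xen 3.3.1 on Debian Lenny.
The load average on the server climbs to 10-15, all domUs freeze, and I can't do anything.
Please test this: I fixed the problem with xm sched-credit -d 0 -w 512, which sets the credit scheduler weight of dom0 to 512 (the default is 256).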

[787717.425090] BUG: soft lockup - CPU#0 stuck for 61s! [watchdog/0:5]
[787717.425090] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables tun bridge ipv6 nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc loop joydev igb psmouse pcspkr i2c_i801 serio_raw button i2c_core evdev dca ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod sg sr_mod cdrom ata_generic usbhid hid ff_memless ata_piix libata dock sd_mod ide_pci_generic ide_core ehci_hcd uhci_hcd 3w_9xxx scsi_mod thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
[787717.432148] CPU 0:
[787717.432148] Modules linked in: xt_tcpudp xt_physdev iptable_filter ip_tables x_tables tun bridge ipv6 nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc loop joydev igb psmouse pcspkr i2c_i801 serio_raw button i2c_core evdev dca ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod sg sr_mod cdrom ata_generic usbhid hid ff_memless ata_piix libata dock sd_mod ide_pci_generic ide_core ehci_hcd uhci_hcd 3w_9xxx scsi_mod thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
[787717.436173] Pid: 5, comm: watchdog/0 Not tainted 2.6.26-1-xen-amd64 #1
[787717.436173] RIP: e030:[<ffffffff8025ed13>]  [<ffffffff8025ed13>] watchdog+0xbe/0x1cf
[787717.436173] RSP: e02b:ffff880bce0d9ef0  EFLAGS: 00000207
[787717.436173] RAX: 0000000000000001 RBX: ffff880bcb4e5400 RCX: 0002cc64939f91fe
[787717.436173] RDX: ffff880081656000 RSI: ffffffff804fe460 RDI: ffffffff8053a000
[787717.436173] RBP: ffff880bcb4e5400 R08: ffff880001be3040 R09: ffff880bce0d9e30
[787717.436173] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000399
[787717.436173] R13: 00000000000b3192 R14: 0000000000000000 R15: 0000000000000000
[787717.436173] FS:  00007f0cfbb3e6e0(0000) GS:ffffffff80539000(0000) knlGS:0000000000000000
[787717.436173] CS:  e033 DS: 0000 ES: 0000
[787717.436173] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[787717.436173] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[787717.436173]
[787717.436173] Call Trace:
[787717.436173]  [<ffffffff8025ec55>] ? watchdog+0x0/0x1cf
[787717.436173]  [<ffffffff8023f56b>] ? kthread+0x47/0x74
[787717.436173]  [<ffffffff8022839f>] ? schedule_tail+0x27/0x5c
[787717.436173]  [<ffffffff8020be28>] ? child_rip+0xa/0x12
[787717.436173]  [<ffffffff8023f524>] ? kthread+0x0/0x74
[787717.436173]  [<ffffffff8020be1e>] ? child_rip+0x0/0x12
[787717.436173]





2010/1/31 Dana Rawding <dana@xxxxxxxxxxx>
Hi all,

I've been experiencing a rash of CPU lockups on a number of domUs recently.  It's been happening on two different servers.  About a year ago I had this problem every once in a while, but it was not frequent.  I was running Ubuntu with Xen 3.1 and 2.6.24-18 back then.  I'm now running Xen 3.3 and 2.6.24-26.

What I have noticed is that just prior to the lockups the domUs had high CPU loads.  The domU that I have the most problems with is a Zimbra server.  My guess is that a rash of spam comes through and CPU loads get high, then the CPUs lock up.  Originally I had it running with 1 CPU but have since upped it to 2 and then 3 CPUs.

I have been collecting the lockup messages and have posted a few below.  Any ideas?  Recommendations?

Thanks,
Dana
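
When many of these messages have been collected, it can help to tally which CPUs and processes they name before reading the individual traces. The following is only a rough sketch: the function name summarize_lockups and the regular expression are assumptions based on the "BUG: soft lockup" lines quoted in this thread, not part of any Xen or kernel tool, and other kernel versions may format the line differently.

```python
import re
from collections import Counter

# Assumed line format, copied from the messages in this thread:
# [138075.147398] BUG: soft lockup - CPU#0 stuck for 11s! [kswapd0:97]
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def summarize_lockups(lines):
    """Count soft lockup reports per CPU number and per process name."""
    per_cpu = Counter()
    per_comm = Counter()
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match is None:
            continue  # register dumps, call trace entries, separators
        per_cpu[int(match.group("cpu"))] += 1
        per_comm[match.group("comm")] += 1
    return per_cpu, per_comm


if __name__ == "__main__":
    # A few lines taken from the logs in this thread.
    sample = [
        "[138075.147398] BUG: soft lockup - CPU#0 stuck for 11s! [kswapd0:97]",
        "[138075.147441] EIP is at _spin_lock+0x7/0x10",
        "[138088.987826] BUG: soft lockup - CPU#1 stuck for 11s! [java:23215]",
        "[66916.451144] BUG: soft lockup - CPU#0 stuck for 11s! [java:2758]",
    ]
    cpus, comms = summarize_lockups(sample)
    for cpu, count in sorted(cpus.items()):
        print(f"CPU#{cpu}: {count} lockup report(s)")
    for comm, count in comms.most_common():
        print(f"{comm}: {count} lockup report(s)")
```

The same function can be fed the lines of a saved dmesg or kern.log file; it only counts the headline lines and ignores the register dumps and call traces.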


[138077.172283]  =======================
[138075.147398] BUG: soft lockup - CPU#0 stuck for 11s! [kswapd0:97]
[138075.147411]
[138075.147419] Pid: 97, comm: kswapd0 Tainted: G      D (2.6.24-26-xen #1)
[138075.147426] EIP: 0061:[<c03286e7>] EFLAGS: 00000286 CPU: 0
[138075.147441] EIP is at _spin_lock+0x7/0x10
[138075.147447] EAX: c1da48ec EBX: 00000000 ECX: 220c7000 EDX: 00000000
[138075.147453] ESI: 8b804067 EDI: c1da48ec EBP: 00000f28 ESP: ed707dec
[138075.147459]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[138075.147471] CR0: 8005003b CR2: 080f0010 CR3: 2213b000 CR4: 00000660
[138075.147482] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[138075.147488] DR6: ffff0ff0 DR7: 00000400
[138075.147495]  [<c01773cb>] page_check_address+0x1cb/0x3c0
[138075.147514]  [<c0119868>] xen_invlpg_mask+0x38/0x40
[138075.147529]  [<c017762e>] page_referenced_one+0x6e/0x190
[138075.147541]  [<c017875c>] page_referenced+0xec/0x130
[138075.147552]  [<c01671cf>] shrink_active_list+0x18f/0x5c0
[138075.147567]  [<c016826d>] shrink_zone+0xdd/0x100
[138075.147578]  [<c01688cc>] kswapd+0x44c/0x490
[138075.147589]  [<c013bb00>] autoremove_wake_function+0x0/0x40
[138075.147603]  [<c011e270>] complete+0x40/0x60
[138075.147614]  [<c0168480>] kswapd+0x0/0x490
[138075.147625]  [<c013b842>] kthread+0x42/0x70
[138075.147635]  [<c013b800>] kthread+0x0/0x70
[138075.147646]  [<c0105bb7>] kernel_thread_helper+0x7/0x10
[138075.147658]  =======================
[138088.987826] BUG: soft lockup - CPU#1 stuck for 11s! [java:23215]
[138088.987841]
[138088.987846] Pid: 23215, comm: java Tainted: G      D (2.6.24-26-xen #1)
[138088.987850] EIP: 0061:[<c03286e7>] EFLAGS: 00000286 CPU: 1
[138088.987862] EIP is at _spin_lock+0x7/0x10
[138088.987866] EAX: c1da48ec EBX: 00000000 ECX: c1da48e0 EDX: 00000ca8
[138088.987870] ESI: 8b804067 EDI: 00000000 EBP: e20c7ca8 ESP: e226be04
[138088.987873]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[138088.987883] CR0: 80050033 CR2: 940ef020 CR3: 2211f000 CR4: 00000660
[138088.987891] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[138088.987896] DR6: ffff0ff0 DR7: 00000400
[138088.987901]  [<c016d88d>] unmap_vmas+0x43d/0xae0
[138088.987922]  [<c011959c>] kmap_atomic+0x1c/0x30
[138088.987941]  [<c01192fd>] kunmap_atomic+0x3d/0x60
[138088.987957]  [<c0173ee8>] vma_adjust+0x1c8/0x440
[138088.987967]  [<c0173765>] unmap_region+0x95/0x120
[138088.987975]  [<c0174387>] do_munmap+0x147/0x1f0
[138088.987983]  [<c0174c90>] mmap_region+0x70/0x450
[138088.987991]  [<c01db3b7>] security_file_mmap+0x27/0x30
[138088.988001]  [<c0175472>] do_mmap_pgoff+0x312/0x330
[138088.988008]  [<c010a02b>] sys_mmap2+0xbb/0xd0
[138088.988016]  [<c0105832>] syscall_call+0x7/0xb
[138088.988023]  [<c0320000>] svc_accept+0x150/0x410
[138088.988032]  =======================


[66916.451144] BUG: soft lockup - CPU#0 stuck for 11s! [java:2758]
[66928.193453] BUG: soft lockup - CPU#1 stuck for 11s! [java:3419]


[336990.703192] BUG: soft lockup - CPU#1 stuck for 11s! [ps:32586]
[336990.703206]
[336990.703214] Pid: 32586, comm: ps Tainted: G      D (2.6.24-26-xen #1)
[336990.703221] EIP: 0061:[<c03286e7>] EFLAGS: 00000286 CPU: 1
[336990.703235] EIP is at _spin_lock+0x7/0x10
[336990.703241] EAX: c1dbc72c EBX: 00000000 ECX: c1dbc720 EDX: 00000007
[336990.703247] ESI: 57b51067 EDI: 00000001 EBP: e2cb93c8 ESP: e2033e4c
[336990.703253]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[336990.703266] CR0: 80050033 CR2: 08079004 CR3: 23651000 CR4: 00000660
[336990.703275] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[336990.703282] DR6: ffff0ff0 DR7: 00000400
[336990.703288]  [<c0171646>] handle_mm_fault+0xae6/0x1360
[336990.703307]  [<c020e057>] rb_insert_color+0x77/0xe0
[336990.703325]  [<c032a27e>] do_page_fault+0x35e/0xe70
[336990.703337]  [<c01745d4>] vma_merge+0x144/0x1d0
[336990.703349]  [<c0174b75>] do_brk+0x195/0x240
[336990.703362]  [<c0175126>] sys_brk+0xb6/0xf0
[336990.703374]  [<c0329f20>] do_page_fault+0x0/0xe70
[336990.703384]  [<c0328bc5>] error_code+0x35/0x40
[336990.703396]  =======================
[337005.938292] BUG: soft lockup - CPU#2 stuck for 11s! [zmlocalconfig:11371]
[337005.938306]
[337005.938312] Pid: 11371, comm: zmlocalconfig Tainted: G      D (2.6.24-26-xen #1)
[337005.938318] EIP: 0061:[<c03286e7>] EFLAGS: 00000286 CPU: 2
[337005.938330] EIP is at _spin_lock+0x7/0x10
[337005.938335] EAX: ec64a870 EBX: ec64a870 ECX: 00000002 EDX: ec64a871
[337005.938339] ESI: 00000000 EDI: c03fe800 EBP: c1261e38 ESP: c1261d7c
[337005.938343]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[337005.938357] CR0: 8005003b CR2: 08128000 CR3: 25d8e000 CR4: 00000660
[337005.938364] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[337005.938370] DR6: ffff0ff0 DR7: 00000400
[337005.938376]  [<c01771f0>] page_lock_anon_vma+0x20/0x30
[337005.938391]  [<c01786fd>] page_referenced+0x8d/0x130
[337005.938401]  [<c01671cf>] shrink_active_list+0x18f/0x5c0
[337005.938411]  [<c0164286>] get_dirty_limits+0x16/0x200
[337005.938421]  [<ee04b38e>] mb_cache_shrink_fn+0x1e/0x100 [mbcache]
[337005.938435]  [<c016826d>] shrink_zone+0xdd/0x100
[337005.938444]  [<c0168d72>] try_to_free_pages+0x152/0x250
[337005.938453]  [<c0162fcb>] __alloc_pages+0x14b/0x390
[337005.938463]  [<c01855c5>] do_sync_read+0xd5/0x120
[337005.938475]  [<c0163247>] __get_free_pages+0x37/0x50
[337005.938483]  [<c0124496>] copy_process+0xa6/0x1210
[337005.938493]  [<c0197c34>] d_alloc+0x114/0x1a0
[337005.938503]  [<c0125830>] do_fork+0x40/0x260
[337005.938511]  [<c0210f00>] copy_to_user+0x30/0x60
[337005.938523]  [<c0103226>] sys_clone+0x36/0x40
[337005.938530]  [<c0105832>] syscall_call+0x7/0xb
[337005.938542]  =======================
[336990.803889] BUG: soft lockup - CPU#0 stuck for 11s! [kswapd0:103]
[336990.803907]
[336990.803915] Pid: 103, comm: kswapd0 Tainted: G      D (2.6.24-26-xen #1)
[336990.803922] EIP: 0061:[<c03286ea>] EFLAGS: 00000286 CPU: 0
[336990.803940] EIP is at _spin_lock+0xa/0x10
[336990.803948] EAX: c1dbc86c EBX: 00000000 ECX: 22cc3000 EDX: 00000000
[336990.803955] ESI: 57b47067 EDI: c1dbc86c EBP: 00000ff0 ESP: ed725dec
[336990.803961]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
[336990.803976] CR0: 8005003b CR2: b791e6d9 CR3: 23e3b000 CR4: 00000660
[336990.803986] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[336990.803992] DR6: ffff0ff0 DR7: 00000400
[336990.804001]  [<c01773cb>] page_check_address+0x1cb/0x3c0
[336990.804026]  [<c017762e>] page_referenced_one+0x6e/0x190
[336990.804039]  [<c017875c>] page_referenced+0xec/0x130
[336990.804049]  [<c01671cf>] shrink_active_list+0x18f/0x5c0
[336990.804064]  [<c0210556>] memmove+0x36/0x40
[336990.804079]  [<c0164286>] get_dirty_limits+0x16/0x200
[336990.804089]  [<c0139857>] call_rcu+0x97/0xa0
[336990.804102]  [<ee04b38e>] mb_cache_shrink_fn+0x1e/0x100 [mbcache]
[336990.804120]  [<c016826d>] shrink_zone+0xdd/0x100
[336990.804132]  [<c01688cc>] kswapd+0x44c/0x490
[336990.804145]  [<c013bb00>] autoremove_wake_function+0x0/0x40
[336990.804160]  [<c011e270>] complete+0x40/0x60
[336990.804172]  [<c0168480>] kswapd+0x0/0x490
[336990.804183]  [<c013b842>] kthread+0x42/0x70
[336990.804194]  [<c013b800>] kthread+0x0/0x70
[336990.804206]  [<c0105bb7>] kernel_thread_helper+0x7/0x10
[336990.804218]  =======================
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users



--
Best Regards,
alex.faq8@xxxxxxxxx



 

