[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
> > On 09/02/2010 05:20 PM, James Harper wrote: > > I've had "BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]" happen > > twice now and the server fairly quickly (over the course of 5-10 > > minutes) becomes unusable after that happens. The call trace appears to > > be: > > > > Call Trace: > > [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa > > [<ffffffff8100ef72>] ? check_events+0x12/0x20 > > [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7 > > [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b > > [<ffffffff812b39a5>] ? page_fault+0x25/0x30 > > [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30 > > [<ffffffff81045265>] ? schedule_tail+0x92/0x96 > > [<ffffffff81011983>] ? ret_from_fork+0x13/0x80 > > > > That was under 2.6.32.17-g69a73fa. I've since upgraded to > > 2.6.32.18-ge6b9b2c but the changelogs don't show anything obviously > > related to the crash (but as usual there are a lot of them so maybe I've > > missed something?) > > > > Is this a known problem under 2.6.32.17-g69a73fa that has since been > > fixed? > > Hard to know; nothing springs to mind right now. What else is going on > at the time? What's the full softlockup message? Is there any more > context? I just noticed that it coincided pretty much exactly when a backup runs. I take a snapshot in Dom0 then block-attach it to the linux DomU so it can be backed up from DomU which makes restores easier etc, rather than doing the backup in Dom0. I hadn't noticed before because it's the Windows DomU that people complain about first, not the linux DomU. Block-attach/detach would explain why udevd is involved too which makes more sense. The block-attach happens every hour on the hour. The full cut&paste of the first hang is: Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref 1442, event-channel 13, protocol 1 (x86_64-abi) Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULO G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id e_core uhci_hc Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: scsi_wait_scan] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1: Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULO G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id e_core uhci_hc Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: scsi_wait_scan] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm: udevd Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP: e030:[<ffffffff8100922a>] [<ffffffff8100922a>] hypercall_page+0x22a/0x1001 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP: e02b:ffff8800114d5dd0 EFLAGS: 00000246 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX: 0000000000040000 RBX: 0000000000000001 RCX: ffffffff8100922a keechsvr2:/usr/local/src/linux-2.6-xen# head -100 /var/log/kern.log Sep 3 07:00:44 keechsvr2 kernel: [228694.767691] blkback: ring-ref 311, event-channel 13, protocol 1 (x86_64-abi) Sep 3 08:00:42 keechsvr2 kernel: [232293.072248] blkback: ring-ref 1334, event-channel 13, protocol 1 (x86_64-abi) Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref 1442, event-channel 13, protocol 1 (x86_64-abi) Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULOG xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd ide_core uhci_hc Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: scsi_wait_scan] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1: Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULOG xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd ide_core uhci_hc Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: scsi_wait_scan] Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm: udevd Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP: e030:[<ffffffff8100922a>] [<ffffffff8100922a>] hypercall_page+0x22a/0x1001 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP: e02b:ffff8800114d5dd0 EFLAGS: 00000246 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX: 0000000000040000 RBX: 0000000000000001 RCX: ffffffff8100922a Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RDX: ffffffff814293d0 RSI: 0000000000000000 RDI: 0000000000000000 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RBP: 00007fbc628a9800 R08: 0000000000000000 R09: 0000000000000000 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000002 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R13: ffff8800114d5e88 R14: ffff880017f9cfc0 R15: ffff880017eed500 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] FS: 00007fbc628a9770(0000) GS:ffff880002de8000(0000) knlGS:0000000000000000 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CR2: 00007fbc628a9800 CR3: 000000001e2d7000 CR4: 0000000000002660 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Call Trace: Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100ef72>] ? check_events+0x12/0x20 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff812b39a5>] ? page_fault+0x25/0x30 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff81045265>] ? schedule_tail+0x92/0x96 Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff81011983>] ? ret_from_fork+0x13/0x80 There's another hang at 09:03:09 which I can post if you'd find it useful. If the block-attach happened at 09:00:42 and the crash happened at 09:02:13 then I assume that whatever udevd started doing started at 09:01:12 (eg 61 seconds earlier) which means that it's more likely to be the block-detach at the end of the backup causing the hang not the block-attach. The backup runs pretty quick. I'm inclined to turn off the snapshot backup if this isn't a known and resolved problem... Thanks James _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |