
Re: [Xen-users] kernel crash in xen domain


  • To: Csillag Kristof <csillag.kristof@xxxxxxxxx>
  • From: Bruce Edge <bruce.edge@xxxxxxxxx>
  • Date: Wed, 22 Sep 2010 10:37:10 -0700
  • Cc: xen-users@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Wed, 22 Sep 2010 10:38:41 -0700
  • List-id: Xen user discussion <xen-users.lists.xensource.com>

On 2010/7/6, Csillag Kristof <csillag.kristof@xxxxxxxxx> wrote:
Hi all,

I just had a kernel crash in a Xen DomU, on a server that had been running
for 4 days.

(The current uptime record is 248 days, so this was not expected; then again,
that record was set before I upgraded from Xen 3.2 / kernel 2.6.26 to
Xen 4.0 / kernel 2.6.32.)

  * * *

I am running Xen hypervisor version 4.0.0 (Debian 4.0.0-2) and
Linux kernel 2.6.32-5-xen-amd64 (Debian 2.6.32-15) on both the Dom0 and
the (PV) DomU.

(The DomU is running the Xen-flavoured kernel because I have a PCI NIC passed
through to it, and the current Debian 2.6.32 pv_ops kernel does not contain
the required pcifront driver.)

Here is what the DomU kernel printed, copied from the output of
"xm console":

------------------

[403163.914167] ------------[ cut here ]------------
[403163.914186] kernel BUG at /build/buildd-linux-2.6_2.6.32-15-i386-fb7Hfg/linux-2.6-2.6.32/debian/build/source_i386_xen/mm/slub.c:2969!
[403163.914205] invalid opcode: 0000 [#1] SMP
[403163.914222] last sysfs file: /sys/devices/virtual/net/ppp0/uevent
[403163.914236] Modules linked in: tun xt_limit nf_nat_irc nf_nat_ftp
ipt_LOG ipt_MASQUERADE xt_DSCP ipt_REJECT nf_conntrack_irc
nf_conntrack_ftp xt_state xt_TCPMSS xt_tcpmss xt_tcpudp pppoe pppox
ppp_generic slhc sundance iptable_nat nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables
x_tables dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop evdev
snd_pcsp snd_pcm snd_timer snd xen_netfront soundcore snd_page_alloc
ext3 jbd mbcache thermal_sys xen_blkfront mii [last unloaded: sundance]
[403163.914455]
[403163.914465] Pid: 0, comm: swapper Not tainted (2.6.32-5-xen-686 #1)
[403163.914478] EIP: 0061:[<c10b73ec>] EFLAGS: 00010246 CPU: 0
[403163.914492] EIP is at kfree+0x69/0xde
[403163.914502] EAX: 40000000 EBX: c1c56a80 ECX: c145942c EDX: c1575c40
[403163.914514] ESI: c2262000 EDI: c11f09eb EBP: c138f9c8 ESP: c1381ed4
[403163.914527]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[403163.914539] Process swapper (pid: 0, ti=c1380000 task=c13c0ba0 task.ti=c1380000)
[403163.914552] Stack:
[403163.914560]  c1575c40 c8829f1d c13bf520 c13bf520 c1c56a80 c163a654 00000000 c138f9c8
[403163.914597] <0> c11f09eb 00000000 c11f5825 c13bf520 c1380000 00000002 00000008 c138f9c8
[403163.914636] <0> c103d004 c1457408 00000001 0000000a 00000000 00000100 c1380000 00000000
[403163.914680] Call Trace:
[403163.914700]  [<c8829f1d>] ? xennet_interrupt+0x4d/0x57 [xen_netfront]
[403163.914717]  [<c11f09eb>] ? __kfree_skb+0xf/0x6e
[403163.914732]  [<c11f5825>] ? net_tx_action+0x58/0xf9
[403163.914748]  [<c103d004>] ? __do_softirq+0xaa/0x151
[403163.914762]  [<c103d0dc>] ? do_softirq+0x31/0x3c
[403163.914776]  [<c103d1b2>] ? irq_exit+0x26/0x58
[403163.914791]  [<c1199636>] ? xen_evtchn_do_upcall+0x22/0x2c
[403163.914816]  [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[403163.914832]  [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[403163.914847]  [<c1006169>] ? xen_safe_halt+0xf/0x1b
[403163.914861]  [<c10042bf>] ? xen_idle+0x23/0x30
[403163.914875]  [<c1008168>] ? cpu_idle+0x89/0xa5
[403163.914890]  [<c13f980d>] ? start_kernel+0x318/0x31d
[403163.914905]  [<c13fb3c3>] ? xen_start_kernel+0x615/0x61c
[403163.914920]  [<c1409045>] ? efi_init+0xb4/0x580
[403163.914930] Code: 86 00 00 00 40 c1 e8 0c c1 e0 05 01 d0 89 04 24 66 83 38 00 79 06 8b 40 0c 89 04 24 8b 14 24 8b 02 84 c0 78 19 66 a9 00 c0 75 04 <0f> 0b eb fe 8b 04 24 83 c4 10 5b 5e 5f 5d e9 c7 e9 fd ff 8b 04
[403163.915175] EIP: [<c10b73ec>] kfree+0x69/0xde SS:ESP 0069:c1381ed4
[403163.915201] ---[ end trace c5944bb691c7520c ]---
[403163.915212] Kernel panic - not syncing: Fatal exception in interrupt
[403163.915225] Pid: 0, comm: swapper Tainted: G      D    2.6.32-5-xen-686 #1
[403163.915237] Call Trace:
[403163.915247]  [<c128c4e1>] ? panic+0x38/0xe4
[403163.915261]  [<c100bf56>] ? oops_end+0x91/0x9d
[403163.915275]  [<c100a0d3>] ? do_invalid_op+0x0/0x75
[403163.915288]  [<c100a13f>] ? do_invalid_op+0x6c/0x75
[403163.915301]  [<c10b73ec>] ? kfree+0x69/0xde
[403163.915315]  [<c12075a5>] ? sch_direct_xmit+0x69/0x10c
[403163.915329]  [<c11f8095>] ? dev_queue_xmit+0x260/0x38e
[403163.915343]  [<c103d1fa>] ? _local_bh_enable_ip+0x16/0x6e
[403163.915357]  [<c11f8191>] ? dev_queue_xmit+0x35c/0x38e
[403163.915371]  [<c1021d2e>] ? pvclock_clocksource_read+0x48/0xa7
[403163.915387]  [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[403163.915401]  [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[403163.915416]  [<c128e1d3>] ? error_code+0x73/0x78
[403163.915429]  [<c11f09eb>] ? __kfree_skb+0xf/0x6e
[403163.915442]  [<c10b73ec>] ? kfree+0x69/0xde
[403163.915461]  [<c8829f1d>] ? xennet_interrupt+0x4d/0x57 [xen_netfront]
[403163.915476]  [<c11f09eb>] ? __kfree_skb+0xf/0x6e
[403163.915489]  [<c11f5825>] ? net_tx_action+0x58/0xf9
[403163.915503]  [<c103d004>] ? __do_softirq+0xaa/0x151
[403163.915517]  [<c103d0dc>] ? do_softirq+0x31/0x3c
[403163.915530]  [<c103d1b2>] ? irq_exit+0x26/0x58
[403163.915543]  [<c1199636>] ? xen_evtchn_do_upcall+0x22/0x2c
[403163.915556]  [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[403163.915570]  [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[403163.915584]  [<c1006169>] ? xen_safe_halt+0xf/0x1b
[403163.915597]  [<c10042bf>] ? xen_idle+0x23/0x30
[403163.915609]  [<c1008168>] ? cpu_idle+0x89/0xa5
[403163.915623]  [<c13f980d>] ? start_kernel+0x318/0x31d
[403163.915637]  [<c13fb3c3>] ? xen_start_kernel+0x615/0x61c
[403163.915650]  [<c1409045>] ? efi_init+0xb4/0x580

------------------

Meanwhile, the Dom0 kernel logged this:

-----------------
[407187.550176] irq 17: nobody cared (try booting with the "irqpoll" option)
[407187.550217] Pid: 1940, comm: xend Tainted: G        W  2.6.32-5-xen-amd64 #1
[407187.550253] Call Trace:
[407187.550279]  <IRQ>  [<ffffffff810972dd>] ? __report_bad_irq+0x30/0x7d
[407187.550324]  [<ffffffff8109742f>] ? note_interrupt+0x105/0x16e
[407187.550359]  [<ffffffff81097b36>] ? handle_level_irq+0x80/0xc3
[407187.550394]  [<ffffffff811f1a58>] ? __xen_evtchn_do_upcall+0xe1/0x167
[407187.550430]  [<ffffffff811f22e5>] ? xen_evtchn_do_upcall+0x2e/0x42
[407187.550430]  [<ffffffff81012cfe>] ? xen_do_hypervisor_callback+0x1e/0x30
[407187.550430]  <EOI>
[407187.550430] handlers:
[407187.550430] [<ffffffffa00c484e>] (piix_interrupt+0x0/0x192 [ata_piix])
[407187.550430] [<ffffffffa00c484e>] (piix_interrupt+0x0/0x192 [ata_piix])
[407187.550430] Disabling IRQ #17

-----------------

IRQ #17 belongs to the passed-through PCI NIC.
(I am not using an IOMMU, since my motherboard does not support it.)
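For readers unfamiliar with the "irq 17: nobody cared" message: it comes from the Linux kernel's spurious-interrupt detector (note_interrupt in the Dom0 trace). The sketch below is a simplified model of that heuristic, assuming the constants used in kernel/irq/spurious.c around 2.6.32 (a window of 100,000 interrupts and a threshold of 99,900 unclaimed ones) and ignoring details such as the time-based reset of the unhandled counter and the irqpoll fallback. The class IrqLine and its fields are hypothetical names for illustration, not the kernel's API. It only shows why a line whose interrupts are almost never claimed by a registered handler (for IRQ 17 the Dom0 log lists only ata_piix's handler) ends up disabled; it does not establish why that happened here.

```python
# Simplified model of the Linux "nobody cared" spurious-IRQ heuristic.
# ASSUMPTION: constants mirror kernel/irq/spurious.c circa 2.6.32; the
# kernel's time-based reset of the unhandled counter is omitted.

WINDOW = 100_000          # interrupts counted per evaluation window
UNHANDLED_LIMIT = 99_900  # more unhandled than this in one window => disable


class IrqLine:
    """Hypothetical per-line bookkeeping, loosely echoing struct irq_desc."""

    def __init__(self, irq: int) -> None:
        self.irq = irq
        self.count = 0        # interrupts seen in the current window
        self.unhandled = 0    # interrupts no handler claimed in this window
        self.disabled = False

    def note_interrupt(self, handled: bool) -> None:
        """Record one interrupt; `handled` is True if any handler claimed it."""
        if self.disabled:
            return
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < WINDOW:
            return
        # End of window: decide whether the line is stuck.
        self.count = 0
        if self.unhandled > UNHANDLED_LIMIT:
            print(f"irq {self.irq}: nobody cared")
            print(f"Disabling IRQ #{self.irq}")
            self.disabled = True
        self.unhandled = 0


if __name__ == "__main__":
    # A line whose interrupts are never claimed is disabled after one window.
    stuck = IrqLine(17)
    for _ in range(WINDOW):
        stuck.note_interrupt(handled=False)
    print("stuck line disabled:", stuck.disabled)

    # A line where half the interrupts are claimed stays enabled.
    shared = IrqLine(17)
    for i in range(WINDOW):
        shared.note_interrupt(handled=(i % 2 == 0))
    print("shared line disabled:", shared.disabled)
```

In this model, a handler that claims even a small fraction of a line's interrupts keeps the line alive; only a line that goes almost entirely unclaimed for a full window is disabled.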

I rebooted the DomU (using xm reset), but the passed-through NIC never
worked again, so eventually I had to reboot the whole physical machine.

  * * *

Any idea what could cause this?

Thank you for your help,

   Kristof Csillag


At the risk of just saying "me too", I would like to second this report; we see many errors like this one:

irq 124: nobody cared (try booting with the "irqpoll" option)

It appears that any card generating a high rate of MSI interrupts triggers this message. In our case it's a Tachyon FC card.

It seems specific to pv_ops kernels, since we did not have this problem with the HVM kernel.
 
-Bruce


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
