Xen project Mailing List

Re: [Xen-users] Nasty kernel panic

From: Steven Timm <timm@xxxxxxxx>

Date: Fri, 29 Aug 2008 09:32:38 -0500 (CDT)

Delivery-date: Fri, 29 Aug 2008 07:33:13 -0700

List-id: Xen user discussion <xen-users.lists.xensource.com>

A couple of people have pointed at the e1000 driver as a possible culprit and given good reasons why that should be the case. My only question is: why did I also get the same kernel panic on the new PowerEdge 2950, which doesn't have Intel e1000 but Broadcom drivers and NICs?

By the way, all the systems in question have now been up for 18 hours and are functioning fine. Once we got the rsyncing done, and then the squid servers all re-initialized correctly, we have been OK since then.

I am away from the office, but I will follow up on the thread and post the kernel config later.

Steve Timm

------------------------------------------------------------------
Steven C. Timm, Ph.D  (630) 840-8525
timm@xxxxxxxx  http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.

On Fri, 29 Aug 2008, Tim Post wrote:

Hi Steve,

On Thu, 2008-08-28 at 16:52 -0500, Steven Timm wrote:

I have seen the following kernel panic 5 times today on
three different machines, two of which had been stable
for months and one of which is a brand new install.


[snip]

<Aug/28 12:21 pm> [<ffffffff88107a79>] :e1000:e1000_clean_rx_irq+0x430/0x4d5
<Aug/28 12:21 pm> [<ffffffff881074ec>] :e1000:e1000_clean+0x82/0x160
<Aug/28 12:21 pm> [<ffffffff80395f51>] net_rx_action+0xe7/0x254
<Aug/28 12:21 pm> [<ffffffff80233d97>] __do_softirq+0x7b/0x10d
<Aug/28 12:21 pm> [<ffffffff8020b094>] call_softirq+0x1c/0x28
<Aug/28 12:21 pm> [<ffffffff8020cdfd>] do_softirq+0x62/0xd9
<Aug/28 12:21 pm> [<ffffffff8020cc9c>] do_IRQ+0x68/0x71
<Aug/28 12:21 pm> [<ffffffff8034b347>] evtchn_do_upcall+0xee/0x165
<Aug/28 12:21 pm> [<ffffffff8020abca>] do_hypervisor_callback+0x1e/0x2c
<Aug/28 12:21 pm> <EOI>

<Aug/28 12:21 pm>Code: 41 8b 85 f4 00 00 00 4d 85 ed 4d 89 ec 89 44 24 0c 0f 84 36
<Aug/28 12:21 pm>RIP  [<ffffffff88256375>] :ipv6:rt6_select+0x38/0x1f4
<Aug/28 12:21 pm> RSP <ffffffff80526b00>
<Aug/28 12:21 pm>CR2: 00000000000000f4
<Aug/28 12:21 pm> <0>Kernel panic - not syncing: Aiee, killing interrupt handler
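
When reading a trace like the one above, it can help to separate the frames by module. The sketch below is only an illustration, not part of the original report: it parses a few of the trace lines (timestamps dropped, wrapped lines joined) and reports which module each frame belongs to and which frame the RIP line names. The helper names `parse_frames` and `faulting_frame` are made up for this example. The RIP line records where the faulting instruction was, which here is in the ipv6 module, while e1000 only appears in the call path; that alone does not say where the bad pointer came from.

```python
import re

# Trace lines copied from the panic above (timestamps dropped, wrapped lines joined).
TRACE = """\
[<ffffffff88107a79>] :e1000:e1000_clean_rx_irq+0x430/0x4d5
[<ffffffff881074ec>] :e1000:e1000_clean+0x82/0x160
[<ffffffff80395f51>] net_rx_action+0xe7/0x254
RIP  [<ffffffff88256375>] :ipv6:rt6_select+0x38/0x1f4
"""

# Matches: [<address>] then an optional ":module:" prefix, then symbol+0xoffset/0xsize
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(?::(\w+):)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(text):
    """Return one dict per trace line that looks like a stack frame."""
    frames = []
    for line in text.splitlines():
        m = FRAME.search(line)
        if m is None:
            continue  # not a frame line (e.g. <EOI>, Code:, CR2:)
        _addr, module, symbol, offset, _size = m.groups()
        frames.append({
            "is_rip": line.lstrip().startswith("RIP"),
            # Frames without a ":module:" prefix are in the core kernel image.
            "module": module or "kernel",
            "symbol": symbol,
            "offset": int(offset, 16),
        })
    return frames


def faulting_frame(frames):
    """Return the frame taken from the RIP line, or None if there is none."""
    for frame in frames:
        if frame["is_rip"]:
            return frame
    return None


if __name__ == "__main__":
    frames = parse_frames(TRACE)
    for frame in frames:
        tag = "RIP" if frame["is_rip"] else "   "
        print(tag, frame["module"], frame["symbol"], hex(frame["offset"]))
```
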


It looks like e1000 might be getting spit out. From what I gather in your
message, the only thing that changed is that you are now putting a much
higher I/O demand on the drives (rsyncing everything); by extension, this
increases the demand on the NIC.

If the e1000 NIC is the one enslaved to the bridge, it could be cleanup
that's making it freak when a guest stops. If it's ejected uncleanly, the
PID next in line with pending I/O for the device will likely be
identified as the culprit.

I had a very similar problem with a buggy Areca driver on dom-0 a couple
of years ago.

Can you post a link to your kernel's .config, or perhaps try the latest
stable version of that module from:

http://sourceforge.net/project/showfiles.php?group_id=42302

As for ipv6, if it's being set up you'll see it in /etc/sysconfig
or /etc/network (depending on the distro) pretty clearly. However, that
shouldn't make a difference; it should work either way.
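
Besides the distro config files, the running system can be checked directly. The sketch below is a rough illustration under a few assumptions: the function names are invented for this example, and it relies on the usual Linux `/proc/modules` and `/proc/net/if_inet6` files. If ipv6 is compiled into the kernel rather than built as a module, it will not appear in `/proc/modules`, so the second check is the more general one.

```python
def ipv6_module_loaded(modules_path="/proc/modules"):
    """True if a module named 'ipv6' is listed in a /proc/modules-style file."""
    try:
        with open(modules_path) as fh:
            # The first whitespace-separated field of each line is the module name.
            return any(line.split()[:1] == ["ipv6"] for line in fh)
    except OSError:
        return False  # file missing or unreadable


def ipv6_addresses_present(if_inet6_path="/proc/net/if_inet6"):
    """True if the file lists at least one IPv6 address on some interface."""
    try:
        with open(if_inet6_path) as fh:
            return any(line.strip() for line in fh)
    except OSError:
        return False  # file absent, e.g. when IPv6 is not enabled


if __name__ == "__main__":
    print("ipv6 module loaded:  ", ipv6_module_loaded())
    print("ipv6 addresses found:", ipv6_addresses_present())
```
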

Hope this helps :)


Cheers!
--Tim

--
Monkey + Typewriter = Echoreply ( http://echoreply.us )

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users
