
Re: [Xen-devel] [Xen-users] kernel 3.9.2 - xen 4.2.2/4.3rc1 => BUG unable to handle kernel paging request netif_poll+0x49c/0xe8



On Fri, Jul 05, 2013 at 12:40:02PM +0200, Dion Kant wrote:
> > ======
> > diff --git a/drivers/xen/netfront/netfront.c b/drivers/xen/netfront/netfront.c
> > index 6e5d233..9583011 100644
> > --- a/drivers/xen/netfront/netfront.c
> > +++ b/drivers/xen/netfront/netfront.c
> > @@ -1306,6 +1306,7 @@ static RING_IDX xennet_fill_frags(struct netfront_info *np,
> >         struct sk_buff *nskb;
> >
> >         while ((nskb = __skb_dequeue(list))) {
> > +               BUG_ON(nr_frags >= MAX_SKB_FRAGS);
> >                 struct netif_rx_response *rx =
> >                         RING_GET_RESPONSE(&np->rx, ++cons);
> >
> 
> Integrated the patch. I obtained a crash dump and the log in it did not
> show this BUG_ON. Here is the relevant section from the log
> 
> var/lib/xen/dump/domUA # crash /root/vmlinux-p1
> 2013-0705-1347.43-domUA.1.core
> 
> [    7.670132] Adding 4192252k swap on /dev/xvda1.  Priority:-1
> extents:1 across:4192252k SS
> [   10.204340] NET: Registered protocol family 17
> [  481.534979] netfront: Too many frags
> [  487.543946] netfront: Too many frags
> [  491.049458] netfront: Too many frags
> [  491.491153] ------------[ cut here ]------------
> [  491.491628] kernel BUG at drivers/xen/netfront/netfront.c:1295!

So what's the code around line 1295? There must be a BUG_ON there. It's
normal that the line number differs from the one in my patch (1306),
because we are using different kernels.
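
To make the kind of check under discussion concrete, here is a minimal
userspace sketch of a fragment-count guard. It is not the driver code:
sketch_skb, sketch_fill_frags and SKETCH_MAX_FRAGS are hypothetical names,
and the limit of 17 is only an assumed stand-in for MAX_SKB_FRAGS, whose
real value depends on the kernel configuration.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's MAX_SKB_FRAGS (assumed value). */
#define SKETCH_MAX_FRAGS 17

/* Hypothetical, simplified skb: it only counts attached fragments. */
struct sketch_skb {
    size_t nr_frags;
};

/*
 * Try to attach `pending` more fragments to `skb`.
 * Returns the number actually attached. The loop stops once the
 * fragment array would be full instead of writing past its end.
 */
static size_t sketch_fill_frags(struct sketch_skb *skb, size_t pending)
{
    size_t attached = 0;

    while (pending > 0) {
        if (skb->nr_frags >= SKETCH_MAX_FRAGS)
            break; /* the debugging patch would hit BUG_ON() here */
        skb->nr_frags++;
        attached++;
        pending--;
    }
    return attached;
}
```

With an skb that already holds 15 fragments, asking for 5 more attaches
only 2 in this sketch; the remaining 3 are the case the BUG_ON in the
patch is meant to catch.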

> [  491.492056] invalid opcode: 0000 [#1] SMP
> [  491.492056] Modules linked in: af_packet autofs4 xennet xenblk cdrom
> [  491.492056] CPU 0
> [  491.492056] Pid: 1471, comm: sshd Not tainted 3.7.10-1.16-dbg-p1-xen #8 
> [  491.492056] RIP: e030:[<ffffffffa0023aef>]  [<ffffffffa0023aef>]
> netif_poll+0xe4f/0xf90 [xennet]
> [  491.492056] RSP: e02b:ffff8801f5803c60  EFLAGS: 00010202
> [  491.492056] RAX: ffff8801f5803da0 RBX: ffff8801f1a082c0 RCX:
> 0000000180200010
> [  491.492056] RDX: ffff8801f5803da0 RSI: ffff8801fe83ec80 RDI:
> ffff8801f03b2900
> [  491.492056] RBP: ffff8801f5803e20 R08: 0000000000000001 R09:
> 0000000000000000
> [  491.492056] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffff8801f03b3400
> [  491.492056] R13: 0000000000000011 R14: 000000000004327e R15:
> ffff8801f06009c0
> [  491.492056] FS:  00007fc519f3d7c0(0000) GS:ffff8801f5800000(0000)
> knlGS:0000000000000000
> [  491.492056] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  491.492056] CR2: 00007fc51410c400 CR3: 00000001f1430000 CR4:
> 0000000000002660
> [  491.492056] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [  491.492056] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [  491.492056] Process sshd (pid: 1471, threadinfo ffff8801f1264000,
> task ffff8801f137bf00)
> [  491.492056] Stack:
> [  491.492056]  ffff8801f5803d60 ffffffff8008503e ffff8801f0600a40
> ffff8801f0600000
> [  491.492056]  0004328000000040 0000001200000000 ffff8801f5810570
> ffff8801f0600a78
> [  491.492056]  0000000000000000 ffff8801f0601fb0 0004326e00000012
> ffff8801f5803d00
> [  491.492056] Call Trace:
> [  491.492056]  [<ffffffff8041ee35>] net_rx_action+0xd5/0x250
> [  491.492056]  [<ffffffff800376d8>] __do_softirq+0xe8/0x230
> [  491.492056]  [<ffffffff8051151c>] call_softirq+0x1c/0x30
> [  491.492056]  [<ffffffff80008a75>] do_softirq+0x75/0xd0
> [  491.492056]  [<ffffffff800379f5>] irq_exit+0xb5/0xc0
> [  491.492056]  [<ffffffff8036c225>] evtchn_do_upcall+0x295/0x2d0
> [  491.492056]  [<ffffffff8051114e>] do_hypervisor_callback+0x1e/0x30
> [  491.492056]  [<00007fc519f97700>] 0x7fc519f976ff
> [  491.492056] Code: ff 0f 1f 00 e8 a3 c1 40 e0 85 c0 90 75 69 44 89 ea
> 4c 89 f6 4c 89 ff e8 f0 cb ff ff c7 85 80 fe ff ff ea ff ff ff e9 7c f4
> ff ff <0f> 0b ba 12 00 00 00 48 01 d0 48 39 c1 0f 82 bd fc ff ff e9 e9
> [  491.492056] RIP  [<ffffffffa0023aef>] netif_poll+0xe4f/0xf90 [xennet]
> [  491.492056]  RSP <ffff8801f5803c60>
> [  491.511975] ---[ end trace c9e37475f12e1aaf ]---
> [  491.512877] Kernel panic - not syncing: Fatal exception in interrupt
> 
> In the meantime Jan picked up the bug in Bugzilla
> (https://bugzilla.novell.com/show_bug.cgi?id=826374) and created a first
> patch. I propose we continue the discussion there and post the
> conclusion to this list to close this thread here as well.
> 

I'm not sure that is the best way. In general it would be quite a
burden for developers to follow every Bugzilla, let alone register an
account on each one.

Anyway, since we already started the discussion here, I think we should
just continue it here. We could do it the other way around: discuss
here and post the conclusion and references there.


Wei.

> 
> Dion

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

