
[Xen-devel] Fwd: Re: xennet: skb rides the rocket: 20 slots



This is a forwarded message
From: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
To: ANNIE LI <annie.li@xxxxxxxxxx>
Date: Thursday, January 24, 2013, 9:45:42 AM
Subject: [Xen-devel] xennet: skb rides the rocket: 20 slots


Resending because xen-devel wasn't copied on the original.

===8<==============Original message text===============

Monday, January 14, 2013, 10:39:53 AM, you wrote:

> Hi

> I created a patch for this, but I could not reproduce the issue to
> verify it. The patch is attached.

Hi Annie,

I finally had time to seriously test the patch.
I put in some more warnings and made the bracing a bit more explicit (I hope I
got the bracing right).

Problem is the current code crashes with:

[ 4189.815911] nf_conntrack: automatic helper assignment is deprecated and it will be removed soon. Use the iptables CT target to attach helpers instead.
[29601.932324] xennet:  xennet_xmit_skb err_end: 19 slots MAX_SKB_FRAGS: 17 div_roundup:1 xennet_count_skb_frag_slots:0 offset:106 skb_headlen:54 skb->len:64454, skb->data_len:0 skb->truesize:65168 nr_frags:0 page_size:4096 prot:0800 gso:1 linearize:0 gso_segs:46 dev:eth0 transp:0006
[29601.932426] BUG: unable to handle kernel NULL pointer dereference at           (null)
[29601.932461] IP: [<ffffffff816551f4>] xennet_xmit_skb+0x204/0x370
[29601.932498] PGD 2d497067 PUD 2dd94067 PMD 0
[29601.932526] Oops: 0000 [#1] PREEMPT SMP
[29601.932549] Modules linked in:
[29601.932566] CPU 0
[29601.932581] Pid: 2948, comm: deluged Not tainted 3.8.0-rc4-20130123-netpatched-rocketscience-radeon-qmax-new-a #1
[29601.932615] RIP: e030:[<ffffffff816551f4>]  [<ffffffff816551f4>] xennet_xmit_skb+0x204/0x370
[29601.932650] RSP: e02b:ffff88002cd95698  EFLAGS: 00010207
[29601.932669] RAX: 0000000000000000 RBX: ffff8800049574e8 RCX: 0000000000000036
[29601.932691] RDX: ffff88000398006a RSI: ffff88002ce4b8c0 RDI: 0000000000000000
[29601.932712] RBP: ffff88002cd95758 R08: 0000000000000000 R09: 0000000000000000
[29601.932734] R10: 0000000000000000 R11: 0000000000000002 R12: ffff88002cc98000
[29601.932755] R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000036
[29601.932786] FS:  00007fa9ca911700(0000) GS:ffff88002fc00000(0000) knlGS:0000000000000000
[29601.932811] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[29601.932830] CR2: 0000000000000000 CR3: 000000002cf0d000 CR4: 0000000000000660
[29601.932853] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[29601.932877] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[29601.932900] Process deluged (pid: 2948, threadinfo ffff88002cd94000, task ffff88002ce4b150)
[29601.932923] Stack:
[29601.932934]  ffff880000000036 000000000000fbc6 ffff880000000000 24e008890000fe90
[29601.932970]  ffff880000000000 0000000000001000 ffff880000000800 ffff880000000001
[29601.933004]  0000000000000000 ffff88000000002e ffff88002cc98000 ffff880000000006
[29601.933795] Call Trace:
[29601.933795]  [<ffffffff818570b9>] dev_hard_start_xmit+0x219/0x480
[29601.933795]  [<ffffffff81873856>] sch_direct_xmit+0xf6/0x290
[29601.933795]  [<ffffffff818574c6>] dev_queue_xmit+0x1a6/0x5a0
[29601.933795]  [<ffffffff81857320>] ? dev_hard_start_xmit+0x480/0x480
[29601.933795]  [<ffffffff810af255>] ? trace_softirqs_off+0x85/0x1b0
[29601.933795]  [<ffffffff818f2596>] ip_finish_output+0x226/0x530
[29601.933795]  [<ffffffff818f243d>] ? ip_finish_output+0xcd/0x530
[29601.933795]  [<ffffffff818f28f9>] ip_output+0x59/0xe0
[29601.933795]  [<ffffffff818f1438>] ip_local_out+0x28/0x90
[29601.933795]  [<ffffffff818f19df>] ip_queue_xmit+0x17f/0x490
[29601.933795]  [<ffffffff818f1860>] ? ip_send_unicast_reply+0x330/0x330
[29601.933795]  [<ffffffff810a55f7>] ? getnstimeofday+0x47/0xe0
[29601.933795]  [<ffffffff818471e9>] ? __skb_clone+0x29/0x120
[29601.933795]  [<ffffffff81907c2d>] tcp_transmit_skb+0x3fd/0x8d0
[29601.933795]  [<ffffffff8190ac9a>] tcp_write_xmit+0x22a/0xa80
[29601.933795]  [<ffffffff81137ebe>] ? alloc_pages_current+0xde/0x1c0
[29601.933795]  [<ffffffff8190b51b>] tcp_push_one+0x2b/0x40
[29601.933795]  [<ffffffff818fbdc4>] tcp_sendmsg+0x8d4/0xe10
[29601.933795]  [<ffffffff819225d6>] inet_sendmsg+0xa6/0x100
[29601.933795]  [<ffffffff81922530>] ? inet_autobind+0x60/0x60
[29601.933795]  [<ffffffff8183eb52>] sock_sendmsg+0x82/0xb0
[29601.933795]  [<ffffffff810b5d87>] ? lock_release+0x117/0x250
[29601.933795]  [<ffffffff81118a34>] ? might_fault+0x84/0x90
[29601.933795]  [<ffffffff811189eb>] ? might_fault+0x3b/0x90
[29601.933795]  [<ffffffff8114082b>] ? __kmalloc+0xfb/0x160
[29601.933795]  [<ffffffff8184c9ad>] ? verify_iovec+0x7d/0xf0
[29601.933795]  [<ffffffff8183fb73>] __sys_sendmsg+0x393/0x3a0
[29601.933795]  [<ffffffff819bafc5>] ? _raw_spin_unlock_irqrestore+0x75/0xa0
[29601.933795]  [<ffffffff810b58f8>] ? lock_acquire+0xd8/0x100
[29601.933795]  [<ffffffff810b5d87>] ? lock_release+0x117/0x250
[29601.933795]  [<ffffffff81168777>] ? fget_light+0xd7/0x140
[29601.933795]  [<ffffffff811686da>] ? fget_light+0x3a/0x140
[29601.933795]  [<ffffffff8183fd34>] sys_sendmsg+0x44/0x80
[29601.933795]  [<ffffffff819bbe29>] system_call_fastpath+0x16/0x1b
[29601.933795] Code: e9 ca fe ff ff 49 8b b4 24 80 00 00 00 48 89 df 45 31 ed e8 1f 16 20 00 48 3d 00 f0 ff ff 48 89 c7 76 08 e9 17 01 00 00 4c 89 f7 <4c> 8b 37 4c 89 e6 48 c7 07 00 00 00 00 41 ff c5 e8 b7 f3 ff ff
[29601.933795] RIP  [<ffffffff816551f4>] xennet_xmit_skb+0x204/0x370
[29601.933795]  RSP <ffff88002cd95698>
[29601.933795] CR2: 0000000000000000
[29602.018741] ---[ end trace 5ec54203e8f81a1b ]---
[29602.018747] Kernel panic - not syncing: Fatal exception in interrupt



Which according to addr2line is:

segs = segs->next;
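
For illustration only: the oops reports a NULL dereference (CR2, RAX and RDI are all zero) on the line that advances the segment list. One possible reading, which is an assumption here because the patch itself is not reproduced in this message, is that the list head was checked for an error-pointer value but not for plain NULL before the loop was entered. The user-space sketch below shows a walk that guards against both. Every name in it (struct seg, toy_is_err, walk_segments, count_bytes) is hypothetical and is not taken from xen-netfront or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an skb in a segment list; only the link matters here. */
struct seg {
    struct seg *next;
    int len;
};

/* Hypothetical analogue of an error-pointer test: the top 4095 address
 * values are treated as encoded error codes, not as real pointers. */
#define TOY_MAX_ERRNO 4095

static int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

typedef int (*xmit_fn)(struct seg *s, void *ctx);

/* Example callback: adds each segment's length to an int counter. */
static int count_bytes(struct seg *s, void *ctx)
{
    *(int *)ctx += s->len;
    return 0;
}

/* Walk a segment list, unlinking each element before handing it on.
 * Returns the number of segments handed to xmit, 0 if the list is empty
 * (NULL head), or -1 if the head encodes an error. */
static int walk_segments(struct seg *segs, xmit_fn xmit, void *ctx)
{
    int sent = 0;

    assert(xmit != NULL);

    if (segs == NULL)        /* nothing was segmented: do not enter the loop */
        return 0;
    if (toy_is_err(segs))    /* error value: not a list at all */
        return -1;

    while (segs != NULL) {
        struct seg *next = segs->next;  /* read the link before unlinking */

        segs->next = NULL;
        xmit(segs, ctx);
        sent++;
        segs = next;
    }
    return sent;
}
```

With a NULL head the function returns 0 without touching the list, which is the case the error-pointer test alone would let through.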

I have attached the resulting patch.

--
Sander

> Thanks
> Annie

> On 2013-1-11 18:16, Ian Campbell wrote:
>> On Fri, 2013-01-11 at 10:09 +0000, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> Without GSO I don't think you should be seeing packets larger than the MTU,
>>>> which would normally be either ~1500 or ~9000 and fit easily within any
>>>> sensible negotiation for the max frags. I don't think you should worry
>>>> unduly about this case.
>>>>
>>> A stack could still send down a packet with one byte per frag though,
>>> right? A copy-and-coalesce path would still be needed in this case.
>> True. In that case skb_linearize would probably do the job on Linux.
>>
>> Ian
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxx
>> http://lists.xen.org/xen-devel
===8<===========End of original message text===========
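
As a rough sketch of the slot arithmetic discussed in the quoted thread: each contiguous span of packet data needs one slot per page it touches, so a packet made of many tiny fragments can exceed a slot limit even when it is small, and would then need a copy-and-coalesce step. The code below is a user-space illustration under stated assumptions, not the netfront implementation. The names (span_slots, count_slots, needs_coalesce, struct toy_frag) are hypothetical, and the limit is a caller-supplied parameter because the real value depends on what the frontend and backend agree on.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical fragment descriptor: offset within its page, and length. */
struct toy_frag {
    unsigned int offset;
    unsigned int size;
};

/* Slots needed for one contiguous span starting at `offset` within a page. */
static unsigned int span_slots(unsigned int offset, unsigned int len)
{
    if (len == 0)
        return 0;
    offset %= TOY_PAGE_SIZE;
    return TOY_DIV_ROUND_UP(offset + len, TOY_PAGE_SIZE);
}

/* Total slots for a linear head area plus an array of fragments. */
static unsigned int count_slots(unsigned int head_offset,
                                unsigned int head_len,
                                const struct toy_frag *frags,
                                size_t nr_frags)
{
    unsigned int slots = span_slots(head_offset, head_len);
    size_t i;

    assert(frags != NULL || nr_frags == 0);

    for (i = 0; i < nr_frags; i++)
        slots += span_slots(frags[i].offset, frags[i].size);
    return slots;
}

/* Nonzero if the packet would need coalescing (or linearizing) to fit. */
static int needs_coalesce(unsigned int slots, unsigned int max_slots)
{
    return slots > max_slots;
}
```

With the values printed in the log line above (offset:106, skb_headlen:54), span_slots(106, 54) gives 1, which agrees with the logged div_roundup:1. A packet with a 54-byte head and twenty one-byte fragments would count 21 slots, so it would exceed an illustrative limit of 18 despite carrying only 74 bytes.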



-- 
Best regards,
 Sander                            mailto:linux@xxxxxxxxxxxxxx
Attachment: xen-netfront.diff
Description: Binary data
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
