
Re: [Xen-devel] netfront.c: gnttab_query_foreign_access returns nonzero in network_tx_buf_gc




>>> On Thu, May 25, 2006 at 10:37 AM, in message
<04b301c68019$96989e80$0302a8c0@Violet>, "Steven Hand"
<steven.hand@xxxxxxxxxxxx> wrote: 

>> I've been working from the netfront.c in the testing tree and using SLES
>> 10 RC1 for i386 on an SMP box.  When I stress the network using iperf in
>> a domU, with the domU acting as client on a gigabit network, I occasionally
>> get a panic at the dev_kfree_skb_irq(skb); line.  This is the same panic as
>> reported in
>> http://lists.xensource.com/archives/html/xen-devel/2006-05/msg00919.html
>>
>> The trace indicates that the skb is bad, and it looks like the skb is
>> an id.  Investigating further, the condition occurs if
>> gnttab_query_foreign_access returns nonzero on a second or later
>> iteration through the for loop.  If it returns nonzero, the code
>> takes the 'goto out', which bypasses fixing up np->tx.rsp_cons.  Then
>> the next time in network_tx_buf_gc we reuse np->tx.rsp_cons, which is at
>> the location of a previously completed skb, and the skb gets an id and
>> not a skb.
>>
>> Looking at the unstable tree, the goto has been removed and replaced
>> with a break.  However, it looks like if gnttab_query_foreign_access
>> returns nonzero between np->tx.rsp_cons and prod, then the
>> np->tx.rsp_cons = prod; could advance np->tx.rsp_cons too far, causing
>> other problems later (I have not tested this yet though).
> 
> Yes, this definitely looks like a bug; the 'break' in -unstable is not
> really much better than the 'goto out:' in -testing since in either case
> we can't easily recover correctly.
> 
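One possible shape for a fix is to stop at the first slot whose grant is
still in use and set rsp_cons to the point actually reached, rather than
to prod.  Below is a minimal user-space sketch of that idea, not netfront
code.  struct tx_ring, struct tx_slot, the in_use_by_backend flag (standing
in for a nonzero gnttab_query_foreign_access) and tx_buf_gc are all
hypothetical names, and this is untested against the real driver.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical stand-ins for the netfront state; not the real structures. */
struct tx_slot {
    int in_use_by_backend;  /* models gnttab_query_foreign_access() != 0 */
    int freed;              /* counts how often the skb would have been freed */
};

struct tx_ring {
    unsigned int rsp_cons;  /* next response not yet consumed */
    unsigned int rsp_prod;  /* responses produced by the backend */
    struct tx_slot slot[RING_SIZE];
};

/*
 * Consume responses from rsp_cons towards rsp_prod.  Stop at the first
 * slot whose grant is still in use, and record exactly how far we got so
 * that the next pass neither revisits a completed slot nor skips one.
 */
static unsigned int tx_buf_gc(struct tx_ring *r)
{
    unsigned int cons, prod;

    assert(r != NULL);
    prod = r->rsp_prod;

    for (cons = r->rsp_cons; cons != prod; cons++) {
        struct tx_slot *s = &r->slot[cons % RING_SIZE];

        if (s->in_use_by_backend)
            break;          /* leave this slot for a later pass */
        s->freed++;         /* models dev_kfree_skb_irq(skb) */
    }

    r->rsp_cons = cons;     /* not prod: only what was really consumed */
    return cons;
}
```

With rsp_prod = 5 and slot 3 still in use, the first call stops with
rsp_cons = 3; once slot 3 is released, a second call consumes slots 3 and
4 without touching slots 0 to 2 again.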
>> The problem I'm having is that I can't find the root cause as to why
>> gnttab_query_foreign_access returns an 8 (GTF_reading?) and not 0.  I've
>> looked in netback.c and xen/common/grant_table.c and am not seeing
>> it (not that it's not there).
> 
> Well all this means is that netback is still using the grant, which should
> of course be impossible since the ring pointers have been advanced. I.e.
> something is borked.
>
> Can you try this with a debug build of xen? It would be interesting to see
> if xen complains about any grant refs prior to this occurrence...
> 
> 
> cheers,
> 
> S.

Here's the serial output from a debug build of xen.  The domain_crash
does not happen on the non-debug xen.
.
.
.
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (61 of 64)
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (59 of 64)
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (61 of 64)
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (58 of 64)
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (61 of 64)
(XEN) (file=memory.c, line=64) Could not allocate order=0 extent: id=0 flags=0 (60 of 64)
(XEN) DOM0: (file=mm.c, line=2449) PTE entry 0 for address f2c81000 doesn't match frame 7a568
(XEN) DOM0: (file=mm.c, line=637) Attempt to implicitly unmap a granted PTE 4b2fe861
(XEN) domain_crash called from mm.c:638
(XEN) Domain 0 (vcpu#1) crashed on cpu#1:
(XEN) ----[ Xen-3.0.2_09668-0.1    Not tainted ]----
(XEN) CPU:    1
(XEN) EIP:    0061:[<c0101287>]
(XEN) EFLAGS: 00200212   CONTEXT: guest
(XEN) eax: 00000014   ebx: 00000000   ecx: f4c312c0   edx: 00000001
(XEN) esi: f2881f34   edi: f4c2eb9c   ebp: f364e408   esp: f2881ee4
(XEN) cr0: 80050033   cr3: 79a7d000
(XEN) ds: 007b   es: 007b   fs: 0000   gs: 0033   ss: 0069   cs: 0061
(XEN) Guest stack trace from esp=f2881ee4:
(XEN)    f4c24761 f37a2380 f37a2000 f3f02180 f37a2000 f3f02180 c1658c20 f2881f28
(XEN)    c014a91a 00483f02 c032c900 00000001 00000001 00915700 c0c78380 00000000
(XEN)    00483f01 00000027 0003013e 05ea0020 f4c28c40 f2880000 00000000 c03acd10
(XEN)    c0123a41 00000001 c036e128 f2880000 c03ab180 c0123555 c03ade60 00000007
(XEN)    00000001 f2880000 00000001 fbdf7000 00000020 c0123665 00000013 f2881fbc
(XEN)    c01068cc 00000000 c017b610 00000000 00000000 c024d5b1 00000020 00000000
(XEN)    b7b8d8d9 08315c88 bfdd38ac bfdd3848 c0105138 f2881fbc b7b8d8d9 00800d4a
(XEN)    bfdd3b40 08315c88 bfdd38ac bfdd3848 08315c88 0000007b 0000007b ffffffec
(XEN)    b7b8bbba 00000073 00200286 bfdd37bc 0000007b 00000008 0000240b
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.

On a non-debug xen I see destroy_grant_host_mapping failing with rc =
0xffffffff (domain 2, ref 0x20, flags 6) in __gnttab_unmap_grant_ref in
xen/common/grant_table.c.  Also, in net_tx_action_dealloc in netback.c,
the HYPERVISOR_grant_table_op call itself succeeds, but if you look at the
status of each of the gnttab_unmap_grant_ref_t entries, there is one with
a status of 0xffffffff.
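
To make the per-entry check concrete, here is a minimal sketch of walking
a batch of unmap results after the hypercall has returned.  It is a
user-space model under stated assumptions: struct unmap_op is a
hypothetical, trimmed-down stand-in for gnttab_unmap_grant_ref_t,
report_failed_unmaps is an illustrative name, and the 0xffffffff above is
assumed to be a status of -1 printed as an unsigned 32-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to match the success code in Xen's public grant_table.h. */
#define GNTST_okay 0

/* Hypothetical, trimmed-down stand-in for gnttab_unmap_grant_ref_t. */
struct unmap_op {
    uint32_t handle;  /* handle returned by the earlier map operation */
    int16_t  status;  /* per-operation result filled in by the hypervisor */
};

/*
 * The batched call returning success only means the batch was processed.
 * Each entry carries its own status, so every one has to be inspected.
 * Returns the number of entries that did not complete with GNTST_okay.
 */
static size_t report_failed_unmaps(const struct unmap_op *ops, size_t count)
{
    size_t i, failed = 0;

    assert(ops != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if (ops[i].status != GNTST_okay) {
            fprintf(stderr, "unmap %zu (handle %u) failed: status %d\n",
                    i, (unsigned int)ops[i].handle, (int)ops[i].status);
            failed++;
        }
    }
    return failed;
}
```

Calling this on the array right after the batched operation would flag
the single bad entry even though the call as a whole reported success.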



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
