[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles "bisected"



On Wed, Mar 12, 2014 at 12:45:16PM +0100, Sander Eikelenboom wrote:
[...]
> >> 
> >> - Sometimes (but not always) netfront also spits out:
> >>   dev_warn(dev, "Invalid extra type: %d\n", extra->type);
> >>   where the extra type seems a random value (seen 196, 31 ..)
> >> - Sometimes (but not always) netfront also spits out:
> >>   dev_warn(dev, "Need more slots\n");
> >> - Sometimes (but not always) netfront also spits out:
> >>   dev_warn(dev, "Missing extra info\n");
> >> 
> 
> > That's because garbage is pushed to frontend. It's trying to decode some
> > random values.
> 
> >> 
> >> First question that comes to my mind:
> >> - Are the warnings netfront spits out the cause of xen reporting the bad 
> >> grant reference ?
> >>   Or
> >>   Are the Bad grant references Xen is reporting .. causing netfront to 
> >> spit out the warnings ?
> 
> > No. They are not directly connected.
> 
> > 0. backend breaks down a skb into slots.
> > 1. backend gets a request from the rx ring (a slot with gref provided by
> >    frontend), until all slots in skb are handled.
> > 2. backend grant-copies data to that slot (with gref).
> > 3. backend pushes a response to frontend, whose content depends on the
> >    result of previous step.
> > 4. frontend handles that response.
> 
> > So the grant copy error in hypervisor you're seeing is in 2. The
> > complaint from frontend is in 4, when backend pushes a response with
> > error status in it. The bug is most likely in 0, when the calculation
> > goes wrong - either in the breakdown process, or in the macro that
> > returns usable slots (this can lead to backend thinks there's enough
> > slots while there's not).
> 
> Ok so perhaps a debug patch which does more sanity checking there ?
> Because i don't see any errors/warnings from netback in dom0.
> 

From the look of the code, 0 looks correct to me. Netback won't complain
about "bad gref" because it has no idea what a "good gref" looks like.
Only Xen has the knowledge whether a gref is legit. All netback sees is
the hypercall fails and it tries to push corresponding response to
netfront. But you can probably safely assume a gref larger than several
thousands be bad.

> >> - Why is that "if (rx->flags & XEN_NETRXF_extra_info) {}" part in 
> >> xen-netfront.c doing there and changing cons midway ?
> >>   The commit message from f942dc2552b8bfdee607be867b12a8971bb9cd85 that 
> >> introduced the if says:
> >> 
> >>     One major change from xen.git is that the guest transmit path (i.e. 
> >> what
> >>     looks like receive to netback) has been significantly reworked to 
> >> remove
> >>     the dependency on the out of tree PageForeign page flag (a core kernel
> >>     patch which enables a per page destructor callback on the final
> >>     put_page). This page flag was used in order to implement a grant map
> >>     based transmit path (where guest pages are mapped directly into SKB
> >>     frags). Instead this version of netback uses grant copy operations into
> >>     regular memory belonging to the backend domain. Reinstating the grant
> >>     map functionality is something which I would like to revisit in the
> >>     future.
> >> 
> >>   It *is* using grant copy now .. so should this part have been removed ?
> >>   And
> >>   Could Paul's commit that seems to be the first to trigger these events 
> >> affect this ?
> >> 
> 
> > This depicts guest *transmit* path, but the issue you're seeing is in
> > guest *receive* path. So that's totally different thing.
> 
> Errr yes well if the (rx->flags & XEN_NETRXF_extra_info) {}) should only be 
> doing something on the transmit path ..
> why does it seem to result to true on my issue ?  (see cons_changed=1 in the 
> debub output.. and you see cons being cons + 1 after that)

First thing is the response goes wild and probably contains random value
so it's possible the flag is "set".

For a legit response, if you see that flag it means there's an "extra
info" slot in the response. That code snippet is to extract that extra
info slot (could be several in a row, that's why there's a loop in
xennet_get_extras). The request / response format is documented in
include/xen/interface/io/netif.h -- it's only the tx one, but rx one is
more or less the same.

Wei.

> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.