
Re: [Xen-devel] RING_HAS_UNCONSUMED_REQUESTS oddness



On Wed, 2014-03-12 at 17:14 +0000, Zoltan Kiss wrote:
> On 12/03/14 15:37, Ian Campbell wrote:
> > On Wed, 2014-03-12 at 15:14 +0000, Zoltan Kiss wrote:
> >> On 12/03/14 14:30, Ian Campbell wrote:
> >>> On Wed, 2014-03-12 at 14:27 +0000, Zoltan Kiss wrote:
> >>>> On 12/03/14 10:28, Ian Campbell wrote:
> >>>>> On Tue, 2014-03-11 at 23:24 +0000, Zoltan Kiss wrote:
> >>>>>> On 11/03/14 15:44, Ian Campbell wrote:
> >>>>>
> >>>>>>> Is it the case that this macro considers a request to be unconsumed if
> >>>>>>> the *response* to a request is outstanding as well as if the request
> >>>>>>> itself is still on the ring?
> >>>>>> I don't think that would make sense. I think everywhere this
> >>>>>> macro is called, the caller is not interested in pending requests
> >>>>>> (pending means consumed but not yet responded to).
> >>>>>
> >>>>> It might be interested in such pending requests in some of the
> >>>>> pathological cases I allude to in the next paragraph though?
> >>>>>
> >>>>> For example if the ring has unconsumed requests but there are no slots
> >>>>> free for a response, it would be better to treat it as no unconsumed
> >>>>> requests until space opens up for a response, otherwise something else
> >>>>> just has to abort the processing of the request when it notices the lack
> >>>>> of space.
> >>>>>
> >>>>> (I'm totally speculating here BTW, I don't have any concrete idea why
> >>>>> things are done this way...)
> >>>>>
> >>>>>
> >>>>>>> I wonder if this apparently weird construction is due to pathological
> >>>>>>> cases when one or the other end is not picking up requests/responses?
> >>>>>>> i.e. trying to avoid deadlocking the ring or generating an interrupt
> >>>>>>> storm when the ring it is full of one or the other or something along
> >>>>>>> those lines?
> >>>>>
> >>>>>
> >>>>
> >>>> Also, let me quote again my example about when rsp makes sense:
> >>>>
> >>>> "To clarify what does this do, let me show an example:
> >>>> req_prod = 253
> >>>> req_cons = 256
> >>>> rsp_prod_pvt = 0
> >>>
> >>> I think to make sense of this I need to see the sequence of reads/writes
> >>> from both parties in a sensible ordering which would result in reads
> >>> showing the above. i.e. a demonstration of the race not just an
> >>> assertion that if the values are read as is things makes sense.
> >>
> >> Let me extend it:
> >>
> >> - callback reads req_prod = 253
> >
> > callback == backend? Which context is this code running in? Which part
> > of the system is the callback logically part of?
> Yes, it is part of the backend: the function which handles when we can
> release a slot back. With grant copy we don't have such a thing, but with
> mapping xenvif_zerocopy_callback does this (in the classic kernel it had
> a different name; we called it the page destructor). It can run from any
> context; it depends on who calls kfree_skb.

I think this is the root of the problem. The PV protocols really assume
that a single entity on each end is moving/updating the ring pointers. If
you have two entities on one end both doing this, then you need to make
sure you have appropriate locking in place.

In the classic kernel, wasn't the dealloc ring actually processed from
the tx/rx action -- i.e. it was forced back into a single context at the
point where the ring was actually touched?

Aren't you batching things up in a similar way? Perhaps you just need to
fix where you drain those batches so that it is properly locked against
other updaters of the ring?

> >> - req is therefore UINT_MAX-3, but actually there isn't any request
> >> to consume; it should be 0
> >
> > Only if something is ignoring the fact that it has seen req_prod == 256.
> >
> > If callback is some separate entity to backend within dom0 then what you
> > have here is an internal inconsistency in dom0 AFAICT. IOW it seems like
> > you are missing some synchronisation and/or have two different entities
> > acting as backend.
> The callback only needs to know whether it should poke the NAPI instance 
> or not. There is this special case: if there are still a few unconsumed 
> requests, but the ring is nearly full of pending requests and 
> xenvif_tx_pending_slots_available says NAPI should bail out, we have to 
> schedule it back once we have enough free pending slots again.
> As I said in another mail in this thread, this poking happens in the 
> callback, but actually it should be moved to the dealloc thread.

Right, I think that's what I was trying to say above. I missed that
other mail I'm afraid (or didn't grok it).

>  However, 
> thinking further, this whole xenvif_tx_pending_slots_available stuff 
> seems unnecessary to me:
> It is supposed to check whether we have enough slots in the pending ring 
> for the maximum number of possible slots; otherwise the backend bails 
> out. It does so because if the backend starts to consume requests from 
> the shared ring but runs out of free slots in the pending ring, we are 
> in trouble. But the pending ring is supposed to have the same number of 
> slots as the shared one, and a consumed but not yet responded slot in 
> the shared ring means a used slot in the pending ring. Therefore the 
> frontend won't be able to push more than 
> (MAX_PENDING_REQS - nr_pending_reqs(vif)) requests to the ring anyway. 
> At least in practice, as MAX_PENDING_REQS = RING_SIZE(...). If we could 
> bind the two to each other directly, we could get rid of this 
> unnecessary check, and whoever releases the used pending slots should 
> not poke the NAPI instance, because the frontend will raise an interrupt 
> when it sends a new packet anyway.

The frontend tries to do a certain amount of event elision, using the
event pointer etc., so you'd need to be careful: getting that wrong will
either stall the ring or generate more interrupts than necessary, which
has a bad impact on batching. But perhaps it could work.

In any case, it seems like doing the poke from the callback is wrong and
we should revert the patches which DaveM already applied and revisit
this aspect of things, do you agree?

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

