
Re: [PATCH 6/6] Under conditions of high load, it was possible for xenvif to queue up very large numbers of packets before pushing them all upstream in one go. This was partially ameliorated by use of the NDIS_RECEIVE_FLAGS_RESOURCES flag.



On 23/07/2021 15:08, Martin Harvey wrote:

-----Original Message-----
From: win-pv-devel <win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of 
Paul Durrant
Sent: 22 July 2021 13:44
To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH 6/6] Under conditions of high load, it was possible for 
xenvif to queue up very large numbers of packets before pushing them all 
upstream in one go. This was partially ameliorated by use of the 
NDIS_RECEIVE_FLAGS_RESOURCES flag.

I think you first ought to make a new version of VifReceiverQueuePacket() that
can return a boolean value to say whether you should queue more or not. Then
XENNET can be the one to apply the backpressure (as it should be).

Ah. Okay, I think this is possibly due to a misnomer. I need to explain why I don't think 
that will work: As I understand it, the problem is that there aren't actually any 
explicit flow control signals from above. Perhaps I should have called it 
"Throttling" instead of backpressure, and at the moment, that throttling is 
internal to XenVif, because there's no way of knowing exactly how much stress/resource 
scarcity higher levels of the stack are under.


I guess the issue is I don't see why it is a problem for xenvif to accumulate a large quantity of data before pushing. As for flow control signals, yes, all we essentially have is the discrepancy between buffers pushed up to XENNET and those returned. This should be capped at 2048 because XENNET asserts 'low resources' at that point.
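For reference, a minimal sketch of the kind of accounting being described here, assuming a hypothetical RECEIVER structure, OutstandingNbls counter and RX_OUTSTANDING_MAX threshold (illustrative names, not the real xennet identifiers):

#include <ndis.h>

#define RX_OUTSTANDING_MAX  2048        /* the cap mentioned above */

typedef struct _RECEIVER {
    NDIS_HANDLE     NdisHandle;
    volatile LONG   OutstandingNbls;    /* indicated but not yet returned */
} RECEIVER, *PRECEIVER;

/* Indicate one NBL, asserting low resources once too many are outstanding */
static VOID
ReceiverIndicateNbl(
    IN  PRECEIVER           Receiver,
    IN  PNET_BUFFER_LIST    Nbl
    )
{
    ULONG   Flags = 0;

    if (Receiver->OutstandingNbls >= RX_OUTSTANDING_MAX) {
        /* NDIS must copy the data and hand the NBL straight back */
        Flags |= NDIS_RECEIVE_FLAGS_RESOURCES;
    } else {
        InterlockedIncrement(&Receiver->OutstandingNbls);
    }

    NdisMIndicateReceiveNetBufferLists(Receiver->NdisHandle,
                                       Nbl,
                                       NDIS_DEFAULT_PORT_NUMBER,
                                       1,
                                       Flags);

    /* MiniportReturnNetBufferLists would decrement OutstandingNbls per NBL */
}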

  The way it's currently implemented is:

- From Ring->PacketComplete, we allow RX_BUFFERING_MAX packets to be pushed upstream per 
DPC.

Why do we want to limit here? Is the concern that we end up pushing so much we tip XENNET into the low resource condition? If so then that's exactly where a return value from VifReceiverQueuePacket() could help. XENNET essentially says 'stop giving me packets, I'm waiting for buffers back from the stack'.
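Purely for illustration, a hypothetical sketch of that interface change (no such variant of VifReceiverQueuePacket() exists today; all names here are made up):

#include <ntddk.h>

/* Hypothetical callback type: returns FALSE when XENNET wants no more packets */
typedef BOOLEAN
(*QUEUE_PACKET_V2)(
    IN  PVOID       Context,
    IN  PLIST_ENTRY Packet          /* stand-in for the real packet descriptor */
    );

/* XENVIF side: stop draining PacketComplete as soon as XENNET pushes back */
static VOID
ReceiverRingPushCompleted(
    IN  PLIST_ENTRY     PacketComplete,
    IN  QUEUE_PACKET_V2 QueuePacket,
    IN  PVOID           Context
    )
{
    while (!IsListEmpty(PacketComplete)) {
        PLIST_ENTRY ListEntry = RemoveHeadList(PacketComplete);

        /* XENNET takes the packet; FALSE means "don't give me any more yet" */
        if (!QueuePacket(Context, ListEntry))
            break;
    }
}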

- If that fails to empty Ring->PacketComplete, we immediately schedule another 
DPC, thus allowing other DPCs to run first, which is assumed to give higher 
layers a chance to digest the packets pushed up, and/or free up resources when 
those packets have been completed.

Why is it assumed that higher layers process in that fashion? They may well process in the context of NdisMIndicateReceiveNetBufferLists(). More efficiency could potentially be had there by indicating more than one at once. (We used to do that but had to stop due to a bug in Server 2008 IIRC... so it may now be possible again).
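For what it's worth, a sketch (assumed, not the current xennet code) of what indicating a batch in one call would look like: the ready NBLs are chained via NET_BUFFER_LIST_NEXT_NBL and the count is passed in NumberOfNetBufferLists.

#include <ndis.h>

/* Indicate Count ready NBLs with a single call rather than one call each */
static VOID
ReceiverIndicateBatch(
    IN  NDIS_HANDLE         NdisHandle,
    IN  PNET_BUFFER_LIST    *Nbl,       /* array of ready NBLs */
    IN  ULONG               Count
    )
{
    ULONG   Index;

    if (Count == 0)
        return;

    /* Chain the NBLs together so NDIS sees them as one list */
    for (Index = 0; Index < Count - 1; Index++)
        NET_BUFFER_LIST_NEXT_NBL(Nbl[Index]) = Nbl[Index + 1];

    NET_BUFFER_LIST_NEXT_NBL(Nbl[Count - 1]) = NULL;

    NdisMIndicateReceiveNetBufferLists(NdisHandle,
                                       Nbl[0],
                                       NDIS_DEFAULT_PORT_NUMBER,
                                       Count,
                                       0);
}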

- Upstream of that, we are holding Ring->PacketQueue, which gets cleared and moved 
to Ring->PacketComplete via ReceiverRingProcessPacket.
- This transfer of Ring->PacketQueue to Ring->PacketComplete will only happen when 
Ring->PacketComplete has been emptied, that emptying being subject to the 
RX_BUFFERING_MAX constraint per DPC.
- We keep a conservative estimate of how large Ring->PacketQueue has grown.
- We expect that Ring->PacketQueue should be large enough as an intermediate 
buffer to empty the entire ring, and that in cases where throttling is underway, 
the packet queue will grow whilst waiting for PacketComplete to empty.
- We allow PacketQueue to grow up to a certain size: also RX_BUFFERING_MAX, so 
it is possible to empty the entire queue to PacketComplete and indicate all of 
PacketComplete up to NDIS in one go.
- If PacketQueue is about to grow larger than that, then we leave the PollDPC 
early, so it does not fill up the packet queue, and we allow the ring itself to 
back up.

Now, of course, the reason this works at all is because we have two DPCs: 
PollDpc and QueueDpc. The draining from the packet queue to packet complete, 
and the indication up to NDIS, is in the QueueDpc, and the filling of the packet 
queue is in the PollDpc. It is the relative rates of these two DPCs that 
determine whether the PacketQueue will fill or empty.
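To make the above concrete, here is a much-simplified restatement of that scheme; the RING structure, the helpers declared as extern, and the RX_BUFFERING_MAX value are stand-ins, not the actual xenvif code.

#include <ntddk.h>

#define RX_BUFFERING_MAX    2048        /* illustrative value */

typedef struct _RING {
    LIST_ENTRY  PacketQueue;            /* filled by the PollDpc */
    ULONG       PacketQueueCount;       /* conservative estimate of its depth */
    LIST_ENTRY  PacketComplete;         /* drained by the QueueDpc */
    KDPC        QueueDpc;
    BOOLEAN     FinalFlush;
} RING, *PRING;

/* Stand-ins for the real ring-poll and indication helpers */
extern BOOLEAN      RingHasWork(IN PRING Ring);
extern PLIST_ENTRY  RingGetPacket(IN PRING Ring);
extern VOID         RingIndicatePacket(IN PLIST_ENTRY Packet);

/* PollDpc: shared ring -> PacketQueue, bailing out once the queue is full */
static VOID
RingPoll(
    IN  PRING   Ring
    )
{
    while (RingHasWork(Ring)) {
        if (Ring->PacketQueueCount >= RX_BUFFERING_MAX) {
            /* Leave early and let the ring itself back up */
            break;
        }

        InsertTailList(&Ring->PacketQueue, RingGetPacket(Ring));
        Ring->PacketQueueCount++;
    }
}

/* QueueDpc: PacketQueue -> PacketComplete -> NDIS, RX_BUFFERING_MAX per run */
static VOID
RingSwizzle(
    IN  PRING   Ring
    )
{
    ULONG   PushedUpstream = 0;

    /* The transfer only happens once PacketComplete has been emptied */
    if (IsListEmpty(&Ring->PacketComplete)) {
        while (!IsListEmpty(&Ring->PacketQueue))
            InsertTailList(&Ring->PacketComplete,
                           RemoveHeadList(&Ring->PacketQueue));

        Ring->PacketQueueCount = 0;
    }

    while (!IsListEmpty(&Ring->PacketComplete) &&
           (PushedUpstream < RX_BUFFERING_MAX || Ring->FinalFlush)) {
        RingIndicatePacket(RemoveHeadList(&Ring->PacketComplete));
        PushedUpstream++;
    }

    if (!IsListEmpty(&Ring->PacketComplete)) {
        /* Not drained: reschedule so other DPCs get a chance to run first */
        KeInsertQueueDpc(&Ring->QueueDpc, NULL, NULL);
    }
}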

The motivation for the two was so that we could use a threaded DPC for the 'heavier' work but still clear the ring quickly, so TBH you could more easily apply back-pressure on the ring by dropping to a single threaded DPC to do both jobs.
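Roughly speaking, and reusing the stand-in names from the sketch above, the single-threaded-DPC variant being suggested might look like this (a sketch, not a proposal for the actual code):

static KDEFERRED_ROUTINE RingDpc;

static VOID
RingDpc(
    IN  PKDPC   Dpc,
    IN  PVOID   Context,
    IN  PVOID   Argument1,
    IN  PVOID   Argument2
    )
{
    PRING   Ring = Context;

    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(Argument1);
    UNREFERENCED_PARAMETER(Argument2);

    RingPoll(Ring);         /* clear the shared ring quickly... */
    RingSwizzle(Ring);      /* ...then do the heavier indication work */
}

static VOID
RingInitializeDpc(
    IN  PRING   Ring
    )
{
    /* A threaded DPC runs at PASSIVE_LEVEL by default, so the heavier
       indication work does not monopolize DISPATCH_LEVEL */
    KeInitializeThreadedDpc(&Ring->QueueDpc, RingDpc, Ring);
}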


I think the part below should now be split out into a separate function; 
it's all getting a bit long to be inline. You might also
consider a separate DPC that just does this second part so you can just use 
that in the case you need to defer.

-    while (!IsListEmpty(&Ring->PacketComplete)) {
+    while ((!IsListEmpty(&Ring->PacketComplete))
+        && ((PushedUpstream < RX_BUFFERING_MAX) || Ring->FinalFlush))
+ {

I am loath to create another DPC. Refactoring is all fine, but creating another 
DPC changes the runtime behaviour.

I have omitted to upstream a patch which we have locally, which has a large 
amount of stats on relative DPC rates, and also on how much the two DPCs overlap 
each other (that is, execute concurrently).

Agreed that we need to break ReceiverRingSwizzle into a couple of separate functions, 
one dealing with transferring PacketQueue -> PacketComplete, and one dealing with 
PacketComplete -> indicate up to NDIS. Adding another DPC totally invalidates all 
the profiling etc. that I have done.
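As a rough skeleton of that split (helper names are illustrative only, and PRING is the stand-in type from the sketch further up, not the real receiver ring structure):

static VOID ReceiverRingTransferQueued(IN PRING Ring);      /* PacketQueue -> PacketComplete */
static VOID ReceiverRingIndicateCompleted(IN PRING Ring);   /* PacketComplete -> NDIS */

static VOID
ReceiverRingSwizzle(
    IN  PRING   Ring
    )
{
    ReceiverRingTransferQueued(Ring);
    ReceiverRingIndicateCompleted(Ring);
}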

Given the proximity of the queuing/DPC logic to XenVif, and the lack of 
upstream signals from NDIS to XenNet (correct me if I am wrong), I can't see a 
way to easily push this logic up to XenNet.

Let me know if you're happy with this approach.


Not really. How about the single DPC idea?

  Paul

MH.





 

