Re: [PATCH 6/6] Under conditions of high load, it was possible for xenvif to queue up very large numbers of packets before pushing them all upstream in one go. This was partially ameliorated by use of the NDIS_RECEIVE_FLAGS_RESOURCES flag.
On 23/07/2021 15:08, Martin Harvey wrote:
> -----Original Message-----
> From: win-pv-devel <win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Paul Durrant
> Sent: 22 July 2021 13:44
> To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH 6/6] Under conditions of high load, it was possible for xenvif to queue up very large numbers of packets before pushing them all upstream in one go. This was partially ameliorated by use of the NDIS_RECEIVE_FLAGS_RESOURCES flag.
>
>> I think you first ought to make a new version of VifReceiverQueuePacket() that can return a boolean value to say whether you should queue more or not. Then XENNET can be the one to apply the backpressure (as it should be).
>
> Ah. Okay, I think this is possibly due to a misnomer. I need to explain why I don't think that will work: as I understand it, the problem is that there aren't actually any explicit flow control signals from above. Perhaps I should have called it "Throttling" instead of backpressure, and at the moment that throttling is internal to XenVif, because there's no way of knowing exactly how much stress or resource scarcity higher levels of the stack are under.

I guess the issue is that I don't see why it is a problem for xenvif to accumulate a large quantity of data before pushing. As for flow control signals, yes, all we essentially have is the discrepancy between buffers pushed up to XENNET and those returned. This should be capped at 2048 because XENNET asserts 'low resources' at that point.

> The way it's currently implemented is:
>
> - From Ring->PacketComplete, we allow RX_BUFFERING_MAX packets to be pushed upstream per DPC.

Why do we want to limit here? Is the concern that we end up pushing so much that we tip XENNET into the low resource condition? If so then that's exactly where a return value from VifReceiverQueuePacket() could help: XENNET essentially says 'stop giving me packets, I'm waiting for buffers back from the stack'.

> - If that fails to empty Ring->PacketComplete, we immediately schedule another DPC, thus allowing other DPCs to run first, which is assumed to give higher layers a chance to digest the packets pushed up, and/or free up resources when those packets have been completed.

Why is it assumed higher layers process in that fashion? They may well process in the context of NdisMIndicateReceiveNetBufferLists(). More efficiency could potentially be had there by indicating more than one at once. (We used to do that but had to stop due to a bug in Server 2008 IIRC... so it may now be possible again.)

> - Upstream of that, we are holding Ring->PacketQueue, which gets cleared and moved to Ring->PacketComplete via ReceiverRingProcessPacket.
> - This transfer of Ring->PacketQueue to Ring->PacketComplete will only happen when Ring->PacketComplete has been emptied, subject to the RX_BUFFERING_MAX constraint per DPC.
> - We keep a conservative estimate of how large Ring->PacketQueue is growing.
> - We expect that Ring->PacketQueue should be large enough as an intermediate buffer to empty the entire ring, and that in cases where throttling is underway the packet queue will grow whilst waiting for PacketComplete to empty.
> - We allow PacketQueue to grow up to a certain size, also RX_BUFFERING_MAX, so it is possible to empty the entire queue to PacketComplete and indicate all of PacketComplete up to NDIS in one go.
> - If PacketQueue is about to grow larger than that, then we leave the PollDpc early, so it does not fill up the packet queue further, and we allow the ring itself to back up.
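To make the flow in the list above concrete, here is a minimal sketch of the described per-DPC throttling. It is not the real xenvif code: RX_BUFFERING_MAX, PacketQueue, PacketComplete and FinalFlush come from the thread, and everything prefixed Sketch is a placeholder name invented for illustration.

/* Minimal sketch of the throttling described above; all Sketch* names and
 * the value of RX_BUFFERING_MAX are assumptions, not the real xenvif code. */

#include <ntddk.h>

#define RX_BUFFERING_MAX    2048        /* assumed value */

typedef struct _SKETCH_RING {
    LIST_ENTRY  PacketQueue;            /* filled by the poll DPC */
    LIST_ENTRY  PacketComplete;         /* drained towards NDIS */
    KDPC        QueueDpc;
    BOOLEAN     FinalFlush;
} SKETCH_RING, *PSKETCH_RING;

/* Placeholders for the real indicate/process steps. */
VOID SketchIndicateToXennet(PSKETCH_RING Ring, PLIST_ENTRY ListEntry);
VOID SketchProcessPacket(PSKETCH_RING Ring, PLIST_ENTRY ListEntry);

static VOID
SketchQueueDpcWork(
    IN  PSKETCH_RING    Ring
    )
{
    ULONG   PushedUpstream = 0;

    /* Indicate at most RX_BUFFERING_MAX packets upstream per DPC, unless
     * we are flushing everything at teardown. */
    while (!IsListEmpty(&Ring->PacketComplete) &&
           (PushedUpstream < RX_BUFFERING_MAX || Ring->FinalFlush)) {
        PLIST_ENTRY ListEntry = RemoveHeadList(&Ring->PacketComplete);

        SketchIndicateToXennet(Ring, ListEntry);
        PushedUpstream++;
    }

    if (!IsListEmpty(&Ring->PacketComplete)) {
        /* Budget exhausted: re-queue this DPC so other DPCs (and the
         * stack above) get a chance to run before we push more. */
        KeInsertQueueDpc(&Ring->QueueDpc, NULL, NULL);
        return;
    }

    /* Only once PacketComplete is empty do we refill it from PacketQueue. */
    while (!IsListEmpty(&Ring->PacketQueue)) {
        PLIST_ENTRY ListEntry = RemoveHeadList(&Ring->PacketQueue);

        SketchProcessPacket(Ring, ListEntry);   /* ends up on PacketComplete */
    }
}

The split into two loops mirrors the constraint above that PacketQueue is only transferred once PacketComplete has been fully drained.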
> Now, of course, the reason this works at all is because we have two DPCs: the PollDpc and the QueueDpc. The draining from the packet queue to packet complete, and the indication up to NDIS, is in the QueueDpc, and the filling of the packet queue is in the PollDpc. It is the different rates of these two DPCs that determine whether the PacketQueue will fill or empty.

The motivation for the two was so that we could use a threaded DPC for the 'heavier' work but still clear the ring quickly, so TBH you could more easily apply back-pressure on the ring by dropping to a single threaded DPC to do both jobs.

>> I think the part below should now be split out into a separate function; it's all getting a bit long to be inline. You might also consider a separate DPC that just does this second part, so you can just use that in the case where you need to defer.
>>
>> -    while (!IsListEmpty(&Ring->PacketComplete)) {
>> +    while ((!IsListEmpty(&Ring->PacketComplete))
>> +           && ((PushedUpstream < RX_BUFFERING_MAX) || Ring->FinalFlush))
>> +    {
>
> I am loath to create another DPC. Refactoring is all fine, but creating another DPC changes the runtime behaviour. I have omitted to upstream a patch which we have locally which collects a large amount of stats on relative DPC rates, and also on how much the two DPCs overlap each other (that is, execute concurrently). I agree that we need to break ReceiverRingSwizzle into a couple of separate functions, one dealing with transferring PacketQueue -> PacketComplete, and one dealing with PacketComplete -> indicate up to NDIS. Adding another DPC totally invalidates all the profiling etc. I have done.
>
> Given the proximity of the queuing / DPC logic to XenVif, and the lack of upstream signals from Ndis to XenNet (correct me if I am wrong), I can't see a way to easily push this logic up to XenNet. Let me know if you're happy with this approach.
>
> MH.

Not really. How about the single DPC idea?

  Paul
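For reference, here is a rough sketch of the kind of interface change suggested at the top of the thread: a queue-packet callback that tells xenvif whether it should keep pushing. It is purely hypothetical; the Sketch names are invented here and are not the real VIF interface.

/* Hypothetical sketch only: none of these names are the real xenvif/xennet
 * API; they stand in for a VifReceiverQueuePacket() variant that returns
 * whether the caller should keep queueing. */

#include <ntddk.h>

typedef struct _SKETCH_RING     SKETCH_RING, *PSKETCH_RING;
typedef struct _SKETCH_PACKET   SKETCH_PACKET, *PSKETCH_PACKET;

/* XENNET-side callback: indicate the packet to NDIS and return FALSE once
 * the number of buffers still held by the stack approaches the
 * low-resources threshold (2048 in the discussion above). */
typedef BOOLEAN (*SKETCH_QUEUE_PACKET)(PVOID Argument, PSKETCH_PACKET Packet);

/* Placeholder for a xenvif-side helper that pops the next completed packet. */
PSKETCH_PACKET SketchNextCompletedPacket(PSKETCH_RING Ring);

/* xenvif side: push completed packets upstream until XENNET says stop;
 * whatever is left stays queued and the ring is allowed to back up. */
static VOID
SketchPushUpstream(
    IN  PSKETCH_RING        Ring,
    IN  SKETCH_QUEUE_PACKET QueuePacket,
    IN  PVOID               Argument
    )
{
    PSKETCH_PACKET  Packet;

    while ((Packet = SketchNextCompletedPacket(Ring)) != NULL) {
        if (!QueuePacket(Argument, Packet)) {
            /* The packet was accepted, but XENNET has asked us not to
             * queue any more for now. */
            break;
        }
    }
}

This is the 'backpressure belongs in XENNET' shape; whether it fits alongside the existing PacketQueue/PacketComplete split is exactly the point under discussion above.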