RE: [PATCH 6/6] Under conditions of high load, it was possible for xenvif to queue up very large numbers of packets before pushing them all upstream in one go. This was partially ameliorated by use of the NDIS_RECEIVE_FLAGS_RESOURCES flag.
-----Original Message-----
From: win-pv-devel <win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx> On Behalf Of Paul Durrant
Sent: 22 July 2021 13:44
To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH 6/6] Under conditions of high load, it was possible for xenvif to queue up very large numbers of packets before pushing them all upstream in one go. This was partially ameliorated by use of the NDIS_RECEIVE_FLAGS_RESOURCES flag.

> I think you first ought to make a new version of
> VifReceiverQueuePacket() that can return a boolean value to say whether
> you should queue more or not. Then XENNET can be the one to apply the
> backpressure (as it should be).

Ah. Okay, I think this is possibly due to a misnomer, so I need to explain why I don't think that will work. As I understand it, the problem is that there aren't actually any explicit flow-control signals from above. Perhaps I should have called it "throttling" rather than backpressure; at the moment that throttling is internal to XenVif, because there is no way of knowing exactly how much stress or resource scarcity the higher levels of the stack are under.

The way it's currently implemented is:

- From Ring->PacketComplete, we allow at most RX_BUFFERING_MAX packets to be pushed upstream per DPC.
- If that fails to empty Ring->PacketComplete, we immediately schedule another DPC, thus allowing other DPCs to run first, which is assumed to give the higher layers a chance to digest the packets pushed up and/or free up resources when those packets have been completed.
- Upstream of that, we are holding Ring->PacketQueue, which gets cleared and moved to Ring->PacketComplete via ReceiverRingProcessPacket.
- This transfer from Ring->PacketQueue to Ring->PacketComplete only happens once Ring->PacketComplete has been emptied, subject to the RX_BUFFERING_MAX constraint per DPC.
- We keep a conservative estimate of how large Ring->PacketQueue is growing.
- We expect Ring->PacketQueue to be large enough, as an intermediate buffer, to empty the entire ring; in cases where throttling is under way, the packet queue will grow whilst waiting for PacketComplete to empty.
- We allow PacketQueue to grow up to a certain size (also RX_BUFFERING_MAX), so it is possible to empty the entire queue into PacketComplete and indicate all of PacketComplete up to NDIS in one go.
- If PacketQueue is about to grow larger than that, we leave the PollDpc early so it does not fill up the packet queue, and we allow the ring itself to back up.

Now, of course, the reason this works at all is that we have two DPCs: the PollDpc and the QueueDpc. The draining from the packet queue to PacketComplete, and the indication up to NDIS, happen in the QueueDpc; the filling of the packet queue happens in the PollDpc. It is the relative rates of these two DPCs that determine whether PacketQueue fills or empties. (There is a rough sketch of the drain side further below.)

> I think the part below should now be split out into a separate function;
> it's all getting a bit long to be inline. You might also consider a
> separate DPC that just does this second part so you can just use that in
> the case you need to defer.
>
> -    while (!IsListEmpty(&Ring->PacketComplete)) {
> +    while ((!IsListEmpty(&Ring->PacketComplete))
> +        && ((PushedUpstream < RX_BUFFERING_MAX) || Ring->FinalFlush))
> +    {

I am loath to create another DPC. Refactoring is all fine, but creating another DPC changes the runtime behaviour.
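To show the behaviour I'm trying to preserve, here is a rough, illustrative sketch of the drain side as it works today. Ring->PacketComplete, Ring->PacketQueue, RX_BUFFERING_MAX and Ring->FinalFlush are from the patch under discussion; the function name, the receiver-ring type name and ReceiverRingIndicatePacket are made up for illustration, so please don't read this as the actual code:

    /* Illustrative sketch only: assumes the WDK list/DPC routines plus
     * xenvif's receiver-ring definitions. ReceiverRingIndicatePacket is a
     * hypothetical stand-in for whatever hands the packet to XENNET/NDIS.
     */
    static VOID
    ReceiverRingQueueDpcSketch(
        IN  PXENVIF_RECEIVER_RING   Ring
        )
    {
        ULONG   PushedUpstream = 0;

        /* Push at most RX_BUFFERING_MAX packets upstream per DPC, unless
         * this is the final flush.
         */
        while (!IsListEmpty(&Ring->PacketComplete) &&
               (PushedUpstream < RX_BUFFERING_MAX || Ring->FinalFlush)) {
            PLIST_ENTRY ListEntry = RemoveHeadList(&Ring->PacketComplete);

            ReceiverRingIndicatePacket(Ring, ListEntry);
            PushedUpstream++;
        }

        /* If we hit the cap with packets still queued, re-schedule this
         * DPC rather than looping, so other DPCs (and the layers above)
         * get a chance to run and digest what has already been indicated.
         */
        if (!IsListEmpty(&Ring->PacketComplete))
            (VOID) KeInsertQueueDpc(&Ring->QueueDpc, NULL, NULL);
    }

The point of re-queuing the DPC instead of spinning in the loop is exactly the throttling described above: it bounds how much we indicate per DPC without needing any explicit signal from NDIS.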
I have omitted to upstream a patch which we carry locally that adds a large set of stats on the relative DPC rates, and also on how much the two DPCs overlap each other (that is, execute concurrently).

I agree that we need to break ReceiverRingSwizzle into a couple of separate functions: one dealing with transferring PacketQueue -> PacketComplete, and one dealing with PacketComplete -> indicate up to NDIS (see the P.S. below for the shape I have in mind). Adding another DPC, however, totally invalidates all the profiling etc. that I have done.

Given the proximity of the queuing/DPC logic to XenVif, and the lack of upstream signals from NDIS to XenNet (correct me if I am wrong), I can't see a way to easily push this logic up to XenNet.

Let me know if you're happy with this approach.

  MH.
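P.S. To make the refactoring shape concrete, something along these lines is what I have in mind; the helper names are placeholders rather than what I would necessarily commit, and the bodies are elided:

    /* Sketch of the split: both halves stay on the existing QueueDpc, so
     * the runtime behaviour (and my profiling) is unchanged. Helper names
     * are placeholders.
     */
    static VOID
    ReceiverRingSwizzle(
        IN  PXENVIF_RECEIVER_RING   Ring
        )
    {
        /* Stage 1: transfer Ring->PacketQueue -> Ring->PacketComplete
         * (via ReceiverRingProcessPacket), once PacketComplete has
         * drained.
         */
        ReceiverRingProcessQueuedPackets(Ring);

        /* Stage 2: indicate up to RX_BUFFERING_MAX packets from
         * Ring->PacketComplete to NDIS, re-queuing the DPC if anything is
         * left over.
         */
        ReceiverRingCompletePackets(Ring);
    }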