[PATCH 1/6] In some cases, the frontend may stop processing Tx ring requests in a timely manner. If this happens at the same time as ring disable, then the Tx code could spin forever at dispatch IRQL.
- To: <win-pv-devel@xxxxxxxxxxxxxxxxxxxx>
- From: Martin Harvey <Martin.Harvey@xxxxxxxxxx>
- Date: Tue, 20 Jul 2021 14:29:46 +0100
- Cc: Martin Harvey <Martin.Harvey@xxxxxxxxxx>, Martin Harvey <martin.harvey@xxxxxxxxxx>
- Delivery-date: Tue, 20 Jul 2021 13:30:28 +0000
- List-id: Developer list for the Windows PV Drivers subproject <win-pv-devel.lists.xenproject.org>
This patch introduces a hard limit on how long the code will spin
(at most 100 attempts with a 1ms stall each), allowing a device
disable or power transition to complete, albeit at the cost of Tx
requests being dropped or the ring being left in an indeterminate
state. This has normally been seen at guest shutdown, where the final
ring state is of little consequence.
Signed-off-by: Martin Harvey <martin.harvey@xxxxxxxxxx>
---
src/xenvif/transmitter.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/src/xenvif/transmitter.c b/src/xenvif/transmitter.c
index f6935a6..e114061 100644
--- a/src/xenvif/transmitter.c
+++ b/src/xenvif/transmitter.c
@@ -3933,6 +3933,8 @@ __TransmitterRingDisable(
XenbusState State;
ULONG Attempt;
NTSTATUS status;
+ BOOLEAN ToProcess;
+ BOOLEAN Abort;
Transmitter = Ring->Transmitter;
Frontend = Transmitter->Frontend;
@@ -3985,24 +3987,31 @@ __TransmitterRingDisable(
}
Attempt = 0;
+ ToProcess = Ring->ResponsesProcessed != Ring->RequestsPushed;
+ Abort = ((Attempt >= 100) || (State != XenbusStateConnected));
+
ASSERT3U(Ring->RequestsPushed, ==, Ring->RequestsPosted);
- while (Ring->ResponsesProcessed != Ring->RequestsPushed) {
+ while (ToProcess && !Abort) {
Attempt++;
- ASSERT(Attempt < 100);
+
+ KeStallExecutionProcessor(1000); // 1ms
// Try to move things along
__TransmitterRingSend(Ring);
(VOID) TransmitterRingPoll(Ring);
- if (State != XenbusStateConnected)
- __TransmitterRingFakeResponses(Ring);
-
// We are waiting for a watch event at DISPATCH_LEVEL so
// it is our responsibility to poll the store ring.
XENBUS_STORE(Poll,
&Transmitter->StoreInterface);
- KeStallExecutionProcessor(1000); // 1ms
+ ToProcess = Ring->ResponsesProcessed != Ring->RequestsPushed;
+ Abort = ((Attempt >= 100) || (State != XenbusStateConnected));
+ }
+ if (Abort)
+ {
+ __TransmitterRingFakeResponses(Ring);
+ (VOID) TransmitterRingPoll(Ring);
}
Ring->Enabled = FALSE;
--
2.25.0.windows.1
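
For readers who want to see the control flow of the patched loop in isolation, here is a minimal user-mode sketch. It is not the driver code: SimRing, SimRingPoll, SimRingFakeResponses and SimRingDisable are hypothetical stand-ins invented for illustration, the 1ms stall is omitted, and the backend is simulated by a fixed number of completions per poll. Only the bound of 100 attempts and the ToProcess/Abort structure are taken from the diff above.

```c
#include <stdbool.h>

#define MAX_ATTEMPTS 100 /* same bound the patch uses */

/* Hypothetical stand-in for the driver's ring counters. */
typedef struct {
    unsigned RequestsPushed;
    unsigned ResponsesProcessed;
    unsigned ResponsesPerPoll; /* simulated backend speed; 0 means stalled */
} SimRing;

/* Simulated poll: the "backend" completes up to ResponsesPerPoll requests. */
static void SimRingPoll(SimRing *Ring)
{
    unsigned Outstanding = Ring->RequestsPushed - Ring->ResponsesProcessed;
    unsigned Done = Ring->ResponsesPerPoll < Outstanding
                        ? Ring->ResponsesPerPoll
                        : Outstanding;

    Ring->ResponsesProcessed += Done;
}

/* Simulated fake responses: complete every outstanding request locally. */
static void SimRingFakeResponses(SimRing *Ring)
{
    Ring->ResponsesProcessed = Ring->RequestsPushed;
}

/*
 * Bounded drain, mirroring the shape of the patched loop.
 * Returns true if the loop ended without hitting the abort condition,
 * false if it gave up (attempt limit reached or backend not connected).
 * As in the patch, Connected is sampled by the caller and is not
 * re-read inside the loop.
 */
static bool SimRingDisable(SimRing *Ring, bool Connected, unsigned *Attempts)
{
    unsigned Attempt = 0;
    bool ToProcess = Ring->ResponsesProcessed != Ring->RequestsPushed;
    bool Abort = (Attempt >= MAX_ATTEMPTS) || !Connected;

    while (ToProcess && !Abort) {
        Attempt++;

        /* The real driver stalls for about 1ms here. */
        SimRingPoll(Ring);

        ToProcess = Ring->ResponsesProcessed != Ring->RequestsPushed;
        Abort = (Attempt >= MAX_ATTEMPTS) || !Connected;
    }

    /* On abort, force the counters to agree so the disable can finish. */
    if (Abort)
        SimRingFakeResponses(Ring);

    *Attempts = Attempt;
    return !Abort;
}
```

Calling SimRingDisable on a ring with ResponsesPerPoll set to 0 models the stalled case described in the commit message: the loop gives up after 100 attempts instead of spinning forever, and the outstanding requests are completed locally.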
- Follow-Ups:
- Re: [PATCH 1/6] In some cases, the frontend may stop processing Tx ring requests in a timely manner. If this happens at the same time as ring disable, then the Tx code could spin forever at dispatch IRQL.
- [PATCH 3/6] In addition to preceding changes to ring disconnects, and associated logging, we also add some logging to check whether state change notifications are being sent in a timely manner between frontend and backend. Also a great assistance to customer debugging.
- [PATCH 2/6] Since Rx/Tx ring disconnects now no longer wait forever, we add some logging to catch those cases where the disconnect has timed out, indicating how many requests are still in the ring. This aids greatly with customer debugging.
- [PATCH 4/6] Under conditions of high load and low resources, it was possible for NDIS (in combination with overlying drivers) to send NET_BUFFER_LIST structures containing NULL MDL's for transmission. This resulted in an immediate bugcheck.
- [PATCH 5/6] Reduce logging of Fdo->NotDisableable, in a similar manner to changes already made to xenbus.
- [PATCH 6/6] Under conditions of high load, it was possible for xenvif to queue up very large numbers of packets before pushing them all upstream in one go. This was partially ameliorated by use of the NDIS_RECEIVE_FLAGS_RESOURCES flag.