|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Issues with the device eject path in XenVif
On 13/04/2026 14:43, Tu Dinh wrote: On 13/04/2026 15:35, Paul Durrant wrote:On 13/04/2026 14:22, Tu Dinh wrote:On 13/04/2026 14:41, Paul Durrant wrote:On 09/04/2026 16:29, Tu Dinh wrote:Hi all, I'm currently trying to fix some lingering issues with VIF unplug, which will let me replace the MRSW lock with a simpler and faster implementation. Pdo->Eject/PdoRequestEject (e.g. in XenVif) is signaled by the FrontendEject worker thread, which watches backend/vif/DOMID/X/online among a few other things. I've run into several issues with this code path: - When removing the VIF using `xe vif-unplug force=true`, the entire xenstore key of the backend is removed without a chance to tear down the connection. However, the watch on BACKEND/online will be triggered before the watch on device/vif, which causes the PDO to be marked as ejected, and so goes through the QUERY_REMOVE_DEVICE/REMOVE_DEVICE instead of being a surprise removal. - In the REMOVE_DEVICE case, NDIS will wait for packets to be returned before continuing. Yet we cannot make progress because the backend has already disappeared, so the system will hang. This can be reproduced by doing an unplug with force=true while having some outbound traffic, but the timing is quite tight with the current code. - BACKEND/online is an internal, backend-specific value that is not documented in xenstore-paths or netif.h. So frontends should not use this value. I also find converting a VIF unplug to a query remove based on reading BACKEND/online somewhat dubious. I've considered several options for a fix, which I have documented below: 1. Make FrontendIsBackendOnline return a status code if BACKEND/online doesn't exist, and treat an error to read the key as a surprise removal. - This ends up being unworkable, since QEMU will always first set BACKEND/online to 0 even if the VIF is being force-unplugged.I still think this is the right way to deal with force unplug. Is there a tell-tale you can look for to see if it is forced? (E.g. has the frontend xenstore area completely gone?)What I observe during a force VIF unplug is an unplug request (BACKEND/online=0 / PdoRequestEject) shortly followed by the backend being wiped out. I couldn't find any tell I could use to distinguish the force unplug case from the normal one. Maybe it can be fixed by attaching the watchdog thread's event to a watch on the backend, then (for transmitters) faking responses in the watchdog thread if we detect that the backend has disappeared.The PdoRequestEject is trigger off the state change watch though isn't it. In the case of force does the state still change to 'closing'? I'd have thought the node would be removed, in which case the state would go to 'unknown' instead.There's no watchdog thread waiting for BACKEND/state, and NDIS waits for packet return during initial handling of IRP_MN_REMOVE_DEVICE before xennet/xenvif is entered. So for now there's no opportunity for FrontendClose/FrontendWaitForBackendXenbusStateChange to be called in order to update the state to Unknown. Ok, that's the problem then. IIRC the eject protocol is setting online -> 0 and then setting state -> closing. So the eject should be driven by a watch on 'state' rather than a watch on 'online'. 2. Make FrontendIsBackendOnline check the backend's existence (i.e. reading the backend key instead of backend/online). - This changes the unplug order slightly, but looks like the cleanest solution. Though I'm not sure if it breaks cancelling of device removal requests. 3. Remove the eject codepath and rely on FdoScan instead. - This might break a few things that assume the presence of this codepath. I'd be glad to hear your opinions on this matter. Thanks, -- Ngoc Tu Dinh | Vates XCP-ng Developer XCP-ng & Xen Orchestra - Vates solutions web: https://vates.tech-- Ngoc Tu Dinh | Vates XCP-ng Developer XCP-ng & Xen Orchestra - Vates solutions web: https://vates.tech-- Ngoc Tu Dinh | Vates XCP-ng Developer XCP-ng & Xen Orchestra - Vates solutions web: https://vates.tech
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |