Re: Issues with the device eject path in XenVif
On 13/04/2026 14:41, Paul Durrant wrote:
> On 09/04/2026 16:29, Tu Dinh wrote:
>> Hi all,
>>
>> I'm currently trying to fix some lingering issues with VIF unplug, which
>> will let me replace the MRSW lock with a simpler and faster
>> implementation.
>>
>> Pdo->Eject/PdoRequestEject (e.g. in XenVif) is signaled by the
>> FrontendEject worker thread, which watches backend/vif/DOMID/X/online
>> among a few other things. I've run into several issues with this code
>> path:
>>
>> - When removing the VIF using `xe vif-unplug force=true`, the entire
>> xenstore key of the backend is removed without a chance to tear down the
>> connection. However, the watch on BACKEND/online will be triggered
>> before the watch on device/vif, which causes the PDO to be marked as
>> ejected, and so goes through the QUERY_REMOVE_DEVICE/REMOVE_DEVICE
>> instead of being a surprise removal.
>> - In the REMOVE_DEVICE case, NDIS will wait for packets to be returned
>> before continuing. Yet we cannot make progress because the backend has
>> already disappeared, so the system will hang. This can be reproduced by
>> doing an unplug with force=true while having some outbound traffic, but
>> the timing is quite tight with the current code.
>> - BACKEND/online is an internal, backend-specific value that is not
>> documented in xenstore-paths or netif.h. So frontends should not use
>> this value. I also find converting a VIF unplug to a query remove based
>> on reading BACKEND/online somewhat dubious.
>>
>> I've considered several options for a fix, which I have documented below:
>>
>> 1. Make FrontendIsBackendOnline return a status code if BACKEND/online
>> doesn't exist, and treat an error to read the key as a surprise removal.
>>    - This ends up being unworkable, since QEMU will always first set
>> BACKEND/online to 0 even if the VIF is being force-unplugged.
>
> I still think this is the right way to deal with force unplug. Is there
> a tell-tale you can look for to see if it is forced? (E.g. has the
> frontend xenstore area completely gone?)
>

What I observe during a force VIF unplug is an unplug request
(BACKEND/online=0 / PdoRequestEject) shortly followed by the backend
being wiped out. I couldn't find any tell I could use to distinguish the
force unplug case from the normal one.

Maybe it can be fixed by attaching the watchdog thread's event to a
watch on the backend, then (for transmitters) faking responses in the
watchdog thread if we detect that the backend has disappeared.

>>
>> 2. Make FrontendIsBackendOnline check the backend's existence (i.e.
>> reading the backend key instead of backend/online).
>>    - This changes the unplug order slightly, but looks like the cleanest
>> solution. Though I'm not sure if it breaks cancelling of device removal
>> requests.
>>
>> 3. Remove the eject codepath and rely on FdoScan instead.
>>    - This might break a few things that assume the presence of this
>> codepath.
>>
>> I'd be glad to hear your opinions on this matter.
>>
>> Thanks,
>>
>>
>> --
>> Ngoc Tu Dinh | Vates XCP-ng Developer
>>
>> XCP-ng & Xen Orchestra - Vates solutions
>>
>> web: https://vates.tech
>>
>>
>
>

--
Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
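
For reference, the decision discussed under options 1 and 2 could be modelled roughly as below. This is only a sketch under stated assumptions, not XenVif code: `backend_snapshot`, `unplug_action` and `classify_unplug` are hypothetical names, and the snapshot is assumed to hold the result of two xenstore reads (the backend key itself and BACKEND/online). It treats a missing backend key as a surprise removal and only honours BACKEND/online while the backend still exists.

```c
#include <stdbool.h>

/* Hypothetical snapshot of two xenstore reads made by the frontend.
 * None of these names exist in XenVif; they are for illustration only. */
typedef struct {
    bool backend_exists;   /* the backend key itself could be read */
    bool online_readable;  /* BACKEND/online could be read */
    bool online;           /* value of BACKEND/online; valid only if readable */
} backend_snapshot;

typedef enum {
    ACTION_NONE,             /* nothing to do */
    ACTION_REQUEST_EJECT,    /* orderly path (QUERY_REMOVE_DEVICE/REMOVE_DEVICE) */
    ACTION_SURPRISE_REMOVAL  /* backend is gone; do not wait on it */
} unplug_action;

/* Classify one snapshot. Backend existence is checked first, so a wiped
 * backend is never turned into an orderly eject request. */
unplug_action classify_unplug(backend_snapshot s)
{
    if (!s.backend_exists)
        return ACTION_SURPRISE_REMOVAL;

    if (s.online_readable && !s.online)
        return ACTION_REQUEST_EJECT;

    /* Backend present and either online or BACKEND/online unreadable. */
    return ACTION_NONE;
}
```

Note that this does not address the ordering problem described above: in the observed force unplug sequence, a snapshot taken after BACKEND/online=0 but before the backend is wiped would still be classified as an eject request, so an in-flight REMOVE_DEVICE would additionally need some way to notice the later disappearance.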