Issues with the device eject path in XenVif
Hi all,

I'm currently trying to fix some lingering issues with VIF unplug, which will let me replace the MRSW lock with a simpler and faster implementation.

Pdo->Eject/PdoRequestEject (e.g. in XenVif) is signaled by the FrontendEject worker thread, which watches backend/vif/DOMID/X/online among a few other things. I've run into several issues with this code path:

- When removing the VIF using `xe vif-unplug force=true`, the entire xenstore key of the backend is removed without a chance to tear down the connection. However, the watch on BACKEND/online is triggered before the watch on device/vif. This causes the PDO to be marked as ejected, so it goes through QUERY_REMOVE_DEVICE/REMOVE_DEVICE instead of being handled as a surprise removal.

- In the REMOVE_DEVICE case, NDIS will wait for packets to be returned before continuing. Yet we cannot make progress because the backend has already disappeared, so the system will hang. This can be reproduced by doing an unplug with force=true while there is some outbound traffic, but the timing is quite tight with the current code.

- BACKEND/online is an internal, backend-specific value that is not documented in xenstore-paths or netif.h, so frontends should not use it. I also find converting a VIF unplug to a query remove based on reading BACKEND/online somewhat dubious.

I've considered several options for a fix, which I have documented below:

1. Make FrontendIsBackendOnline return a status code if BACKEND/online doesn't exist, and treat a failure to read the key as a surprise removal.
   - This ends up being unworkable, since QEMU will always first set BACKEND/online to 0 even if the VIF is being force-unplugged.

2. Make FrontendIsBackendOnline check the backend's existence (i.e. reading the backend key instead of backend/online).
   - This changes the unplug order slightly, but looks like the cleanest solution. However, I'm not sure whether it breaks cancelling of device removal requests.

3. Remove the eject codepath and rely on FdoScan instead.
   - This might break a few things that assume the presence of this codepath.

I'd be glad to hear your opinions on this matter.

Thanks,

-- 
Ngoc Tu Dinh | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
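To make the difference between the current check and option 2 above concrete, here is a minimal model in Python rather than driver code. It is only a sketch under stated assumptions: xenstore is represented as a plain dict, the path backend/vif/1/0 and both function names are hypothetical, and it assumes that a missing BACKEND/online key reads as "offline" in the same way as a value of 0. It does not attempt to say which PnP removal path the driver should then take.

```python
# Toy model only: xenstore is a dict mapping path -> value.
# BACKEND and both helper names are hypothetical, chosen for illustration.
BACKEND = "backend/vif/1/0"


def online_check(store, backend):
    """Online-based check (assumed behaviour): a missing key and "0"
    are indistinguishable, both read as offline."""
    return store.get(backend + "/online") == "1"


def existence_check(store, backend):
    """Existence-based check (option 2): true while the backend key
    or anything beneath it is still present."""
    prefix = backend + "/"
    return backend in store or any(key.startswith(prefix) for key in store)


# Three states the frontend may observe.
running = {BACKEND: "", BACKEND + "/online": "1"}
unplugging = {BACKEND: "", BACKEND + "/online": "0"}  # online cleared, key still there
removed = {}                                          # whole backend key deleted

for name, store in (("running", running),
                    ("unplugging", unplugging),
                    ("removed", removed)):
    print(name, online_check(store, BACKEND), existence_check(store, BACKEND))
```

In this model the online-based check gives the same answer for "unplugging" and "removed", while the existence-based check separates a backend that is still present from one whose key is gone.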