Re: Issues with the device eject path in XenVif


  • To: "Paul Durrant" <xadimgnik@xxxxxxxxx>, win-pv-devel@xxxxxxxxxxxxxxxxxxxx
  • From: "Tu Dinh" <ngoc-tu.dinh@xxxxxxxxxx>
  • Date: Mon, 13 Apr 2026 13:22:48 +0000
  • List-id: Developer list for the Windows PV Drivers subproject <win-pv-devel.lists.xenproject.org>

On 13/04/2026 14:41, Paul Durrant wrote:
> On 09/04/2026 16:29, Tu Dinh wrote:
>> Hi all,
>>
>> I'm currently trying to fix some lingering issues with VIF unplug, which
>> will let me replace the MRSW lock with a simpler and faster
>> implementation.
>>
>> Pdo->Eject/PdoRequestEject (e.g. in XenVif) is signaled by the
>> FrontendEject worker thread, which watches backend/vif/DOMID/X/online
>> among a few other things. I've run into several issues with this code
>> path:
>>
>> - When removing the VIF using `xe vif-unplug force=true`, the entire
>> xenstore key of the backend is removed without a chance to tear down the
>> connection. However, the watch on BACKEND/online will be triggered
>> before the watch on device/vif, which causes the PDO to be marked as
>> ejected, and so goes through the QUERY_REMOVE_DEVICE/REMOVE_DEVICE
>> instead of being a surprise removal.
>> - In the REMOVE_DEVICE case, NDIS will wait for packets to be returned
>> before continuing. Yet we cannot make progress because the backend has
>> already disappeared, so the system will hang. This can be reproduced by
>> doing an unplug with force=true while having some outbound traffic, but
>> the timing is quite tight with the current code.
>> - BACKEND/online is an internal, backend-specific value that is not
>> documented in xenstore-paths or netif.h. So frontends should not use
>> this value. I also find converting a VIF unplug to a query remove based
>> on reading BACKEND/online somewhat dubious.
>>
>> I've considered several options for a fix, which I have documented below:
>>
>> 1. Make FrontendIsBackendOnline return a status code if BACKEND/online
>> doesn't exist, and treat an error to read the key as a surprise removal.
>>     - This ends up being unworkable, since QEMU will always first set
>> BACKEND/online to 0 even if the VIF is being force-unplugged.
>
> I still think this is the right way to deal with force unplug. Is there
> a tell-tale you can look for to see if it is forced? (E.g. has the
> frontend xenstore area completely gone?)
>

What I observe during a forced VIF unplug is an unplug request
(BACKEND/online=0, leading to PdoRequestEject) shortly followed by the
backend being wiped out. I couldn't find any tell-tale that
distinguishes the forced unplug from the normal one.

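Maybe it can be fixed by attaching the watchdog thread's event to a
watch on the backend, then (for transmitters) faking responses in the
watchdog thread if we detect that the backend has disappeared. That way
the outstanding packets would be returned to NDIS and REMOVE_DEVICE
could complete instead of hanging.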
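
As a rough illustration of the watchdog idea, here is a standalone
sketch. It is not XenVif code: every name in it (tx_ring_t, ring_post,
watchdog_poll, the TX_* values) is made up for this example, and the
backend watch is reduced to a boolean passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical per-slot state; not the real netif response codes. */
typedef enum {
    TX_FREE,         /* slot not in use */
    TX_PENDING,      /* request posted, no response from the backend yet */
    TX_FAKED_ERROR   /* response synthesized locally by the watchdog */
} tx_status_t;

typedef struct {
    tx_status_t status[RING_SIZE];
    size_t      outstanding;   /* requests posted but not yet answered */
} tx_ring_t;

static void ring_init(tx_ring_t *ring)
{
    for (size_t slot = 0; slot < RING_SIZE; slot++)
        ring->status[slot] = TX_FREE;
    ring->outstanding = 0;
}

/* Post a transmit request in the given slot; fails if the slot is busy. */
static bool ring_post(tx_ring_t *ring, size_t slot)
{
    if (slot >= RING_SIZE || ring->status[slot] == TX_PENDING)
        return false;

    ring->status[slot] = TX_PENDING;
    ring->outstanding++;
    return true;
}

/*
 * One watchdog pass. While the backend still exists, nothing is touched.
 * Once it has disappeared, every pending request is completed with a
 * faked error so the packets can be handed back to the upper layer.
 * Returns the number of responses that were faked.
 */
static size_t watchdog_poll(tx_ring_t *ring, bool backend_exists)
{
    size_t faked = 0;

    if (backend_exists)
        return 0;

    for (size_t slot = 0; slot < RING_SIZE; slot++) {
        if (ring->status[slot] == TX_PENDING) {
            ring->status[slot] = TX_FAKED_ERROR;
            ring->outstanding--;
            faked++;
        }
    }

    /* Nothing may remain outstanding once the backend is gone. */
    assert(ring->outstanding == 0);
    return faked;
}
```

In the real driver the backend_exists input would have to come from the
xenstore watch firing on the backend key, and the faked completions
would need to go through the same path as genuine responses so that
NDIS sees its packets returned.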

>>
>> 2. Make FrontendIsBackendOnline check the backend's existence (i.e.
>> reading the backend key instead of backend/online).
>>     - This changes the unplug order slightly, but looks like the cleanest
>> solution. Though I'm not sure if it breaks cancelling of device removal
>> requests.
>>
>> 3. Remove the eject codepath and rely on FdoScan instead.
>>     - This might break a few things that assume the presence of this
>> codepath.
>>
>> I'd be glad to hear your opinions on this matter.
>>
>> Thanks,
>>
>>
>> --
>> Ngoc Tu Dinh | Vates XCP-ng Developer
>>
>> XCP-ng & Xen Orchestra - Vates solutions
>>
>> web: https://vates.tech
>>
>>
>
>



--
Ngoc Tu Dinh | Vates XCP-ng Developer

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech





 

