[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: xen-netback hotplug-status regression bug



On 10/04/2021 19:25, Michael Brown wrote:
Commit https://github.com/torvalds/linux/commit/1f25657 ("xen-netback: remove 'hotplug-status' once it has served its purpose") seems to have introduced a regression that prevents a vif frontend from transitioning more than once into Connected state.

As far as I can tell:

- The defined vif script (e.g. /etc/xen/scripts/vif-bridge) executes only once, at domU startup, and sets backend/vif/<domU>/0/hotplug-status="connected"

- When the frontend first enters Connected state, drivers/net/xen-netback/xenbus.c's connect() sets up a watch on "hotplug-status" with the callback function hotplug_status_changed()

- When hotplug_status_changed() is triggered by the watch, it transitions the backend to Connected state and calls xenbus_rm() to delete the "hotplug-status" attribute.

If the frontend subsequently disconnects and reconnects (e.g. transitions through Closed->Initialising->Connected) then:

- Nothing recreates "hotplug-status"

- When the frontend re-enters Connected state, connect() sets up a watch on "hotplug-status" again

- The callback hotplug_status_changed() is never triggered, and so the backend device never transitions to Connected state.


That's not how I read it. Given that "hotplug-status" is removed by the call to hotplug_status_changed() then the next call to connect() should fail to register the watch and 'have_hotplug_status_watch' should be 0. Thus backend_switch_state() should not defer the transition to XenbusStateConnected in any subsequent interaction with the frontend.



Reverting the commit would fix this bug, but would obviously also reintroduce the race condition that the commit was designed to avoid.

I'm happy to put together a patch, if one of the maintainers could suggest a sensible design approach.


Are you seeing the watch successfully re-registered even though the node does not exist? Perhaps there has been a change in xenstore behaviour?

  Paul

I'm not a list member, so please CC me directly on replies.

Thanks,

Michael




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.