[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH] xen-netback: Check for hotplug-status existence before watching

> -----Original Message-----
> From: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> Sent: 10 May 2021 20:43
> To: Michael Brown <mbrown@xxxxxxxxxxxxxxxx>; paul@xxxxxxx
> Cc: paul@xxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; 
> wei.liu@xxxxxxxxxx; Durrant,
> Paul <pdurrant@xxxxxxxxxxxx>
> Subject: RE: [EXTERNAL] [PATCH] xen-netback: Check for hotplug-status 
> existence before watching
> On Mon, May 10, 2021 at 08:06:55PM +0100, Michael Brown wrote:
> > If you have a suggested patch, I'm happy to test that it doesn't reintroduce
> > the regression bug that was fixed by this commit.
> Actually, I've just tested with a simple reloading xen-netfront module. It
> seems in this case, the hotplug script is not re-executed. In fact, I
> think it should not be re-executed at all, since the vif interface
> remains in place (it just gets NO-CARRIER flag).
> This brings a question, why removing hotplug-status in the first place?
> The interface remains correctly configured by the hotplug script after
> all. From the commit message:
>     xen-netback: remove 'hotplug-status' once it has served its purpose
>     Removing the 'hotplug-status' node in netback_remove() is wrong; the 
> script
>     may not have completed. Only remove the node once the watch has fired and
>     has been unregistered.
> I think the intention was to remove 'hotplug-status' node _later_ in
> case of quickly adding and removing the interface. Is that right, Paul?

The removal was done to allow unbind/bind to function correctly. IIRC before 
the original patch doing a bind would stall forever waiting for the hotplug 
status to change, which would never happen.

> In that case, letting hotplug_status_changed() remove the entry wont
> work, because the watch was unregistered few lines earlier in
> netback_remove(). And keeping the watch is not an option, because the
> whole backend_info struct is going to be free-ed already.
> If my guess about the original reason for the change is right, I think
> it should be fixed at the hotplug script level - it should check if the
> device is still there before writing 'hotplug-status' node.
> I'm not sure if doing it race-free is possible from a shell script (I think it
> requires doing xenstore read _and_ write in a single transaction). But
> in the worst case, the aftermath of loosing the race is leaving stray
> 'hotplug-status' xenstore node - not ideal, but also less harmful than
> failing to bring up an interface. At this point, the toolstack could cleanup
> it later, perhaps while setting up that interface again (if it gets
> re-connected)?
> Anyway, perhaps the best thing to do now, is to revert both commits, and
> think of an alternative solution for the original issue? That of course
> assumes I guessed correctly why it was done in the first place...

Simply reverting everything would likely break the ability to do unbind and 
bind (which is useful e.g to allow update the netback module whilst guests are 
still running) so I don't think that's an option.




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.