[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH] xen-netback: Check for hotplug-status existence before watching



> -----Original Message-----
> From: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> Sent: 10 May 2021 20:43
> To: Michael Brown <mbrown@xxxxxxxxxxxxxxxx>; paul@xxxxxxx
> Cc: paul@xxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; 
> wei.liu@xxxxxxxxxx; Durrant,
> Paul <pdurrant@xxxxxxxxxxxx>
> Subject: RE: [EXTERNAL] [PATCH] xen-netback: Check for hotplug-status 
> existence before watching
> 
> On Mon, May 10, 2021 at 08:06:55PM +0100, Michael Brown wrote:
> > If you have a suggested patch, I'm happy to test that it doesn't reintroduce
> > the regression bug that was fixed by this commit.
> 
> Actually, I've just tested with a simple reloading xen-netfront module. It
> seems in this case, the hotplug script is not re-executed. In fact, I
> think it should not be re-executed at all, since the vif interface
> remains in place (it just gets NO-CARRIER flag).
> 
> This brings a question, why removing hotplug-status in the first place?
> The interface remains correctly configured by the hotplug script after
> all. From the commit message:
> 
>     xen-netback: remove 'hotplug-status' once it has served its purpose
> 
>     Removing the 'hotplug-status' node in netback_remove() is wrong; the 
> script
>     may not have completed. Only remove the node once the watch has fired and
>     has been unregistered.
> 
> I think the intention was to remove 'hotplug-status' node _later_ in
> case of quickly adding and removing the interface. Is that right, Paul?

The removal was done to allow unbind/bind to function correctly. IIRC before 
the original patch doing a bind would stall forever waiting for the hotplug 
status to change, which would never happen.

> In that case, letting hotplug_status_changed() remove the entry wont
> work, because the watch was unregistered few lines earlier in
> netback_remove(). And keeping the watch is not an option, because the
> whole backend_info struct is going to be free-ed already.
> 
> If my guess about the original reason for the change is right, I think
> it should be fixed at the hotplug script level - it should check if the
> device is still there before writing 'hotplug-status' node.
> I'm not sure if doing it race-free is possible from a shell script (I think it
> requires doing xenstore read _and_ write in a single transaction). But
> in the worst case, the aftermath of loosing the race is leaving stray
> 'hotplug-status' xenstore node - not ideal, but also less harmful than
> failing to bring up an interface. At this point, the toolstack could cleanup
> it later, perhaps while setting up that interface again (if it gets
> re-connected)?
> 
> Anyway, perhaps the best thing to do now, is to revert both commits, and
> think of an alternative solution for the original issue? That of course
> assumes I guessed correctly why it was done in the first place...
> 

Simply reverting everything would likely break the ability to do unbind and 
bind (which is useful e.g to allow update the netback module whilst guests are 
still running) so I don't think that's an option.

  Paul

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.