
Re: xen-netback hotplug-status regression bug



On 13/04/2021 11:48, Michael Brown wrote:
On 13/04/2021 08:12, Paul Durrant wrote:
If the frontend subsequently disconnects and reconnects (e.g. transitions through Closed->Initialising->Connected) then:

- Nothing recreates "hotplug-status"

- When the frontend re-enters Connected state, connect() sets up a watch on "hotplug-status" again

- The callback hotplug_status_changed() is never triggered, and so the backend device never transitions to Connected state.

That's not how I read it. Given that "hotplug-status" is removed by the call to hotplug_status_changed() then the next call to connect() should fail to register the watch and 'have_hotplug_status_watch' should be 0. Thus backend_switch_state() should not defer the transition to XenbusStateConnected in any subsequent interaction with the frontend.

Thank you for the reply.  I've tested and confirmed my initial hypothesis: the call to xenbus_watch_pathfmt() succeeds even if the node does not exist.

I confirmed this with ftrace using:

  cd /sys/kernel/debug/tracing
  echo function_graph > current_tracer
  echo set_backend_state > set_ftrace_filter
  echo xenbus_watch_pathfmt >> set_ftrace_filter
  echo register_xenbus_watch >> set_ftrace_filter
  echo xenbus_dev_fatal >> set_ftrace_filter

On the second time that the frontend transitions to Connected, this produced the trace:

  set_backend_state [xen_netback]() {
    register_xenbus_watch();
    register_xenbus_watch();
    xenbus_watch_pathfmt() {
      register_xenbus_watch();
    }
  }

which seems to confirm that the error path in xenbus_watch_path() is *not* taken, i.e. that the call to register_xenbus_watch() succeeded even though the node did not exist.


Other observations also seem to confirm this behaviour:

- Running "xenstore ls" in dom0 confirms that on the second frontend transition to Connected, the frontend state is indeed Connected (4) but the backend state remains in InitWait (2)

- Running "xenstore watch /local/domain/0/backend/vif/<domU>/0/hotplug-status" *before* starting the domU confirms that it is possible to create a watch on a node that does not (yet) exist, and that the watch *is* notified when the node is later created.

Are you seeing the watch successfully re-registered even though the node does not exist? Perhaps there has been a change in xenstore behaviour?

So, the TL;DR is that yes, the watch does successfully register even though the node does not exist.

From a quick look through the xenstored source, it looks as though the only check on the node name is the call to is_valid_nodename(), which seems to perform a syntactic validity check only.  I can't immediately find any commit that would have changed this behaviour.


Ok, so it sounds like this was probably my misunderstanding of xenstore semantics in the first place (although I'm sure I remember watch registration failing for non-existent nodes at some point in the past... that may have been with a non-upstream version of oxenstored though).

Anyway... a reasonable fix would therefore be to read the node first and only register the watch if it does exist.
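
As a rough illustration of that idea, here is a minimal user-space sketch in C. It is not the xen-netback code: every toy_* name is hypothetical, and the "store" is a single boolean standing in for the presence of the "hotplug-status" node. It assumes only what was observed above, namely that watch registration succeeds whether or not the node exists. In the kernel, the existence test would presumably be a xenstore read or a helper such as xenbus_exists(); that choice is an assumption here, not something confirmed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for xenstore: tracks only whether "hotplug-status" exists. */
struct toy_store {
    bool hotplug_status_exists;
};

/* Toy backend state; the field name mirrors the flag discussed above. */
struct toy_backend {
    bool have_hotplug_status_watch;
};

/* Models the observed behaviour: registering a watch succeeds (returns 0)
 * regardless of whether the watched node currently exists. */
int toy_register_watch(const struct toy_store *store)
{
    assert(store != NULL);
    return 0;
}

/* Hypothetical existence check, standing in for a xenstore read. */
bool toy_node_exists(const struct toy_store *store)
{
    assert(store != NULL);
    return store->hotplug_status_exists;
}

/* Current logic: the flag is set whenever registration succeeds, so it is
 * set even when the node is gone and the callback can never fire. */
void toy_connect_unconditional(struct toy_backend *be,
                               const struct toy_store *store)
{
    be->have_hotplug_status_watch = (toy_register_watch(store) == 0);
}

/* Suggested logic: look for the node first and only register the watch
 * (and set the flag) if the node is present. */
void toy_connect_checked(struct toy_backend *be,
                         const struct toy_store *store)
{
    be->have_hotplug_status_watch =
        toy_node_exists(store) && toy_register_watch(store) == 0;
}

/* The backend defers the switch to Connected while the flag is set. */
bool toy_defers_connected(const struct toy_backend *be)
{
    return be->have_hotplug_status_watch;
}
```

With the node absent, toy_connect_unconditional() leaves the backend deferring forever, which matches the stuck InitWait state described above, while toy_connect_checked() lets the transition to Connected go ahead.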

  Paul

Thanks,

Michael




 

