
Re: xen-netback hotplug-status regression bug



On 13/04/2021 08:12, Paul Durrant wrote:
>> If the frontend subsequently disconnects and reconnects (e.g. transitions through Closed->Initialising->Connected) then:
>>
>> - Nothing recreates "hotplug-status"
>>
>> - When the frontend re-enters Connected state, connect() sets up a watch on "hotplug-status" again
>>
>> - The callback hotplug_status_changed() is never triggered, and so the backend device never transitions to Connected state.
>
> That's not how I read it. Given that "hotplug-status" is removed by the call to hotplug_status_changed() then the next call to connect() should fail to register the watch and 'have_hotplug_status_watch' should be 0. Thus backend_switch_state() should not defer the transition to XenbusStateConnected in any subsequent interaction with the frontend.

Thank you for the reply. I've tested and confirmed my initial hypothesis: the call to xenbus_watch_pathfmt() succeeds even if the node does not exist.

I confirmed this with ftrace using:

  cd /sys/kernel/debug/tracing
  echo function_graph > current_tracer
  echo set_backend_state > set_ftrace_filter
  echo xenbus_watch_pathfmt >> set_ftrace_filter
  echo register_xenbus_watch >> set_ftrace_filter
  echo xenbus_dev_fatal >> set_ftrace_filter

The second time the frontend transitioned to Connected, this produced the following trace:

  set_backend_state [xen_netback]() {
    register_xenbus_watch();
    register_xenbus_watch();
    xenbus_watch_pathfmt() {
      register_xenbus_watch();
    }
  }

which seems to confirm that the error path in xenbus_watch_path() is *not* taken: xenbus_dev_fatal() was in the filter but does not appear in the trace. In other words, the call to register_xenbus_watch() succeeded even though the node did not exist.
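
To make the consequence concrete, here is a minimal Python sketch of the deferral logic as described in this thread. It is a toy model, not the xen-netback code: the class ToyBackend, its method names, and the simplification that a pending watch fires as soon as the node holds "connected" are all assumptions made for illustration. The one behaviour taken from the observations above is that registering the watch succeeds whether or not "hotplug-status" exists.

```python
class ToyBackend:
    """Hypothetical, simplified model of the backend state deferral."""

    def __init__(self):
        self.hotplug_status = None              # None: node does not exist
        self.have_hotplug_status_watch = False
        self.requested_state = "InitWait"       # state the backend wants
        self.published_state = "InitWait"       # state visible to the frontend

    def run_hotplug_script(self):
        # The hotplug script creates the node.
        self.hotplug_status = "connected"
        self._maybe_fire_watch()

    def connect(self):
        # Assumption from the trace: registration succeeds even if the
        # node is absent, so the flag is always set here.
        self.have_hotplug_status_watch = True
        self._maybe_fire_watch()
        self.backend_switch_state("Connected")

    def disconnect(self):
        self.backend_switch_state("InitWait")

    def backend_switch_state(self, state):
        self.requested_state = state
        # Defer the visible state change while the watch is pending.
        if not self.have_hotplug_status_watch:
            self.published_state = state

    def _maybe_fire_watch(self):
        if self.have_hotplug_status_watch and self.hotplug_status == "connected":
            self.hotplug_status_changed()

    def hotplug_status_changed(self):
        # Complete any pending state change, drop the watch, remove the node.
        self.published_state = self.requested_state
        self.have_hotplug_status_watch = False
        self.hotplug_status = None


be = ToyBackend()
be.run_hotplug_script()
be.connect()
first = be.published_state     # first connection completes
be.disconnect()
be.connect()
second = be.published_state    # nothing recreated the node, so this is deferred
print(first, second)
```

In this model the first connection reaches Connected, while the second stays in InitWait until something recreates the node, which matches the symptom reported here.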


Other observations also seem to confirm this behaviour:

- Running "xenstore ls" in dom0 confirms that on the second frontend transition to Connected, the frontend state is indeed Connected (4) but the backend state remains InitWait (2).

- Running "xenstore watch /local/domain/0/backend/vif/<domU>/0/hotplug-status" *before* starting the domU confirms that it is possible to create a watch on a node that does not (yet) exist, and that the watch *is* notified when the node is later created.

> Are you seeing the watch successfully re-registered even though the node does not exist? Perhaps there has been a change in xenstore behaviour?

So, the TL;DR is that yes, the watch does successfully register even though the node does not exist.

From a quick look through the xenstored source, it looks as though the only check on the node name is the call to is_valid_nodename(), which seems to perform a syntactic validity check only. I can't immediately find any commit that would have changed this behaviour.
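
The watch semantics described above can be sketched with a small in-memory model in Python. This is not the xenstored implementation or its API: ToyStore, its methods, and the path check are hypothetical stand-ins, and the domain ID 1 in the path is an arbitrary example value. The sketch only illustrates the two observed properties: registering a watch validates the path syntactically without checking that the node exists, and the watch is notified when the node is later created.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a xenstore-like tree."""

    def __init__(self):
        self.nodes = {}      # path -> value
        self.watches = {}    # path -> list of callbacks

    def watch(self, path, callback):
        # Stand-in syntactic check only; whether the node exists is
        # deliberately NOT checked.
        if not path.startswith("/") or "//" in path:
            raise ValueError("invalid path: %r" % path)
        self.watches.setdefault(path, []).append(callback)

    def write(self, path, value):
        # Creating or updating a node notifies every watch on that path.
        self.nodes[path] = value
        for callback in list(self.watches.get(path, [])):
            callback(path)


store = ToyStore()
events = []
# Example path; the domain ID 1 is an assumption for illustration.
path = "/local/domain/0/backend/vif/1/0/hotplug-status"

store.watch(path, events.append)   # succeeds although the node is absent
registered_before_create = path not in store.nodes
store.write(path, "connected")     # later creation triggers the watch
print(registered_before_create, events)
```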

Thanks,

Michael



 

