
Re: [Xen-users] XEN: bug in vif-bridge script



Ian,
thanks for your (rightly predicted and expected) reply. Comments further below inline.

Atom2

On 05.03.14 03:00, Ian Campbell wrote:
(Roger, I've trimmed the quotes fairly aggressively, see
https://bugs.gentoo.org/show_bug.cgi?id=502570 or
http://lists.xen.org/archives/html/xen-users/2014-03/msg00013.html for
the full thing, but in brief the vif is gone by the time the hotplug
script runs and this results in errors from e.g. brctl delif, which are
correctly ignored but are also logged. I presume Atom2 is running a log
scanning tool or something and would like to avoid spurious log
messages, which seems fair)
You are right, I am running a log scanner and the messages were rather disturbing ...

On Tue, 2014-03-04 at 17:06 +0100, Atom2 wrote:
If this rather needs to go to the xen-devel ML, I am sure Ian Campbell
(or somebody else) will shortly be around and move it or ask me to
resend to the other list.
[...]
Feb 26 22:14:29 vm-host logger: /etc/xen/scripts/vif-bridge: brctl delif xenbr0 vif1.0 failed
Feb 26 22:14:29 vm-host logger: /etc/xen/scripts/vif-bridge: ifconfig vif1.0 down failed
[...]
Upon investigating it seems that the problem is related to the fact that
the network device (at least for paravirtualized guests using the
netfront/netback device model) has already been destroyed by the dom0
kernel when the script is being run.

This sounds very plausible to me.

Are you using the xm or xl toolstack? The way the new xl toolstack
handles hotplug scripts ought to be a lot less prone to this sort of
race (but I don't know if it avoids this particular one). Roger, do you
have any thoughts?
I am using the xl toolstack.

Suggested fix:
for brctl: check whether the interface still exists and is also still
linked to the bridge prior to invoking the brctl command
for ifconfig: check whether the interface still exists and is also still
up prior to invoking the ifconfig command as follows:
-------------------------------------------
case "$command" in
      online)
          setup_virtual_bridge_port "$dev"
          mtu="`ip link show $bridge | awk '/mtu/ { print $5 }'`"
          if [ -n "$mtu" ] && [ "$mtu" -gt 0 ]
          then
                  ip link set $dev mtu $mtu || :
          fi
          add_to_bridge "$bridge" "$dev"
          ;;

      offline)
          if brctl show "$bridge" | grep "$dev" > /dev/null 2>&1 ; then
              do_without_error brctl delif "$bridge" "$dev"
          fi
          if ifconfig -s "$dev" > /dev/null 2>&1 ; then
              do_without_error ifconfig "$dev" down
          fi
          ;;

      add)
          setup_virtual_bridge_port "$dev"
          add_to_bridge "$bridge" "$dev"
          ;;
esac

If this issue does affect xl then I would like to see this fixed
upstream, preferably by fixing xl to not race hotplug scripts against
device tear down. If that is impossible (I don't think it should be, but
Roger?) then the script change which you propose seems like a very
reasonable fallback option.

If it is xend only (IOW xl sequences things correctly) then I'm not sure
we want to make the scripts more complex for the xend case only.

-------------------------------------------
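
One detail of the offline branch in the proposed script: grep "$dev" does a substring and regex match, so a check for vif1.0 would also succeed if only vif1.01 or vif10.0 were attached to the bridge. A minimal sketch of an exact-match test follows. The helper name port_listed is hypothetical and not part of the stock vif-bridge script, and the sketch assumes that "brctl show" prints the interface name as the last whitespace-separated field of each line after the header.

```shell
#!/bin/sh
# Hypothetical helper, not part of the stock vif-bridge script.
# Reads "brctl show"-style output on stdin and succeeds only if some
# line after the header has exactly $1 as its last field.
port_listed() {
    awk -v dev="$1" 'NR > 1 && $NF == dev { found = 1 } END { exit !found }'
}

# Possible use inside the offline branch (assumption, untested on a dom0):
#   if brctl show "$bridge" | port_listed "$dev"; then
#       do_without_error brctl delif "$bridge" "$dev"
#   fi
```

Comparing the whole field avoids both the substring problem and the "." in the interface name being treated as a regex wildcard.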


In terms of functionality my suggested fix does not change anything as
in case the interface is still linked to the bridge (is still up) -
which might be the case for PCI-passed through devices from dom0 to a

PCI-passthrough devices effectively don't exist in dom0 (they cannot be
in both dom0 and domU) -- so they can't be on a bridge in dom0.
Thanks for clarifying this; it does indeed make a lot of sense now that you say it. I was misled by my assumption that there must be a case where the vif device still exists and is also up, as otherwise the "brctl delif" and "ifconfig down" parts of the script would not make any sense at all, unless it really is only a race condition. But in my case the script never wins: the error message came up consistently.

domU - the removal from the bridge (bringing the interface down) is
performed exactly as before. It does, however, do away with the nasty
error message in the syslog.
====== End of Bug report and suggested fix =======
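
The existence checks described in the bug report could also be done without parsing tool output, by looking at sysfs. The following is only a sketch under assumptions: the function names and the SYSFS_NET variable are made up for illustration, and it relies on Linux exposing each network device as /sys/class/net/<dev> and each bridge port as /sys/class/net/<bridge>/brif/<port>.

```shell
#!/bin/sh
# SYSFS_NET is an illustrative override so the helpers can be tried
# against a scratch directory; on a real dom0 it stays /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Succeeds if the kernel still has a network device called $1.
iface_exists() {
    [ -e "$SYSFS_NET/$1" ]
}

# Succeeds if device $2 is currently a port of bridge $1.
iface_on_bridge() {
    [ -e "$SYSFS_NET/$1/brif/$2" ]
}

# Hypothetical stand-in for the body of the "offline" branch: each
# command is only attempted while the device is still present, and a
# failure is ignored without being logged.
safe_offline() {
    bridge="$1"
    dev="$2"
    if iface_on_bridge "$bridge" "$dev"; then
        brctl delif "$bridge" "$dev" || :
    fi
    if iface_exists "$dev"; then
        ip link set "$dev" down || :
    fi
}
```

This narrows the window but does not close it: the device can still disappear between the check and the command, which is why the toolstack-side fix discussed above would remain preferable.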


Thanks and regards,

Atom2

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


