
[Xen-users] XEN: bug in vif-bridge script



Hello group,
I am using Gentoo (kernel 3.11.7) together with Xen 4.3.1. I have found a bug in the vif-bridge script, which I reported to the Gentoo Bugzilla. Ian Delaney, the maintainer of the Gentoo Xen packages (on copy here), suggested bringing this to the attention of the Xen ML, as the fix should benefit other distributions as well.

The bug report (together with a suggested fix further below) is also available at https://bugs.gentoo.org/show_bug.cgi?id=502570, but I have included the relevant parts here for convenience and so that you can comment if and when required.

If this should rather go to the xen-devel ML, I am sure Ian Campbell (or somebody else) will shortly be around to move it or ask me to resend it to the other list.


====== Start of Bug report and suggested fix =======
Upon shutting down a domU under XEN the script "/etc/xen/scripts/vif-bridge" is invoked with an "offline" argument. This is for the recommended setup of connecting domUs to the dom0 through a bridged device named xenbr0. The relevant snippet of code reads as follows:
-------------------------------------------
case "$command" in
    online)
        setup_virtual_bridge_port "$dev"
        mtu="`ip link show $bridge | awk '/mtu/ { print $5 }'`"
        if [ -n "$mtu" ] && [ "$mtu" -gt 0 ]
        then
                ip link set $dev mtu $mtu || :
        fi
        add_to_bridge "$bridge" "$dev"
        ;;

    offline)
        do_without_error brctl delif "$bridge" "$dev"
        do_without_error ifconfig "$dev" down
        ;;

    add)
        setup_virtual_bridge_port "$dev"
        add_to_bridge "$bridge" "$dev"
        ;;
esac
-------------------------------------------


The function "do_without_error", called from the "offline)" pattern in the "case" statement, is defined in /etc/xen/scripts/xen-hotplug-common.sh, which is indirectly sourced through /etc/xen/scripts/vif-common.sh, and reads as follows:
-------------------------------------------
do_without_error() {
  "$@" 2>/dev/null || log debug "$@ failed"
}
-------------------------------------------
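
To illustrate the behaviour of the definition quoted above, here is a minimal sketch that can be run outside of Xen. The "log" function below is only a stand-in that prints to stdout (an assumption for illustration; the real helper in the Xen scripts forwards to syslog), and the failing "ls" call is just an arbitrary command that exits non-zero:

```shell
#!/bin/sh
# Stand-in for the real log helper (assumption: the real one writes to syslog).
log() {
  level="$1"; shift
  echo "[$level] $*"
}

# Same definition as quoted from xen-hotplug-common.sh.
do_without_error() {
  "$@" 2>/dev/null || log debug "$@ failed"
}

# A command that succeeds produces no log entry.
do_without_error true

# A command that fails has its stderr discarded, but the failure is
# still reported through log.
do_without_error ls /nonexistent-path-for-demo
```

So the wrapper hides the reason for the failure, but not the failure itself, which is why the "failed" lines still show up in syslog.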


The call 'do_without_error brctl delif "$bridge" "$dev"' executes
    brctl delif "$bridge" "$dev"
and the call 'do_without_error ifconfig "$dev" down' executes
    ifconfig "$dev" down
Both discard any error output, but in case of an error (i.e. a non-zero exit code) a "failed" message is still logged to syslog as follows:
-------------------------------------------
Feb 26 22:14:29 vm-host logger: /etc/xen/scripts/vif-bridge: brctl delif xenbr0 vif1.0 failed
Feb 26 22:14:29 vm-host logger: /etc/xen/scripts/vif-bridge: ifconfig vif1.0 down failed
-------------------------------------------


Upon investigating it seems that the problem is related to the fact that the network device (at least for paravirtualized guests using the netfront/netback device model) has already been destroyed by the dom0 kernel when the script is being run. This is evidenced by the following entries in syslog preceding the above quoted error messages:
-------------------------------------------
Feb 26 22:14:29 vm-host kernel: [ 6169.989895] xenbr0: port 1(vif1.0) entered disabled state
Feb 26 22:14:29 vm-host kernel: [ 6170.007496] xenbr0: port 1(vif1.0) entered disabled state
Feb 26 22:14:29 vm-host kernel: [ 6170.007568] device vif1.0 left promiscuous mode
Feb 26 22:14:29 vm-host kernel: [ 6170.007571] xenbr0: port 1(vif1.0) entered disabled state
-------------------------------------------


These findings are further underpinned by the relevant error messages provided by the function "do_without_error" (captured by redirecting stderr to a file rather than to /dev/null) which are as follows:
-------------------------------------------
for brctl: "interface vif1.0 does not exist!"
for ifconfig: "vif1.0: ERROR while getting interface flags: No such device"
-------------------------------------------
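
For anybody who wants to reproduce this, the capture can be sketched as a temporary debugging variant of the wrapper. The name "do_capture_error", the ERRFILE variable and its default path are hypothetical and not part of the Xen scripts, and "log" is again only a stand-in that prints to stdout:

```shell
#!/bin/sh
# Hypothetical location for the captured error output (assumption).
ERRFILE="${ERRFILE:-/tmp/vif-bridge.err}"

# Stand-in for the real log helper (assumption: the real one writes to syslog).
log() {
  level="$1"; shift
  echo "[$level] $*"
}

# Debugging variant of do_without_error: append stderr to $ERRFILE
# instead of sending it to /dev/null, so the reason for a failure is kept.
do_capture_error() {
  "$@" 2>>"$ERRFILE" || log debug "$@ failed"
}
```

Temporarily calling this variant from the "offline)" branch and then inspecting the file after a domU shutdown should show messages like the two quoted above.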



Suggested fix:
for brctl: check whether the interface still exists and is also still linked to the bridge prior to invoking the brctl command
for ifconfig: check whether the interface still exists and is also still up prior to invoking the ifconfig command
The modified script reads as follows:
-------------------------------------------
case "$command" in
    online)
        setup_virtual_bridge_port "$dev"
        mtu="`ip link show $bridge | awk '/mtu/ { print $5 }'`"
        if [ -n "$mtu" ] && [ "$mtu" -gt 0 ]
        then
                ip link set $dev mtu $mtu || :
        fi
        add_to_bridge "$bridge" "$dev"
        ;;

    offline)
        if brctl show "$bridge" | grep "$dev" > /dev/null 2>&1 ; then
            do_without_error brctl delif "$bridge" "$dev"
        fi
        if ifconfig -s "$dev" > /dev/null 2>&1 ; then
            do_without_error ifconfig "$dev" down
        fi
        ;;

    add)
        setup_virtual_bridge_port "$dev"
        add_to_bridge "$bridge" "$dev"
        ;;
esac
-------------------------------------------


In terms of functionality, my suggested fix does not change anything: if the interface is still linked to the bridge (or is still up), which might be the case for devices passed through via PCI from dom0 to a domU, the removal from the bridge (or bringing the interface down) is performed exactly as before. It does, however, do away with the nasty error messages in syslog.
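
As a possible variation on the same idea (a sketch only, not tested against a running Xen host): the existence and bridge-membership checks could also be done through sysfs instead of parsing the output of brctl and ifconfig. This avoids one weakness of grep "$dev", which matches substrings and so would also match any longer interface name that contains "$dev". The helper names and the SYSFS_NET override are hypothetical; the layout /sys/class/net/<bridge>/brif/<port> is what the Linux bridge code exposes for each bridge port:

```shell
#!/bin/sh
# Hypothetical override so the helpers can be tried against a scratch
# directory; on a real dom0 this stays at /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# True if the kernel still knows an interface with exactly this name.
iface_exists() {
  [ -e "$SYSFS_NET/$1" ]
}

# True if interface $2 is currently a port of bridge $1.
iface_on_bridge() {
  [ -e "$SYSFS_NET/$1/brif/$2" ]
}

# In the offline) branch one could then write, for example:
#   if iface_on_bridge "$bridge" "$dev"; then
#       do_without_error brctl delif "$bridge" "$dev"
#   fi
#   if iface_exists "$dev"; then
#       do_without_error ifconfig "$dev" down
#   fi
```

Like the ifconfig -s test in the fix above, iface_exists only checks that the interface exists, not whether it is up.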
====== End of Bug report and suggested fix =======


Thanks and regards,

Atom2
