Xen project Mailing List

[Xen-users] Possible locking issue with network-bridge

From: Andrew Davidoff <davidoff@xxxxxxxxx>

Date: Sat, 12 Oct 2013 03:30:09 -0400

Delivery-date: Sat, 12 Oct 2013 07:31:36 +0000

List-id: Xen user discussion <xen-users.lists.xen.org>

Hi,

I'm running into a problem that I think has uncovered an issue with how network-bridge does its locking. My problem may have a root cause that is lower-level than the locking, but it seems to me that I have uncovered a locking issue either way.

I just installed Xen 4.2.3-23.el6 on a Scientific Linux 6.4 server. Xen was installed from the CentOS xen4 repo installed by centos-release-xen. The server has two ethernet ports configured in an LACP bond, and I have Xen configured to use network bridging.

During boot, when xend was setting up bridging, the network link was going down and coming back up as bond0 was renamed pbond0, etc., but then it was dropping for good and the xend bridging setup was ending with this error:

    RTNETLINK answers: File exists

I narrowed this down to network-bridge running multiple times, with the instances stomping on each other. It's possible that running multiple times is the root cause of my issues here, but even if it should only be getting called once, there seems to be an issue with the call to claim_lock. claim_lock happens after the checks that would make network-bridge exit early:

    if [ "${bridge}" = "null" ] ; then
        return
    fi

    if [ `brctl show | wc -l` != 1 ]; then
        return
    fi

    if link_exists "$pdev"; then
        # The device is already up.
        return
    fi

This seems problematic. Suppose an instance of network-bridge starts while another is running and has already claimed the lock, but before the lock holder has created any bridges or brought up $pdev. The late-comer passes all of the checks above, waits for the first script to complete, gets the lock for itself, and then proceeds to break networking.

I moved the call to claim_lock to the beginning of op_start and dropped in calls to release_lock before the possible early-exit returns, and this seems to have solved the problem. Does this seem like the right thing to do?
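
For reference, the reordering described above can be sketched roughly as follows. This is not the actual patch: claim_lock, release_lock and link_exists are replaced here with simple stand-ins so the snippet runs on its own, and LOCKDIR, the bridge name and the pdev value are made up for illustration. The real helpers sourced by network-bridge may take different arguments.

```shell
#!/bin/sh
# Sketch only. The three helpers below are stand-ins (assumptions) for the
# real ones that network-bridge sources; they are not the Xen implementations.

LOCKDIR="${LOCKDIR:-/tmp/network-bridge-sketch.lock}"   # hypothetical lock path

claim_lock()   { until mkdir "$LOCKDIR" 2>/dev/null; do sleep 1; done; }
release_lock() { rmdir "$LOCKDIR" 2>/dev/null; }
link_exists()  { [ -e "/sys/class/net/$1" ]; }

bridge="${bridge:-xenbr0}"   # example values only
pdev="${pdev:-pbond0}"

op_start () {
    # Take the lock before any of the checks, so that a late-comer only
    # evaluates them after the first instance has finished its work.
    claim_lock

    if [ "${bridge}" = "null" ] ; then
        release_lock
        return
    fi

    if [ `brctl show 2>/dev/null | wc -l` != 1 ]; then
        release_lock
        return
    fi

    if link_exists "$pdev"; then
        # The device is already up.
        release_lock
        return
    fi

    # Placeholder for the real bridge setup steps.
    echo "would set up ${bridge} on ${pdev}"

    release_lock
}
```

With this ordering, a second instance blocks in claim_lock until the first has finished. By the time it runs the checks, the bridge or $pdev already exists, so it returns early instead of redoing the setup. Every return path has to release the lock, otherwise the next caller would wait forever.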

And either way, if network-bridge shouldn't be running more than once, what do you think might be causing that? At a glance I think it should just be getting called once, when XendPIF.py is first loaded, but maybe I'm overlooking something.

Thanks.
Andy

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
