[Xen-users] Strange ARP problem in a bridged config
I'm having an odd issue that I think is related to arp, and I'm hoping that someone can help me figure out why it's happening...

I am running nagios, monitoring a number of xen hosts. Yesterday, I rebooted several of the machines (the physical hosts, not virtual machines). Since then, nagios is sometimes reporting that the hosts are down because pings to them fail. Testing manually, I can see that this is the case. This problem is also occurring only on servers that are on the local subnet; servers on another subnet do not have cases where they lose connectivity.

Checking arp on the nagios server, I discovered that the machines that were reporting down had entries like the following:

    xenhost9 ether FE:FF:FF:FF:FF:FF C eth0

When the machines become available again, the entry changes to look like this:

    xenhost9 ether 00:E0:81:40:2A:AE C eth0

So, it appears that the nagios server (and, on at least one occasion, another server on my network) is picking up a MAC address that is not that of the physical interface on the xenhost.

Taking a look at the xenhost at a time when nagios was reporting that it was down, I found these entries in the arp table:

    nagios ether 00:16:3E:0C:DC:AC C xenbr0
    nagios ether 00:16:3E:0C:DC:AC C eth0

I deleted the entry on xenbr0 by doing `arp -i xenbr0 -d nagios`, and immediately nagios was able to ping the host again. So, something is a little wonky here, but I don't know what...

To make things stranger, I have a number of machines that are all running the same configuration. Only the machines that were rebooted yesterday morning are showing this issue.
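For anyone who wants to spot this condition without eyeballing the table, here is a minimal sketch in Python. It assumes the `arp` output has the same five-column shape as the entries quoted above, and it treats FE:FF:FF:FF:FF:FF as the "wrong" address only because that is the value that shows up on peth0, vif0.0 and xenbr0 in this report. The function name and the sample text are hypothetical, put together from addresses that appear in this post.

```python
# MAC seen on peth0/vif0.0/xenbr0 in this report; treated here as the
# address that should never appear in a neighbour's ARP table.
PLACEHOLDER_MAC = "fe:ff:ff:ff:ff:ff"


def find_placeholder_entries(arp_output):
    """Return (host, iface) pairs whose ARP entry holds the placeholder MAC.

    Assumes lines shaped like:
        xenhost9 ether FE:FF:FF:FF:FF:FF C eth0
    Lines with fewer than five fields, or a non-ether type, are skipped.
    """
    hits = []
    for line in arp_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[1] != "ether":
            continue
        host, mac, iface = fields[0], fields[2].lower(), fields[-1]
        if mac == PLACEHOLDER_MAC:
            hits.append((host, iface))
    return hits


# Illustrative sample only, assembled from addresses quoted in this post.
sample = """\
xenhost9 ether FE:FF:FF:FF:FF:FF C eth0
xenhost2 ether 00:E0:81:45:82:BC C eth0
"""

print(find_placeholder_entries(sample))  # [('xenhost9', 'eth0')]
```

Feeding it the captured output of `arp` on the monitoring server would list the hosts currently cached with the bad address, which could be compared against the hosts nagios reports as down.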
The configuration that I'm working with is:

- Opensuse 10.3
- Xen 3.1.0_15042-51.3 installed from opensuse-packaged RPMs
- Two bridges (xenbr0 and xenbr1), created with a custom network-script that does "/etc/xen/scripts/network-bridge start vifnum=0 bridge=xenbr0 netdev=eth0 && /etc/xen/scripts/network-bridge start vifnum=1 bridge=xenbr1 netdev=eth1"

On a machine that is having this problem, `ip addr` shows this:

[10:25:13] marlier@xenhost9:~> ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: peth0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:e0:81:40:2a:af brd ff:ff:ff:ff:ff:ff
    inet 192.168.xx.229/24 brd 192.168.xx.255 scope global eth1
    inet6 fe80::2e0:81ff:fe40:2aaf/64 scope link
       valid_lft forever preferred_lft forever
4: vif0.0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether 00:e0:81:40:2a:ae brd ff:ff:ff:ff:ff:ff
    inet 192.168.xx.80/24 brd 192.168.xx.255 scope global eth0
    inet 192.168.xx.229/24 brd 192.168.xx.255 scope global eth0:2
    inet6 fe80::2e0:81ff:fe40:2aae/64 scope link
       valid_lft forever preferred_lft forever
6: vif0.1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
7: veth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
8: vif0.2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
9: veth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
10: vif0.3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
11: veth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
12: xenbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::200:ff:fe00:0/64 scope link
       valid_lft forever preferred_lft forever
13: xenbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
14: vif1.0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 32
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever
[10:25:16] marlier@xenhost9:~>

On another machine that is _not_ having this issue (and which was not rebooted yesterday), and that also has an identical configuration in terms of scripts, versions, base OS, and so on, "ip addr" shows this:

[10:33:02] marlier@xenhost2:~> ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: peth0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever
3: peth1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever
4: vif0.0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether 00:e0:81:45:82:bc brd ff:ff:ff:ff:ff:ff
    inet 192.168.xx.86/24 brd 192.168.xx.255 scope global eth0
    inet 192.168.xx.222/24 brd 192.168.xx.255 scope global eth0:2
    inet6 fe80::2e0:81ff:fe45:82bc/64 scope link
       valid_lft forever preferred_lft forever
6: vif0.1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether 00:e0:81:45:82:bd brd ff:ff:ff:ff:ff:ff
    inet 192.168.xx.222/24 brd 192.168.xx.255 scope global eth1
    inet6 fe80::2e0:81ff:fe45:82bd/64 scope link
       valid_lft forever preferred_lft forever
8: vif0.2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
9: veth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
10: vif0.3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
11: veth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
14: xenbr0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
15: xenbr1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
[10:33:08] marlier@xenhost2:~>

I see those NOARP's in there, and I wonder if that might be the difference (possibly?)...but the two machines are using the same scripts to create the bridges, so why would they result in different configurations? And if that is the issue, is there a way to force the bridge to be created with the NOARP flag in there?
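Comparing the two listings by eye is error-prone, so here is a rough sketch in Python of how the interface flags from two `ip addr` captures could be diffed. It assumes each interface header starts at the beginning of a line in the usual `N: name: <FLAGS>` form. The function names are hypothetical, and the two excerpts hold only the bridge header lines copied from the listings in this post.

```python
import re

# Header line of each interface in `ip addr` output, for example:
#   13: xenbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
IFACE_RE = re.compile(r"^\d+:\s+([^:\s]+):\s+<([^>]*)>", re.MULTILINE)


def interface_flags(ip_addr_output):
    """Map interface name -> set of flags parsed from `ip addr` text."""
    return {
        name: set(flags.split(","))
        for name, flags in IFACE_RE.findall(ip_addr_output)
    }


def flag_differences(a, b):
    """For interfaces present in both maps, return
    {iface: (flags only in a, flags only in b)} where the sets differ."""
    diffs = {}
    for name in sorted(a.keys() & b.keys()):
        if a[name] != b[name]:
            diffs[name] = (a[name] - b[name], b[name] - a[name])
    return diffs


# Bridge header lines copied from the two listings in this post.
xenhost9_excerpt = """\
12: xenbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
13: xenbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
"""
xenhost2_excerpt = """\
14: xenbr0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
15: xenbr1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue
"""

diffs = flag_differences(
    interface_flags(xenhost9_excerpt), interface_flags(xenhost2_excerpt)
)
for name, (only_a, only_b) in diffs.items():
    print(name, "only on xenhost9:", sorted(only_a),
          "only on xenhost2:", sorted(only_b))
```

On these excerpts it reports MULTICAST only on xenhost9 and NOARP only on xenhost2 for both bridges. Run against the full captures, it would also skip interfaces that exist on just one host (such as peth1), which is worth checking separately.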