
Re: [Xen-users] Network Issues on Migration



On Fri, Jan 09, 2009 at 02:17:34PM -0500, Wendell Dingus wrote:
> 
> I've read and experimented extensively, and being in desperate need of 
> finishing this setup and getting it deployed live, I'd like to see if 
> anyone has any suggestions on the last hangup we seem to have. 
> 
> Two SuperMicro 1U servers with dual quad-core CPUs and 16GB RAM each, running 
> CentOS 5.2 x86_64 and its Xen implementation. The only thing non-stock CentOS at 
> this point is the Intel IGB driver. The RHEL/CentOS igb driver appears to 
> have a bug with DHCP over a bridged interface, which the latest driver 
> downloaded straight from Intel cured for us. 
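> 
> (To confirm which igb build is actually loaded after a swap like that, 
> "ethtool -i eth0" reports the driver name and version string.) 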
> 
> Anyway, both are attached to shared FC storage and are doing RHCS with both 
> IP and disk-based quorum, plus CLVMD with a shared VG in which we create LVs 
> as containers for the VMs. That part is all working very well. 
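> 
> For reference, carving out a container for a new VM is roughly the 
> following (the VG/LV names here are illustrative, not our real ones): 
> 
> lvcreate -L 20G -n vm1-disk sharedvg 
> lvs sharedvg    # clvmd makes the new LV visible on both nodes 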
> 
> Each DOM0 has 2 physical NICs and both are bridged. We also added a 
> virbr0 as a per-DOM0 local bridged network. 
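> 
> So "brctl show" on each DOM0 lists roughly the following (the interface 
> names here are the CentOS Xen defaults; ours may differ slightly): 
> 
> bridge name     interfaces 
> xenbr0          peth0, vif0.0 
> xenbr1          peth1, vif0.1 
> virbr0          (guest vifs only) 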
> 
> When any VM boots up it can ping and traceroute on any of its respective 
> networks perfectly, and inbound/outbound data flow of any kind appears 
> perfect as well. Once a VM is migrated or live-migrated to the other DOM0, 
> though, the ability to ping or traceroute ceases, while sessions via ssh or 
> httpd, either inbound or outbound, continue to work fine. 
> 
> When a VM boots I see this in dmesg: 
> netfront: Initialising virtual ethernet driver. 
> netfront: device eth0 has flipping receive path. 
> 
> I read something about a CRC problem and had each of them do "ethtool -K 
> eth{n} tx off", but I don't think that was necessary in this instance, as 
> I've never seen any error messages about CRC errors. The problem and 
> solution I followed were not heavily detailed; it was just an attempt to 
> see if that helped. 
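> 
> To double-check the offload state afterwards, lowercase -k shows the 
> current settings, e.g.: 
> 
> ethtool -k eth0 | grep -i tx 
> # tx-checksumming: off 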
> 
> The following was added to the end of /etc/sysctl.conf on both DOM0s only 
> (per the excellent wiki article): 
> net.ipv4.icmp_echo_ignore_broadcasts = 1 
> net.ipv4.conf.all.accept_redirects = 0 
> net.ipv4.conf.all.send_redirects = 0 
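> 
> Those settings were loaded on both hosts without a reboot via: 
> 
> sysctl -p /etc/sysctl.conf 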
> 
> The other oddity is that with a VM started on server1 and live-migrated to 
> server2, a running ping only pauses a short while, then picks right back up 
> and continues to succeed. Migrating it back to server1, or initially 
> starting a VM on server2 and migrating it to server1, is where the "stuck" 
> ping issue comes into play. We were very careful and documented well as we 
> installed both boxes, in an attempt to keep them as identical as possible. I 
> fear this behavior proves that's not the case though, ugh... 
> 
> After migrating from 2 to 1 and then trying a ping (and waiting a good long 
> while before ctrl-c'ing this): 
> PING 192.168.77.1 (192.168.77.1) 56(84) bytes of data. 
> 64 bytes from 192.168.77.1: icmp_seq=1 ttl=64 time=0.000 ms 
> 
> --- 192.168.77.1 ping statistics --- 
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms 
> rtt min/avg/max/mdev = 0.000/0.000/0.000/0.000 ms 
> 
> Very strange... Additionally, a "service network restart" at this point 
> results in all interfaces going down, loopback being reinitialized, and then 
> it hangs trying to bring up eth0. I can ctrl-c it three times as it pauses 
> on each interface, then run "ifconfig" and see all the IPs are still there. 
> I still can't ping, but I can "telnet google.com 80", for instance. Odd... 
> 
> So anyway, any pointers or suggestions you might have would be greatly 
> appreciated... 
> 

https://www.redhat.com/archives/rhelv5-announce/2008-October/msg00000.html

Some entries from the RHEL 5.3 beta changelog:

+ Timer problems after migration were fixed
+ Lengthy network outage after migrations was fixed

Dunno if that's what you're seeing.. 
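
Also might be worth checking whether the bridge on the destination dom0 
has (re)learned the guest's MAC right after a migration, something like 
the following (the bridge name and the Xen default MAC prefix here are 
just examples):

brctl showmacs xenbr0 | grep -i 00:16:3e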

-- Pasi



 

