Re: [Xen-users] domU network interface half-dies regularly

I had the same problem yesterday. One of the domUs running on a server
had the same symptoms: the TX counter stopped while the RX counter kept
increasing normally.
I'm running CentOS 5.3 with 2.6.18-92.1.22.el5xen on the dom0 and CentOS
5.2 with 2.6.18-92.1.22.el5xen on the domU.
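
For anyone who wants to catch this automatically, here is a rough sketch of
a check for the symptom (RX packets advancing while TX packets stay flat).
It assumes the usual /proc/net/dev layout inside the domU (two header lines,
then 8 receive columns followed by 8 transmit columns per interface). The
function names are made up for illustration, and it is only a heuristic: an
idle interface that merely receives broadcasts will look the same.

```python
def parse_proc_net_dev(text):
    """Return {iface: (rx_packets, tx_packets)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)  # also handles "eth0:1234" (no space)
        fields = data.split()
        # Assumed layout: 8 receive columns, then 8 transmit columns.
        # fields[1] is RX packets, fields[9] is TX packets.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def tx_stalled(before, after):
    """Names of interfaces whose RX count advanced while TX did not."""
    stalled = []
    for iface, (rx0, tx0) in before.items():
        if iface not in after:
            continue
        rx1, tx1 = after[iface]
        if rx1 > rx0 and tx1 == tx0:
            stalled.append(iface)
    return sorted(stalled)


def sample(path="/proc/net/dev"):
    """Read the current counters from the running system."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())
```

To use it, call sample() twice some seconds apart from inside the domU and
pass both results to tx_stalled(). Any interface it returns is worth a
closer look before users notice.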

Rebooting the domU solves the problem, but it isn't an attractive option.


* Mariusz Mazur <mmazur@xxxxxxxxx> [03/15/2010 07:10]:
> I'm trying to figure out how to debug this. Any suggestions would be 
> appreciated.
> Every once in a while a random domU on a random xen server of ours has its 
> network interface die. I've recently figured out what the exact symptoms are: 
> TX count on that interface (as seen from inside the domU) stops increasing. 
> There's no way of actually sending anything from within the domU. Even arp 
> packets aren't sent. Everything works fine with receiving packets however.
> Of the things I did check:
> - Doing an ip link set down/up on both dom0/domU doesn't do anything.
> - Removing/reattaching the dom0 interface from/to its bridge doesn't help.
> - It's interface-specific. I'm currently logged onto a domU that has one of 
> its net interfaces half-dead as described, but the other perfectly functional.
> - Interestingly, the problem prevents "xm save" from working. It times out
> without anything getting written to disk (except a kilobyte or so of, I'm
> guessing, some headers).
> - I'm seeing this problem across:
>   - 2.6.18 xen.org dom0 3.3.X and 3.4.X
>   - xen.org hypervisor 3.3.X and 3.4.X
>   - domU kernels from both xen.org and kernel.org (pvops)
>   - A few different machines from different vendors.
> - Nothing in dom0/domU kernel logs.
> Whatever the cause is, I seriously doubt it's domU's fault, considering I'm 
> seeing the problem on both xen.org and kernel.org domU kernels. I also don't 
> know what the trigger is (plus, those are production systems), so enabling a 
> bunch of DEBUG prints in xen isn't much of an option.
> Any suggestions/hints on where to look next? I'm guessing there are ways of 
> inspecting various network code structures.
