[Xen-users] domU network interface half-dies regularly

I'm trying to figure out how to debug this. Any suggestions would be appreciated.

Every once in a while a random domU on a random xen server of ours has its 
network interface die. I've recently figured out what the exact symptoms are: 
TX count on that interface (as seen from inside the domU) stops increasing. 
Nothing can be sent from within the domU; not even ARP packets go out. 
Receiving packets works fine, however.
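
For anyone wanting to catch the same symptom, here is a minimal sketch of how it could be detected from inside a domU by comparing two snapshots of /proc/net/dev. It assumes the standard Linux layout of that file (two header lines, then one line per interface with RX packets in the second column and TX packets in the tenth). The function names and the 10-second default interval are hypothetical, chosen only for illustration.

```python
import time


def parse_counters(text):
    """Return {iface: (rx_packets, tx_packets)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 10:
            continue
        # fields[1] is RX packets, fields[9] is TX packets
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def half_dead(before, after):
    """List interfaces whose RX count grew while the TX count did not move."""
    return sorted(
        name
        for name, (rx, tx) in after.items()
        if name in before and rx > before[name][0] and tx == before[name][1]
    )


def watch(path="/proc/net/dev", interval=10.0):
    """Take two snapshots `interval` seconds apart and report suspects."""
    with open(path) as f:
        first = parse_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_counters(f.read())
    return half_dead(first, second)
```

An interface that is merely idle on the sending side will also show up here, so this is only meaningful if the domU attempts to send something (a ping, for instance) during the interval.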

Of the things I did check:
- Doing an "ip link set <dev> down/up" in both dom0 and domU has no effect. 
- Removing/reattaching the dom0 interface from/to its bridge doesn't help.
- It's interface-specific. I'm currently logged onto a domU that has one of 
its net interfaces half-dead as described, but the other perfectly functional.
- Interestingly, the problem prevents "xm save" from working. It times out 
without anything getting written to disk (except a kilobyte or so of, I'm 
guessing, some headers).
- I'm seeing this problem across:
  - 2.6.18 xen.org dom0 3.3.X and 3.4.X
  - xen.org hypervisor 3.3.X and 3.4.X
  - domU kernels: xen.org and kernel.org (pvops)
  - A few different machines from different vendors.
- Nothing in dom0/domU kernel logs.

Whatever the cause is, I seriously doubt it's domU's fault, considering I'm 
seeing the problem on both xen.org and kernel.org domU kernels. I also don't 
know what the trigger is (plus, those are production systems), so enabling a 
bunch of DEBUG prints in xen isn't much of an option.

Any suggestions/hints on where to look next? I'm guessing there are ways of 
inspecting various network code structures.

