
Re: [Xen-devel] High Net and Disk Use == stuck domain

Christopher S. Aker wrote:
For the past year or so we've been seeing a bug whereby a domU's CPU would spin up to a steady 100, 200, 300 or 400% (4 vcpus), console would freeze, and some or all of the network-facing services within the domU would connect but block without any output. Disk IO would flatline. The domU would never recover and required rebooting.

Since pv_ops hasn't always been around, we previously had only seen this behavior with xen-patched domUs (2.6.18.x), but now we're seeing it with pv_ops. Identical symptoms. And, I have a user that is able to reliably reproduce it!

His recipe is downloading an ISO from a very fast and close-by news server using nzbget. The trigger appears to be a combination of high network use and high disk use (such as a download from a very fast mirror) -- because we weren't able to reproduce the problem when saving to a tmpfs mount.

I was able to grab the output of sysrq t while it was in the bad state:


The number of processes in D state (39) is quite suspicious.
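For anyone wanting to track that number over time rather than read it off a sysrq dump, here is a rough sketch that tallies process states from /proc. It assumes a shell inside the domU is still responsive enough to run Python, which may not hold once the guest is fully wedged. The function names (parse_state, count_states) are made up for illustration, and it counts one entry per /proc/<pid> directory, so it will not match a sysrq t listing task for task.

```python
import os
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take everything after the LAST ')'.
    rest = stat_line[stat_line.rfind(")") + 1:].split()
    return rest[0]


def count_states(proc_root="/proc"):
    """Tally process states for every numeric directory under proc_root."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                counts[parse_state(f.read())] += 1
        except (OSError, IndexError):
            # Process exited mid-scan, or the stat file was unreadable/empty.
            continue
    return counts


if __name__ == "__main__":
    states = count_states()
    print("processes in D state:", states.get("D", 0))
    print("all states:", dict(states))
```

Running it in a loop while the download is in progress would show whether the D-state count climbs gradually or jumps all at once when the domU locks up.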

Let me know if there's anything else I can provide.



Did this one slip by you? I figured a reproducible bug would be just too tantalizing to resist.

What's the correct venue for these issues that overlap xen-devel, lkml, and virtualization/pv_ops stuff -- should I be blasting these to everybody?


Xen-devel mailing list


