
[Xen-users] domU HDD accesses fail under load

  • To: <xen-users@xxxxxxxxxxxxx>
  • From: GigaTux <info@xxxxxxxxxxx>
  • Date: Thu, 15 Mar 2012 09:14:38 +0000
  • Delivery-date: Thu, 15 Mar 2012 09:15:50 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>


I've got a very peculiar issue that has persisted through a variety of hardware and software changes.

Basically, only under load (many domUs doing lots of activity), all domUs stop all disk activity. Long-running processes in the dom0 also seem to stop their disk activity (for instance, an md RAID rebuild). New disk activity in the dom0 is fine (e.g. a large 'dd'). All domUs essentially stop working because they can't write to disk, and I also cannot create any new domUs (the error is "Device 51712 (vbd) could not be connected. Hotplug scripts not working.").
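One rough way to confirm this kind of stall from the counters, rather than by watching the guests, is to compare two snapshots of /proc/diskstats and look for devices that have requests in flight but no completed reads or writes in between. The sketch below is only an illustration: the function names are made up for this example, and it assumes the standard Linux /proc/diskstats layout (reads completed, writes completed and I/Os in progress in the 4th, 8th and 12th columns). Since new I/O in the dom0 still works, it may be more telling when run inside an affected domU.

```python
import time


def parse_diskstats(text):
    """Map device name -> (reads completed, writes completed, I/Os in flight)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        # Columns: major, minor, name, then the per-device counters.
        stats[fields[2]] = (int(fields[3]), int(fields[7]), int(fields[11]))
    return stats


def read_diskstats(path="/proc/diskstats"):
    """Take one snapshot of the kernel's per-device I/O counters."""
    with open(path) as f:
        return parse_diskstats(f.read())


def stalled_devices(before, after):
    """Devices with requests in flight but no completions between snapshots."""
    stalled = []
    for name, (reads, writes, in_flight) in after.items():
        if name not in before:
            continue
        old_reads, old_writes, _ = before[name]
        if in_flight > 0 and reads == old_reads and writes == old_writes:
            stalled.append(name)
    return sorted(stalled)


def check(interval_seconds=30):  # the interval is an arbitrary assumed value
    """Snapshot, wait, snapshot again, and report apparently stuck devices."""
    before = read_diskstats()
    time.sleep(interval_seconds)
    return stalled_devices(before, read_diskstats())
```

Calling `check()` returns a list of device names that looked stuck over the interval. An idle device with nothing in flight is not reported, so an empty list does not prove the system is healthy.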

I am able to reliably reproduce this on a server. It usually takes a couple of hours to reproduce and my test setup involves about 40 domUs and a shell script to reboot them periodically.
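The actual script isn't reproduced here, but a reboot loop of this kind can be sketched roughly as follows. This is a minimal illustration rather than the script used on the affected server: it assumes the `xm` toolstack, and the function names, the 300-second interval and the injectable `run` parameter are all invented for the example.

```python
import subprocess
import time


def parse_domu_names(xm_list_output):
    """Return domU names from `xm list` text, skipping the header and Domain-0."""
    names = []
    for line in xm_list_output.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if fields and fields[0] != "Domain-0":
            names.append(fields[0])
    return names


def reboot_all(names, run=subprocess.call):
    """Ask each domU to reboot; return the names whose command failed."""
    failed = []
    for name in names:
        # `run` is injectable so the loop can be dry-run without a Xen host.
        if run(["xm", "reboot", name]) != 0:
            failed.append(name)
    return failed


def main(interval_seconds=300):  # assumed interval, not taken from the real setup
    """Reboot every running domU, wait, and repeat until interrupted."""
    while True:
        listing = subprocess.check_output(["xm", "list"]).decode()
        failed = reboot_all(parse_domu_names(listing))
        if failed:
            print("reboot failed for: " + ", ".join(failed))
        time.sleep(interval_seconds)
```

Running `main()` on the dom0 starts the loop. The domUs also need to generate disk load of their own for the test to resemble the setup described above.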

All hardware has been replaced including motherboards from two different manufacturers (Asus and Supermicro), different RAM, CPUs, PSUs and even cases. The software has been replaced entirely as well, although the domU disk images themselves are the same. I have tried the following software:

Xen: 3.2, 4.0, 4.1 (Debian packaged), 4.1.3-rc1-pre (compiled myself yesterday from the latest 4.1 unstable branch)
Kernels: 2.6.32 (SLES, oldstyle), 2.6.32 (Debian, oldstyle), 3.0.1 (SLES, oldstyle), 3.1 (Debian, pv_ops)
Userspace: Debian Lenny, Debian Squeeze

I'm basically at a loss now but figured that, as I'm able to reproduce this with relative ease, someone might want to look at it to see if there is a subtle underlying Xen issue here stopping disk access. There's unfortunately nothing really of interest in dmesg at the time the error occurs.

Let me know if you need any system information or even if trusted users want to go in and take a look. The server is currently in this funny state so I'll leave it there for a while if anyone wants to investigate it. I can provide whatever information's necessary, run a custom Xen build, turn on debugging or whatever.


GigaTux Customer Service
