[Xen-users] i/o scheduler deadlocks with loopback devices

This was an email I sent to xen-devel a while ago without getting a response. I'm reposting it here in case someone knows more.

Hello all,

I can consistently reproduce lockups in my domU under heavy I/O,
with the following error:

[36841.420662] INFO: task rsyslogd:15014 blocked for more than 120 seconds.
[36841.420843] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

The blocked task varies; it can be any task that happens to be active
at the time (kjournald, loop0, etc.)

My setup is:
dom0: Xen version 3.4.2.
domU: Ubuntu 10.04, 2.6.36-rc6 based on Stefano Stabellini's
v2.6.36-rc6-urgent-fixes tree.
Paravirtual disks and network interfaces.
Root filesystem on /dev/xvda3, formatted ext3, mounted with default options.
Both dom0 and domU are using the CFQ I/O scheduler.

The virtual block device is backed by LVM, on top of a local SATA RAID array.

To reproduce this, I can do either of the following:

Set up the domU as a primary DRBD node, with the DRBD volume on top of a
local loopback device, then rsync many files to the volume, delete
them, and repeat until the crash.

Mount a Linux ISO via loopback on /mnt/test, rsync /mnt/test/ to
another directory on xvda3, delete the files, and repeat until the crash.
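
For anyone who wants to try it, the ISO procedure above can be sketched as
a shell loop. This is only a rough sketch, not the exact commands I ran:
the ISO path, the destination directory, the iteration count and the
DRY_RUN switch are all placeholders I made up for illustration. With
DRY_RUN=1 (the default here) it only prints the commands.

```shell
#!/bin/sh
# Sketch of the ISO loopback reproduction loop.
# ISO, DEST, ITERATIONS and DRY_RUN are hypothetical placeholders.
ISO=${ISO:-/tmp/linux.iso}          # any Linux ISO image (assumed path)
MNT=${MNT:-/mnt/test}               # loopback mount point from the report
DEST=${DEST:-/root/rsync-test}      # assumed directory on xvda3
ITERATIONS=${ITERATIONS:-10}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# One copy/delete cycle: rsync the ISO contents, then remove them.
stress_once() {
    run rsync -a "$MNT/" "$DEST/"
    run rm -rf "${DEST:?}"
}

# Mount the ISO via loopback and repeat the cycle ITERATIONS times.
repro() {
    run mkdir -p "$MNT"
    run mount -o loop,ro "$ISO" "$MNT"
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        stress_once
        i=$((i + 1))
    done
    run umount "$MNT"
}

repro
```

To actually generate load, run it as root with DRY_RUN=0 and a larger
ITERATIONS value, and watch dmesg in the domU for the hung-task message.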

This is very similar to the following situation:


Jeremy Fitzhardinge replied to that thread, indicating that his "xen:
use percpu interrupts for IPIs and VIRQs" and "xen: handle events as
edge-triggered" patches should fix the issue. I believe these were
merged in 2.6.36-rc3, yet the issue persists on my 2.6.36-rc6 kernel.
Disabling irqbalance in dom0, which he suggested as a workaround, has
no effect. I've also tried changing the I/O scheduler and reducing the
number of vcpus from 4 to 1; neither had any effect.
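
In case it helps anyone compare setups, the scheduler check and switch
can be sketched roughly as below. The device name xvda and the
active_scheduler helper are assumptions for illustration; the sysfs file
lists the available schedulers with the active one in square brackets.

```shell
#!/bin/sh
# Sketch: show which I/O scheduler is active for a block device.
# DEV and the active_scheduler helper are hypothetical names.
DEV=${DEV:-xvda}
SCHED_FILE=${SCHED_FILE:-/sys/block/$DEV/queue/scheduler}

# The file contains something like "noop deadline [cfq]";
# print only the bracketed (active) entry.
active_scheduler() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

if [ -r "$SCHED_FILE" ]; then
    active_scheduler "$SCHED_FILE"
fi

# To switch schedulers (as root), write one of the listed names, e.g.:
#   echo deadline > /sys/block/xvda/queue/scheduler
```

The change takes effect immediately and does not survive a reboot, so it
needs to be made in both dom0 and domU before each test run.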


Nathan Gamber

