
[Xen-users] Re: Guest CentOS 5.4 64bit on-top XCP 0.5 issue / HP ProLiant BL460c G6

Hi Marco,

I hope you've already solved this issue to your satisfaction, but I thought
I'd post just in case. There's an issue that affects people who:
 - Are running HP or Fujitsu servers with a hardware RAID
 - Are running CentOS/RHEL Xen domUs on CentOS/RHEL Xen dom0s
 - Are using kernels 2.6.18-194.x or greater (and really, who isn't?)
... it's currently affecting me and I think it's the one affecting you.

Please see the Red Hat and CentOS bug reports for some background. In the meantime:
 - You should update your firmware if you haven't already, though that will
not solve the problem on its own.
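 - You should ensure that your RAID controller's cache battery is charging
correctly.
 - You should switch the I/O scheduler to noop on both the dom0 and the domU.
You can do this by adding "elevator=noop" to the Linux kernel's arguments in
/etc/grub.conf and rebooting. In a domU that is the "kernel" line. In the dom0
the "kernel" line loads the hypervisor (xen.gz), so the option belongs on the
first "module" line, the one that loads vmlinuz.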
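To check which scheduler is actually active, or to switch without waiting for
a reboot, the 2.6 kernel exposes it through sysfs in
/sys/block/<dev>/queue/scheduler, with the active one in brackets. A minimal
sketch, assuming a kernel that provides that file; the active_scheduler
function and the --set-noop switch are hypothetical names for illustration,
not something from the bug reports:

```shell
#!/bin/sh
# Hypothetical helper: list each block device's I/O scheduler via sysfs and,
# with --set-noop, switch it to noop. Run it in both the dom0 and the domU.

# Print the active scheduler from a line such as
# "noop anticipatory deadline [cfq]" (the active one is in brackets).
active_scheduler() {
  echo "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] || continue            # skip if the glob matched nothing
  dev=${f#/sys/block/}               # e.g. "xvda/queue/scheduler"
  dev=${dev%%/*}                     # e.g. "xvda"
  if [ "${1:-}" = "--set-noop" ]; then
    # Takes effect immediately (needs root) but is lost on reboot
    echo noop > "$f"
  fi
  echo "$dev: $(active_scheduler "$(cat "$f")")"
done
```

A runtime switch does not survive a reboot, so the elevator=noop boot
parameter is still what makes the change stick.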

In my case I also have a blade (a G5 rather than a G6), and my stack trace
complains about fsync rather than pdflush, but I suspect you're hitting more
or less the same issue. I'm currently on CentOS 5.6 with kernel
2.6.18-238.9.1.el5xen, but I also see this issue on CentOS 5.5 and with
kernels in the -194, -233, and earlier -238 ranges. I see it with Xen 3.0.3
(CentOS's version) as well as 3.4.3 and 4.1 (from http://www.gitco.de/repo/).

You're hitting the issue right away, at boot. Even if upgrading the firmware
and changing the scheduler fixes the boot problem, I'd encourage you to run
some tests in the domU to make sure you're okay under heavy disk access. I've
been using dd to write 1 GB to disk as a test:
$ dd if=/dev/zero of=./test1024M bs=1024k count=1024 conv=fsync 
I found that before upgrading the firmware and changing the scheduler, this
would reliably make dmesg explode with "blocked for more than 120 seconds"
messages, and the write speed could be as low as 353 kB/s. Writes smaller
than 1 GB did not trigger the problem as reliably.
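The dd test above can be wrapped so that it also reports how many new
hung-task warnings the kernel logged during the write. This is a sketch, not
my actual procedure: disk_test and hung_count are hypothetical names, it
assumes dd prints its rate as the last comma-separated field of its final
line (as GNU coreutils does), and reading dmesg may require root.

```shell
#!/bin/sh
# Hypothetical wrapper around the dd test above.
# Usage: disk_test [size_in_MB] [file]   (defaults: 1024, ./test1024M)

hung_count() {
  # Count hung-task warnings currently in the kernel ring buffer;
  # grep -c prints 0 (and exits 1) when nothing matches.
  dmesg 2>/dev/null | grep -c 'blocked for more than 120 seconds' || true
}

disk_test() {
  size_mb=${1:-1024}
  file=${2:-./test1024M}
  before=$(hung_count)
  # conv=fsync: dd fsyncs the file before finishing, so the reported
  # rate includes the flush to disk. dd reports on stderr.
  rate=$(dd if=/dev/zero of="$file" bs=1024k count="$size_mb" conv=fsync 2>&1 \
         | tail -n 1 | awk -F', ' '{print $NF}')
  rm -f "$file"
  after=$(hung_count)
  echo "rate: $rate"
  # May be off if the ring buffer wrapped during the test
  echo "new hung-task warnings: $((after - before))"
}
```

Calling disk_test with no arguments repeats the 1 GB test; disk_test 256
tries a smaller write, which in my experience is less likely to trigger the
problem.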

Since making these changes, I still occasionally see problems with this heavy
test, sometimes including a single "blocked for more than 120 seconds"
message. The write speed can drop as low as 2 MB/s, but is generally closer
to 50 MB/s. So I certainly don't have the answer, but these changes have made
a very material difference.

Sent from the Xen - User mailing list archive at Nabble.com.

Xen-users mailing list