
Re: [Xen-users] how to start VMs in a particular order

On 30/06/14 17:11, Joost Roeleveld wrote:
On Sunday 29 June 2014 17:35:17 lee wrote:
"J. Roeleveld" <joost@xxxxxxxxxxxx> writes:
On Sunday, June 29, 2014 10:09:32 AM lee wrote:
Quite possibly. Am I correct in assuming you are using old hardware and
closed-source software?
It's an IBM x3650 7979 L2G with a ServeRaid 8k.  Arcconf seems to be
closed source --- I don't really need arcconf, though.

Unfortunately, disabling the status checking hasn't solved the problem.
The server goes down with messages about the SCSI bus hanging and trying
to reset it.  I suspect that the controller doesn't like the --- rather
unsuited --- WD20EARS I plugged in.  They have been working fine with a
HP smart array P800, though.  I might have to take them out to see if
the problem persists.
SCSI bus hanging sounds like an I/O issue.
Try reading the SMART values of the disks.
I'm not sure how to do that, and what would they tell me?
Either connect the disks directly to a SATA port on a motherboard (a normal
desktop would suffice); disabling the RAID functionality of the card might also work.
Then run (assuming the disk is /dev/sda):
# smartctl --all /dev/sda
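As for what the values tell you: a sketch of the attributes I would check first (attribute names vary slightly by vendor, and the device name is an assumption):

```shell
# Pull only the health-critical attributes from the full SMART report.
# A non-zero raw value in any of these usually points at a failing disk
# or a bad cable (the CRC counter).
smartctl --all /dev/sda | grep -E \
  'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count'
```

Pending sectors in particular matter here, since they are the ones the drive will retry at length when read.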

Also, try a different disk...
Unfortunately, I don't have one I could try --- and I'd need three.

The WD20EARS is a "green" desktop disk. I had numerous issues when using a
couple of those in my old server with software RAID (mdadm).
Some hardware RAID cards do not like disks that do not properly return
error states, especially the green disks, which tend to go into
power-save mode when not used for a short period of time.
I know, they aren't suited for this purpose.  Yet they have been working
fine on the P800, and that three disks should decide to go bad in a way
that blocks the controller (or whatever happens) every now and then
seems unlikely.
No, it doesn't.
Does the error occur after the server has been idle for a while? Or when the
disks are being stressed?

If the former, then you need to figure out how to prevent the disks from entering
power-saving mode. It takes time for the disks to spin up again afterwards, and the
RAID controller is timing out on access to the disks.
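A sketch of two ways to keep such disks awake, assuming the drive honours these hdparm flags (green drives do not always accept every value):

```shell
# Disable the drive's Advanced Power Management so it stays spun up.
# 255 turns APM off entirely; some drives only accept values 1-254,
# in which case 254 is the least aggressive setting.
hdparm -B 255 /dev/sda

# Alternatively, disable the standby (spin-down) timer.
# Values 1-240 are multiples of 5 seconds, so for reference
# -S 120 would mean 120 * 5 = 600 seconds of idle before spin-down;
# 0 disables the timer.
hdparm -S 0 /dev/sda
```

Note these settings are not always persistent across power cycles, so they typically go in a boot script or hdparm.conf.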

If the latter, then the drives themselves may have problems which they are
trying to correct internally.

My guess is that it is the former (e.g. when the server has been idle for a while).

BTW, this reminds me of the issue of these types of disks (consumer/non-RAID) dropping out of Linux MD RAID arrays by themselves, even though they are perfectly good. For the cause and the solution, read up on SCT/ERC.

BTW, google for sct/erc: one of the top results (the fourth, for me) discusses why WD released the Red line for use in RAID. The short story is that a green disk will try (really hard) to recover the data in a requested sector (because it is probably the only location where the user has saved this important data), and will ignore any instructions from the RAID controller. Since this can take a few minutes, the RAID controller decides the drive is dead (in most reports I read, it is Linux that decides the disk is dead and marks it as failed in the array).

In Linux software RAID, the solution is to tell the kernel to be more patient and wait longer for the disk to respond, so it doesn't get kicked out of the array; with hardware RAID, you may not have that option.
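The usual way to make the kernel more patient is to raise the SCSI layer's per-command timeout (a sketch; the device name and the 180-second value are assumptions, pick a timeout longer than the drive's worst-case recovery):

```shell
# Raise the per-command timeout from the default 30 seconds to 180,
# giving a desktop drive time to finish its internal error recovery
# before the kernel gives up on it.
echo 180 > /sys/block/sda/device/timeout

# Confirm the new value took effect.
cat /sys/block/sda/device/timeout
```

This setting is per-device and resets on reboot, so it belongs in a udev rule or boot script.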

In RAID drives (like the Red, or enterprise-level drives), if a sector is not readable quickly (within about 6 seconds, I think), then the drive will simply return an error, on the assumption that MD or the hardware RAID controller will read the bad sector from another disk in the array.
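You can check whether a drive supports this behaviour, and enable it if so, with smartctl's SCT Error Recovery Control commands (a sketch; device name assumed, and green drives often refuse the set command):

```shell
# Report the drive's current SCT ERC read/write recovery limits,
# or say the feature is unsupported.
smartctl -l scterc /dev/sda

# If supported, cap error recovery at 7.0 seconds for reads and writes.
# The values are in tenths of a second: 70 tenths = 7 seconds.
smartctl -l scterc,70,70 /dev/sda
```

Like the kernel timeout, this setting usually does not survive a power cycle, so it needs reapplying at boot.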

I hope this helps...

BTW, with current drive sizes, it is common to hit the above issue; sometimes even a single complete read of the drive can trigger it. See the linux-raid mailing list for more information.


Adam Goryachev
Website Managers
www.websitemanagers.com.au

Xen-users mailing list
