Re: [Xen-users] how to start VMs in a particular order

To: xen-users@xxxxxxxxxxxxx
From: Adam Goryachev <mailinglists@xxxxxxxxxxxxxxxxxxxxxx>
Date: Mon, 30 Jun 2014 17:55:46 +1000
Delivery-date: Mon, 30 Jun 2014 07:56:18 +0000
List-id: Xen user discussion <xen-users.lists.xen.org>

On 30/06/14 17:11, Joost Roeleveld wrote:

On Sunday 29 June 2014 17:35:17 lee wrote:

"J. Roeleveld" <joost@xxxxxxxxxxxx> writes:

On Sunday, June 29, 2014 10:09:32 AM lee wrote:

Quite possibly. Am I correct in assuming you are using old hardware
with closed-source software?

It's an IBM x3650 7979 L2G with a ServeRaid 8k.  Arcconf seems to be
closed source --- I don't really need arcconf, though.

Unfortunately, disabling the status checking hasn't solved the problem.
The server goes down with messages about the SCSI bus hanging and trying
to reset it.  I suspect that the controller doesn't like the --- rather
unsuited --- WD20EARS I plugged in.  They have been working fine with a
HP smart array P800, though.  I might have to take them out to see if
the problem persists.

SCSI bus hanging, sounds like an I/O issue.
Try to read the SMART-values of the disk.

I'm not sure how to do that, and what would they tell me?

Connect the disks directly to a SATA port on a mainboard (a normal
desktop would suffice). Disabling the raid functionality of the card might
also suffice.
Then use (assuming the disk is /dev/sda)
# smartctl --all /dev/sda

Also, try a different disk...

Unfortunately, I don't have one I could try --- and I'd need three.

The WD20EARS is a "green" desktop disk. I had numerous issues when using a
couple of those in my old server when using software raid (mdadm).
Some hardware raid cards do not like disks that do not properly return
error states, especially the green disks, which have a tendency to go
into powersave mode when not used for a short period of time.

I know, they aren't suited for this purpose.  Yet they have been working
fine on the P800, and that three disks should decide to go bad in a way
that blocks the controller (or whatever happens) every now and then
seems unlikely.

No, it doesn't.
Does the error occur after the server has been idle for a while? Or when the
disks are being stressed?

If the former, then you need to figure out how to PREVENT the disks from
entering powersaving mode. It takes time for the disks to spin up again
afterwards, and the raid controller is timing out on access to the disks.

If the latter, then you might have issues on the drives themselves which the
drives are trying to solve themselves.

My guess is that it is the former. (eg. when the server has been idle for a
while)

BTW, this reminds me of the issue of this type of disk (consumer/non-RAID)
dropping out of Linux MD raid arrays by themselves, even though they are
perfectly good. For the cause and the solution, read up on SCT/ERC.

BTW, google for sct/erc and result 4 (for me) talks about WD releasing
the red line for use in RAID. The short story is that the green disk
will try (really hard) to get the data in a requested sector (because it
is probably the only location where the user has saved this important
data), and ignore any instructions from the raid controller. Since this
can take a few minutes, the raid controller decides in the meantime that
the drive is dead (what I usually read about is Linux deciding the disk
is dead and marking it as failed in the raid array).

In Linux software raid, the solution is to tell the kernel to be more
patient and wait longer for the disk to respond, so it doesn't get
kicked out of the array. With hardware raid, you may not have that option.
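
As a rough sketch of "telling the kernel to be more patient": the
per-device SCSI command timeout lives in sysfs and defaults to 30
seconds. The helper name set_disk_timeout, the SYSFS_ROOT override (only
there to allow a dry run against a scratch directory) and the value 180
are my own assumptions for illustration, not something from this thread.

```shell
#!/bin/sh
# Sketch: raise the kernel's command timeout for one disk.
# set_disk_timeout and SYSFS_ROOT are hypothetical names for this example.
set_disk_timeout() {
    dev=$1                      # kernel device name, e.g. sda (not /dev/sda)
    seconds=$2                  # new timeout in whole seconds
    root=${SYSFS_ROOT:-/sys}    # override only for dry runs
    file="$root/block/$dev/device/timeout"

    # Reject anything that is not a plain non-negative integer.
    case $seconds in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 1
            ;;
    esac

    # The file only exists for real block devices; writing needs root.
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi

    echo "$seconds" > "$file"
}

# Example (as root), value chosen for illustration only:
#   set_disk_timeout sda 180
```

The setting is not kept across reboots, so it would have to be applied
again at boot for every member disk of the array.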

In RAID drives (like the red, or enterprise level), if a sector is not
readable (quickly, about 6 seconds I think), then the drive will simply
return an error, assuming that MD or the hardware raid controller will
simply read that sector's data from another disk in the array.
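
For drives that do support it, smartctl can query and set this error
recovery limit with its "-l scterc" option, which takes the read and
write limits in tenths of a second. The small helper below (scterc_arg
is a made-up name) only builds that argument; the smartctl calls are
left as comments because they need root and a real disk. Whether a
given green drive accepts the setting at all is something I can't
promise, and on many drives it is lost after a power cycle.

```shell
#!/bin/sh
# Sketch: build the argument for smartctl's SCT Error Recovery Control
# option from a limit given in seconds. scterc_arg is a hypothetical helper.
scterc_arg() {
    secs=$1
    # Only accept a plain non-negative integer number of seconds.
    case $secs in
        ''|*[!0-9]*) return 1 ;;
    esac
    # smartctl expects deciseconds, once for reads and once for writes.
    echo "scterc,$((secs * 10)),$((secs * 10))"
}

# Show the current setting (assuming the disk is /dev/sda):
#   smartctl -l scterc /dev/sda
# Limit error recovery to 7 seconds for reads and writes:
#   smartctl -l "$(scterc_arg 7)" /dev/sda
```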


I hope this helps...

BTW, with current size drives, it is common/frequent to have the above
issue. Sometimes even a single complete read of the drive can trigger
it. See the linux-raid mailing list for more information.
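
A back-of-the-envelope way to see why a single full read can be enough:
multiply the number of bits on the drive by the unrecoverable read error
rate from the data sheet. The figure of one error per 1e14 bits used in
the example is an assumption on my part (a typical consumer-drive spec),
not a value from this thread, and expected_ure is just a name I picked.

```shell
#!/bin/sh
# Sketch: expected number of unrecoverable read errors for one complete
# read of a drive. expected_ure is a hypothetical helper name.
#   $1 = capacity in TB (decimal, 1 TB = 8e12 bits)
#   $2 = bits read per unrecoverable error (data sheet value, assumed)
expected_ure() {
    awk -v tb="$1" -v bits_per_error="$2" 'BEGIN {
        bits = tb * 8e12
        printf "%.2f\n", bits / bits_per_error
    }'
}

# Example: a 2 TB drive with an assumed rate of 1 error per 1e14 bits.
#   expected_ure 2 1e14
```

Under those assumptions a 2 TB drive comes out at 0.16 expected errors
per full pass, so hitting a slow sector during a resync or scrub is not
exotic.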


Regards,
Adam

--
Adam Goryachev Website Managers www.websitemanagers.com.au
