
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance


  • To: xen-users@xxxxxxxxxxxxx
  • From: Gordan Bobic <gordan@xxxxxxxxxx>
  • Date: Thu, 10 Jul 2014 07:23:15 +0100
  • Delivery-date: Thu, 10 Jul 2014 06:24:14 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 07/10/2014 03:34 AM, lee wrote:
> Nuno Magalhães <nunomagalhaes@xxxxxxxxx> writes:
>
>> On Wed, Jul 9, 2014 at 3:29 AM, lee <lee@xxxxxxxxxxxxxxx> wrote:
>>> It's simple with RAID because the disk either fails or not.  It's
>>> usually simple with disks because it either fails or not.
>>
>> Depends on your RAID, depends on your disks. If you use disks without
>> TLER they may get kicked out of the array even if they're not failing.
>
> That brings back the question of when a disk has actually failed.

It also highlights that disk "failure" is often nowhere near as clear-cut as most people realize.

> Should I continue to entrust my data to a disk that spends long times on
> trying to correct errors?

Any disk that lacks TLER (Time Limited Error Recovery), or has it disabled, will sometimes do that. You also need a RAID controller that either enables this feature on the disk itself or exposes the raw disk so that you can enable it yourself, because the setting defaults to off. It wouldn't surprise me if various storage vendors ship disks with doctored firmware that, among other things, has this setting defaulting to off.

Some disk enclosures even go as far as detecting the disks that are plugged in and, if they recognize the model number, flashing the disk's firmware to the certified version (the enclosure has a built-in library of certified firmware for certified disk models). I only found out about this feature of some enclosures when a client got some disks that seemed legit but turned out to be "fake", in the sense that they were desktop drives doctored to claim they were enterprise drives. When the enclosure flashed the certified enterprise firmware onto them, they got bricked, because the drive electronics were actually different.
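For the "do it yourself" case, a rough sketch, assuming a Linux host with smartmontools installed and an ATA/SATA disk exposed directly to the OS: smartctl's `-l scterc,READTIME,WRITETIME` option sets SCT Error Recovery Control (the generic name for TLER), with both times given in tenths of a second. The helper name `scterc_command`, the 7-second default (a commonly used RAID value) and the `/dev/sda` path are illustrative assumptions, not anything from the thread, and the sketch only builds the command line rather than running it.

```python
import shlex


def scterc_command(device: str,
                   read_seconds: float = 7.0,
                   write_seconds: float = 7.0) -> list[str]:
    """Build (but do not run) a smartctl argv that sets SCT ERC / TLER timeouts.

    smartctl takes READTIME and WRITETIME in deciseconds (units of 100 ms),
    so 7.0 seconds becomes 70.
    """
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    # A value of 0 would disable ERC, which is the opposite of what we want here.
    if read_ds <= 0 or write_ds <= 0:
        raise ValueError("ERC timeouts must be positive")
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


if __name__ == "__main__":
    # Hypothetical device path; substitute the raw disk your controller exposes.
    print(shlex.join(scterc_command("/dev/sda")))
```

Running this prints `smartctl -l scterc,70,70 /dev/sda`, which you would then execute as root. Whether the drive accepts the setting at all depends on its firmware, which is exactly the problem with desktop drives described above.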

> Should I use it as a spare for a RAIDZ-1
> volume and hope that it might be helpful?

With ZFS you might be able to get away with it by bumping the SCSI command timeout to however long the disk takes to time out the operation. That will prevent the disk getting kicked out of the pool, but you'll have to live with all pool operations blocking until that disk responds.
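As a sketch of the timeout bump, assuming ZFS on Linux: the kernel exposes each SCSI/SATA disk's command timeout, in seconds, as `/sys/block/<dev>/device/timeout` (the default is 30 seconds). The helper name `set_scsi_timeout` and the 180-second figure in the usage note are illustrative assumptions; the right value is whatever the particular disk actually takes to give up on an unreadable sector. Writes to sysfs do not persist across reboots, so in practice this would be reapplied at boot, e.g. from a udev rule.

```python
from pathlib import Path


def set_scsi_timeout(device: str, seconds: int,
                     sysfs_block: Path = Path("/sys/block")) -> int:
    """Set the kernel's SCSI command timeout for `device` (e.g. "sda").

    Returns the previous timeout in seconds. `sysfs_block` is a parameter
    only so the function can be exercised against a scratch directory.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    attr = sysfs_block / device / "device" / "timeout"
    old = int(attr.read_text().strip())
    attr.write_text(f"{seconds}\n")
    return old
```

Usage, as root: `set_scsi_timeout("sda", 180)`. The trade-off is the one described above: the disk stays in the pool, but any I/O stuck on it can now block for up to that long.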

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
