
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance

  • To: xen-users@xxxxxxxxxxxxx
  • From: Gordan Bobic <gordan@xxxxxxxxxx>
  • Date: Sat, 05 Jul 2014 11:36:20 +0100
  • Delivery-date: Sat, 05 Jul 2014 10:36:35 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 07/04/2014 07:03 PM, lee wrote:
Gordan Bobic <gordan@xxxxxxxxxx> writes:

On 07/02/2014 11:45 PM, lee wrote:
On Linux there is for all intents and purposes one implementation.

Where is this implementation?  Is it available by default?  I only saw
that there's a Debian package for ZFS which involves fuse.


Funny --- how many findings do you get?  A couple million?

It's an "I'm feeling lucky" link, so the number of findings you get is 1, and it even forwards you straight to it.

'apt-cache search zfs' is *much* more relevant.

I think that by saying this you have just demonstrated that the rest of us who have participated in this thread have largely been wasting our time.

Does ZFS do that?  Since it's about keeping the data safe, it might have
a good deal of protection against user errors.

I don't think it's possible to guard against user errors. If you're
concerned about user errors, get someone else to manage your machines
and not give you the root password.

It is possible to guard.  It's not possible to prevent them.

Sacrificing productivity for hand-holding is not the *nix paradigm. It's competence or bust. I for one don't want every command I type to ask "are you sure" before it does what I told it to. All that achieves is desensitizing you to the question: after a while you end up answering y automatically, without considering what it even said.

You don't have to rebuild a pool. The existing pool is modified in
place and that usually takes a few seconds. Typically the pool version
headers get a bump, and from there on ZFS knows it can put additional
metadata in place.

Something similar happens when you toggle deduplication on a pool. It puts the
deduplication hash table headers in place. Even if you remove the volume
that has been deduplicated and don't have any deduplicated blocks
afterwards, the headers will remain in place. But that doesn't break
anything and it doesn't require rebuilding the pool.

Then it should be easy to turn features off later.

You can, but for example disabling compression on a compressed pool won't decompress all the data. It will only make sure the data written from that point on isn't compressed. If you want to actually decompress the data, you'll have to copy it to an uncompressed file system on the same pool, then destroy the old, compressed file system.
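The copy-then-destroy procedure described above could look roughly like this (the pool name "tank" and dataset names are hypothetical, not from the thread):

```shell
# Hypothetical pool "tank" with a compressed dataset "tank/data".
zfs set compression=off tank/data     # new writes land uncompressed;
                                      # existing blocks stay compressed

# To actually decompress the existing data, copy it to a fresh,
# uncompressed dataset on the same pool:
zfs create tank/data_plain
rsync -a /tank/data/ /tank/data_plain/

# Once the copy is verified, destroy the old compressed dataset:
zfs destroy tank/data
```

Note that this needs enough free space in the pool to hold the decompressed copy alongside the original while both exist.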

You might enable a new feature and find that it causes
problems, but you can't downgrade ...

You don't have to _use_ a feature just because
it's available. And features that broken are rare, and non-critical.

And what do I do then?  Rebuild the pool to somehow downgrade to a
previous version of ZFS?

I have never seen a failure like you describe. I once ran into a bug that made the pool unimportable while a scrub was in progress on a pool using both compression and deduplication. The only thing I had to do was add a module option that prevents the scrub from auto-resuming, and the pool came up fine. The bug was fixed very shortly afterwards.

Every other data loss failure I have seen when using ZFS has been either due to user error (destroying the wrong pool, putting disks on a different RAID controller that clobbered the partition tables, user error causing massive memory corruption that ended up getting flushed out to disk and severely corrupting the ZFS metadata) or massive hardware failure beyond what the level of redundancy used when creating the pool could handle. And in the cases other than hardware failure, most of the time the data was recoverable with the help of the ZFS developers (or Oracle support - the user error leading to memory corruption was on a Solaris machine).

So yes, it can happen - but it happens a lot more on other file systems. ZFS is the only file system I have used for the data on which I never had to reach for backups.

It seems that ZFS isn't sufficiently mature yet to use it.  I haven't
learned much about it yet, but that's my impression so far.

As I said above - you haven't done your research very thoroughly.

I haven't, yet all I've been reading so far makes me very careful.  When
you search for "zfs linux mature", you find more sources saying
something like "it is not really mature" and not many, if any, that
would say something like "of course you should use it, it works

There's a lot of FUD out there, mostly coming from people who have
neither tried it nor know what they are talking about. Whatever next?
"It must be true because I read it on the internet"?

Perhaps nothing of what you're saying about ZFS is true ;)

OK, you got me - I confess: it's all a part of my hidden agenda to waste my time debating the issue with someone who hasn't progressed beyond using software that isn't in their distribution's package repository.

As far as I've seen, that doesn't happen.  Instead, the system goes
down, trying to access the unresponsive disk indefinitely.

I see a disk get kicked out all the time. Most recent occurrence was 2
days ago.

You seem to have a lot of disks failing.

I do, but it's slowing down dramatically as I'm running out of Seagates.

"zpool status" shows you the errors on each disk in the pool. This
should be monitored along with regular SMART checks. Using ZFS doesn't
mean you no longer have to monitor for hardware failure, any more than
using a hardware RAID array means you can skip monitoring for disk failures.
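A minimal monitoring sketch along those lines, assuming a pool called "tank" and a disk at /dev/sda (both hypothetical names, not from the thread):

```shell
#!/bin/sh
# Report pool health; "zpool status -x" prints a one-liner
# ("pool 'tank' is healthy") unless a vdev has errors.
zpool status -x tank

# Cross-check with SMART: the two attributes most worth watching
# for impending disk failure.
smartctl -A /dev/sda | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector'
```

Something like this dropped into cron, with output mailed on non-empty diff, covers both the ZFS-level and the hardware-level checks mentioned above.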

At how many errors do you need to replace a disk?

Depends on the disk.
One of my Seagates reports the following line in SMART:

5 Reallocated_Sector_Ct 0x0033 063 063 036 Pre-fail Always - 1549

So it has 1549 reallocated sectors at the moment. The normalized value starts at 100, and the threshold at which the disk is considered failing is 36. AFAIK these are percentages, so 1549 sectors corresponds to 100-63=37% of the spare sectors used. That would imply that this particular disk has approximately 4,200 spare sectors in total (1549 / 0.37), and that it should be replaced when the value falls below the threshold of 36.
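The arithmetic above can be checked with the shell's integer arithmetic (numbers taken from the SMART line quoted):

```shell
#!/bin/sh
raw=1549      # Reallocated_Sector_Ct raw value
value=63      # normalized value (starts at 100, threshold 36)

used=$((100 - value))          # percent of spare sectors consumed
total=$((raw * 100 / used))    # implied total spare-sector count

echo "spares used: ${used}%, estimated total spares: ${total}"
```

This prints an estimate of about 4186 total spare sectors, consistent with the "approximately 4,200" figure above.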

Are sectors that had errors being re-used,

You'll have to ask your disk manufacturer that. WD and Samsung drives seem to have a feature where a pending sector doesn't always convert into a reallocated sector, which implies that either the reallocation count is lying or the previously failed sectors are being re-used when the data sticks to them within the limit of the ECC's ability to recover.

or does ZFS keep a list of sectors not to use anymore?

As I said before, disks handle their own defects, and have done so for the past decade or two. File systems have long had no place in keeping track of duff sectors on disks.

Or how unreliable is a disk that spends significant amounts of time on
error correction?

Exactly - 7 seconds is about 840 read attempts. If the sector read
failed 840 times in a row, what are the chances that it will ever

Isn't the disk supposed not to use the failed sector once it has been
discovered, meaning that the disk might still be useable?

When a sector becomes unreadable, it is marked as "pending". Read
attempts on it will return an error. The next write to it will cause
it to get reallocated from the spare sectors the disk comes with. As
far as I can tell, some disks try to re-use the sector when a write
for it arrives, and check whether the data sticks to the sector within the
ability of the sector's ECC to recover. If it sticks, the sector is kept; if it
doesn't, it's reallocated.
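That write-triggered behavior can be observed directly, with the caveat that the write is destructive to whatever was in that sector. A sketch, assuming a pending sector at LBA 12345 on /dev/sdX (both hypothetical):

```shell
#!/bin/sh
# Before: note the pending/reallocated counts.
smartctl -A /dev/sdX | grep -E 'Current_Pending_Sector|Reallocated_Sector_Ct'

# Overwrite the pending sector with zeros. DESTROYS that sector's data;
# hdparm deliberately requires the long confirmation flag.
hdparm --write-sector 12345 --yes-i-know-what-i-am-doing /dev/sdX

# After: Current_Pending_Sector should drop by one; whether
# Reallocated_Sector_Ct rises by one tells you if the firmware
# reallocated the sector or decided the data "stuck" and re-used it.
smartctl -A /dev/sdX | grep -E 'Current_Pending_Sector|Reallocated_Sector_Ct'
```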

That would mean that a disk which has been failed due to error
correction taking too long may still be fine.

Yes. Most disks have some reallocated sectors after a while.

Xen-users mailing list
