
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance


  • To: xen-users@xxxxxxxxxxxxx
  • From: Gordan Bobic <gordan@xxxxxxxxxx>
  • Date: Sat, 05 Jul 2014 11:36:20 +0100
  • Delivery-date: Sat, 05 Jul 2014 10:36:35 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 07/04/2014 07:03 PM, lee wrote:
Gordan Bobic <gordan@xxxxxxxxxx> writes:

On 07/02/2014 11:45 PM, lee wrote:
On Linux there is for all intents and purposes one implementation.

Where is this implementation?  Is it available by default?  I only saw
that there's a Debian package for ZFS which involves fuse.

http://lmgtfy.com/?q=zfs+linux&l=1

Funny --- how many findings do you get?  A couple million?

It's an "I'm feeling lucky" link, so the number of findings you get is 1, and it even forwards you straight to it.

'apt-cache search zfs' is *much* more relevant.

I think that by saying this you have just demonstrated that the rest of us that have participated in this thread have largely been wasting our time.

Does ZFS do that?  Since it's about keeping the data safe, it might have
a good deal of protection against user errors.

I don't think it's possible to guard against user errors. If you're
concerned about user errors, get someone else to manage your machines
and not give you the root password.

It is possible to guard.  It's not possible to prevent them.

Sacrificing productivity for hand-holding is not the *nix paradigm. It's competence or bust. I for one don't want every command I type to ask "are you sure" before it does what I told it to. All that achieves is desensitizing you to the question, and you end up answering "y" automatically after a while, without considering what it even asked.

You don't have to rebuild a pool. The existing pool is modified in
place and that usually takes a few seconds. Typically the pool version
headers get a bump, and from there on ZFS knows it can put additional
metadata in place.
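
For illustration - the pool name "tank" below is made up, not from this thread - the in-place upgrade amounts to something like:

    zpool get version tank   # show the current on-disk format version
    zpool upgrade            # list pools still on an older version
    zpool upgrade tank       # bump the headers in place, takes seconds

(On newer releases that use feature flags, the same command enables the new features rather than bumping a version number.)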

Something similar happens when you toggle deduplication on a pool. It puts the deduplication hash table headers in place. Even if you remove the volume that has been deduplicated and don't have any deduplicated blocks left afterwards, the headers remain in place. But that doesn't break anything, and it doesn't require rebuilding the pool.
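
Roughly, with a made-up dataset name:

    zfs set dedup=on tank/vols    # start deduplicating new writes on this dataset
    zpool status -D tank          # -D shows the dedup table (DDT) summary for the pool
    zfs set dedup=off tank/vols   # only stops dedup for data written from now on

The DDT headers stay on the pool even after the last deduplicated block is gone, as described above.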

Then it should be easy to turn features off later.

You can, but for example disabling compression on a compressed pool won't decompress all the data. It will only make sure the data written from that point on isn't compressed. If you want to actually decompress the data, you'll have to copy it to an uncompressed file system on the same pool, then destroy the old, compressed file system.
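
A sketch of that, with made-up dataset names and assuming the default mountpoints under /tank:

    zfs set compression=off tank/data            # only affects blocks written from now on
    zfs create -o compression=off tank/data.new  # new, uncompressed file system
    rsync -a /tank/data/ /tank/data.new/         # rewrite the data uncompressed
    zfs destroy tank/data                        # drop the old, compressed copy
    zfs rename tank/data.new tank/data

(zfs send/receive into the new file system works just as well as rsync here.)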

You might enable a new feature and find that it causes
problems, but you can't downgrade ...

You don't have to _use_ a feature that causes problems just because
it's available. And features that broken are rare, and non-critical.

And what do I do then?  Rebuild the pool to somehow downgrade to a
previous version of ZFS?

I have never seen a failure like you describe. I once ran into a bug that made the pool unimportable while a scrub was in progress and the pool used both compression and deduplication. The only thing I had to do was add an option to the module that prevents the scrub from auto-resuming, and it came up fine. The bug was fixed very shortly afterwards.
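
I won't name the exact tunable from memory, so treat "some_zfs_tunable" below as a placeholder rather than a real option name - but on ZFS on Linux the mechanism is just a module option:

    # persistent, picked up the next time the zfs module is loaded:
    echo "options zfs some_zfs_tunable=1" >> /etc/modprobe.d/zfs.conf
    # or one-off, when loading the module by hand:
    modprobe zfs some_zfs_tunable=1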

Every other data loss failure I have seen when using ZFS has been either due to user error (destroying the wrong pool, putting disks on a different RAID controller that clobbered the partition tables, user error causing massive memory corruption that ended up getting flushed out to disk and severely corrupting the ZFS metadata) or massive hardware failure beyond what the level of redundancy used when creating the pool could handle. And in the cases other than hardware failure, most of the time the data was recoverable with the help of the ZFS developers (or Oracle support - the user error leading to memory corruption was on a Solaris machine).

So yes, it can happen - but it happens a lot more on other file systems. ZFS is the only file system I have used where I never had to reach for backups to recover the data.


It seems that ZFS isn't sufficiently mature yet to use it.  I haven't
learned much about it yet, but that's my impression so far.

As I said above - you haven't done your research very thoroughly.

I haven't, yet all I've been reading so far makes me very careful.  When
you search for "zfs linux mature", you find more sources saying
something like "it is not really mature" and not many, if any, that
would say something like "of course you should use it, it works
perfectly".

There's a lot of FUD out there, mostly coming from people who have
neither tried it nor know what they are talking about. Whatever next?
"It must be true because I read it on the internet"?

Perhaps nothing of what you're saying about ZFS is true ;)

OK, you got me - I confess: it's all a part of my hidden agenda to waste my time debating the issue with someone who hasn't progressed beyond using software that isn't in their distribution's package repository.

As far as I've seen, that doesn't happen.  Instead, the system goes
down, trying to access the unresponsive disk indefinitely.

I see a disk get kicked out all the time. Most recent occurrence was 2
days ago.

You seem to have a lot of disks failing.

I do, but it's slowing down dramatically as I'm running out of Seagates.

"zfs status" shows you the errors on each disk in the pool. This
should be monitored along with regular SMART checks. Using ZFS doesn't
mean you no longer have to monitor for hardware failure, any more than
you can not monitor for failure of a disk in a hardware RAID array.
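
For reference - the pool name is an example:

    zpool status -x       # prints "all pools are healthy" unless something needs attention
    zpool status -v tank  # per-device read/write/checksum error counters, plus affected files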

At how many errors do you need to replace a disk?

Depends on the disk.
One of my Seagates reports the following line in SMART:

5 Reallocated_Sector_Ct 0x0033 063 063 036 Pre-fail Always - 1549

So it has 1549 reallocated sectors at the moment. The normalised value starts at 100, and the threshold below which the disk needs to be replaced is 36. AFAIK these are percentages, so 1549 sectors = 100 - 63 = 37% of spare sectors used. That would imply this particular disk has approximately 4100 spare sectors, and that it should be replaced when the proportion of them remaining falls below 36%.
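
The numbers above come straight out of smartctl; the device name below is an example:

    smartctl -A /dev/sda | grep -i -e reallocated -e pending
    # VALUE starts at 100 and the drive flags itself as failing once it drops
    # below THRESH (36 here). With 1549 raw reallocations at VALUE 63,
    # 100 - 63 = 37% of spares used, i.e. roughly 1549 / 0.37 ~ 4100 spare sectors.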

Are sectors that had errors being re-used,

You'll have to ask your disk manufacturer that. WD and Samsung might. Or they could just be lying about the number of reallocated sectors they have. WD and Samsung drives seem to have a feature where a pending sector doesn't convert into a reallocated sector, which implies either that the reallocation count is lying or that the previously failed sectors are being re-used if the data sticks to them within the limit of the ECC's ability to recover.

or does ZFS keep a list of sectors not to use anymore?

As I said before, disks handle their own defects, and have done for the past decade or two. File systems have long had no place in keeping track of duff sectors on disks.

Or how unreliable is a disk that spends significant amounts of time on
error correction?

Exactly - at 7,200 rpm the sector passes under the head 120 times a
second, so 7 seconds is about 840 read attempts. If the sector read
failed 840 times in a row, what are the chances that it will ever
succeed?

Isn't the disk supposed not to use the failed sector once it has been
discovered, meaning that the disk might still be useable?

When a sector becomes unreadable, it is marked as "pending". Read
attempts on it will return an error. The next write to it will cause
it to be reallocated from the spare sectors the disk ships with. As
far as I can tell, some disks try to re-use the sector when a write
for it arrives, and see whether the data sticks to the sector within
the ability of the sector's ECC to recover. If it sticks, the sector
is kept; if it doesn't, it's reallocated.
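
If you want to watch that happen, the usual trick is to overwrite the pending sector and compare the counters before and after. The device and LBA below are placeholders, the write destroys whatever was in that sector, and on a 4K-native drive you'd use bs=4096 instead:

    smartctl -A /dev/sdb | egrep 'Current_Pending_Sector|Reallocated_Sector_Ct'
    dd if=/dev/zero of=/dev/sdb bs=512 count=1 seek=123456789 oflag=direct
    smartctl -A /dev/sdb | egrep 'Current_Pending_Sector|Reallocated_Sector_Ct'

On a ZFS vdev you would normally just let a scrub or resilver rewrite the affected block rather than poking the disk by hand.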

That would mean that a disk which has been failed due to error
correction taking too long may still be fine.

Yes. Most disks have some reallocated sectors after a while.



_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

