
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance

  • To: xen-users@xxxxxxxxxxxxx
  • From: Gordan Bobic <gordan@xxxxxxxxxx>
  • Date: Sun, 06 Jul 2014 17:30:54 +0100
  • Delivery-date: Sun, 06 Jul 2014 16:31:42 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 07/06/2014 11:11 AM, lee wrote:

Does ZFS do that?  Since it's about keeping the data safe, it might have
a good deal of protection against user errors.

I don't think it's possible to guard against user errors. If you're
concerned about user errors, get someone else to manage your machines
and not give you the root password.

It is possible to guard.  It's not possible to prevent them.

Sacrificing productivity for hand-holding is not the *nix
paradigm. It's competence or bust. I for one don't want every command
I type to ask "are you sure" before it does what I told it to. All
that achieves is desensitizing you to the question: you end up
answering "y" automatically after a while, without considering what it
even said.

All the commands you issue are such that they destroy whole file systems?

Of course not, but few if any commands capable of destroying a whole file system ask for confirmation before doing so.

You don't have to rebuild a pool. The existing pool is modified in
place and that usually takes a few seconds. Typically the pool version
headers get a bump, and from there on ZFS knows it can put additional
metadata in place.

Something similar happens when you toggle deduplication on a pool: ZFS
puts the deduplication hash table headers in place. Even if you remove
the volume that was deduplicated and no deduplicated blocks remain
afterwards, the headers stay in place. But that doesn't break
anything, and it doesn't require rebuilding the pool.

Then it should be easy to turn features off later.

You can, but for example disabling compression on a compressed pool
won't decompress all the data. It will only make sure the data written
from that point on isn't compressed. If you want to actually
decompress the data, you'll have to copy it to an uncompressed file
system on the same pool, then destroy the old, compressed file system.

Why isn't there a command to uncompress the compressed data?

Why would there be? If you want to uncompress it, copy the files to a new directory, remove the original directory, then rename the new directory into place. You could probably write a one-line script to do that if it's such a big problem.
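The copy-remove-rename sequence described above can be sketched as follows. The directory name "mydata" is hypothetical, and the sample file is created here just so the sketch is self-contained; on ZFS you would do this inside a file system that already has compression disabled, so that rewriting the files stores their blocks uncompressed:

```shell
set -e
mkdir -p mydata && echo "sample" > mydata/file.txt  # stand-in for the existing (compressed) data
cp -a mydata mydata.tmp   # rewrite every file into the new location
rm -rf mydata             # drop the old, compressed copies
mv mydata.tmp mydata      # put the rewritten copies in the original place
```

Note that `cp -a` preserves permissions and timestamps; for large trees `rsync -a` behaves similarly and can be restarted if interrupted.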

ZFS is the only file system I have used for the data on which
I never had to reach for backups.

One of the reasons for this may very well be that you know ZFS so well.
I don't know it at all.

I knew other file systems at least as well if not better, yet it didn't help.

We probably have a very different understanding or use of file systems.
I have some data I need stored, so I create a file system (since I can't
very well store the data without one) and store data on it, and that's
all.  I'm not "using" the file system for anything more than that.  The
simpler and the more reliably that works, the better.
You create a file system and use it for making snapshots, adding
features and whatever other complicated things there are, besides
storing some data.  You're using an exotic file system which isn't
widely supported on Linux and run into bugs and problems which you were
lucky to be able to recover from by talking to developers and supporters.
If I was to switch to this file system, I'd much more likely than you
make user errors because I don't know this complicated file system, and
I might run into problems or bugs I might not be able to recover from
because I don't have access to the developers or supporters.  The
unknown file system would have the advantage that it could prevent
silent data corruption, which is a problem I haven't noticed yet.  Such
a switch isn't very appealing, as great as this file system might be.

And without a file system that detects said corruption for you, you will never notice it either.

Perhaps nothing of what you're saying about ZFS is true ;)

OK, you got me - I confess: it's all part of my hidden agenda to
waste my time debating the issue with someone who hasn't progressed
beyond using software that is in their distribution's package
repository.
If you think I haven't, it must be true.

You are the one that implied that not having the package in the distribution repository was such a big deal.

At how many errors do you need to replace a disk?

Depends on the disk.
One of my Seagates reports the following line in SMART:

   5 Reallocated_Sector_Ct   0x0033   063   063   036    Pre-fail  Always       -       1549

So it has 1549 reallocated sectors at the moment. The normalized value
starts at 100, and the threshold at which the disk needs replacing is
36. AFAIK these are percentages, so a value of 63 means 100-63 = 37%
of the spare sectors are used. That would imply this particular disk
has approximately 4,200 spare sectors (1549 / 0.37), and that it
should be replaced when the normalized value falls below the threshold
of 36.
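The arithmetic above can be checked back-of-the-envelope style. This is only an estimate: the raw-to-normalized mapping is manufacturer-specific, so treat the spare-pool figure as a rough guess rather than a datasheet number:

```shell
reallocated=1549   # raw reallocated sector count from SMART
value=63           # current normalized value (starts at 100, threshold 36)
used_pct=$((100 - value))                        # spares consumed, in percent -> 37
total_spares=$((reallocated * 100 / used_pct))   # estimated spare pool size -> 4186
echo "spares used: ${used_pct}%, estimated total spares: ${total_spares}"
```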

Are sectors that had errors being re-used,

You'll have to ask your disk manufacturer that. WD and Samsung
might. Or they could just be lying about the number of reallocated
sectors they have. WD and Samsung drives seem to have a feature where
a pending sector doesn't convert into a reallocated sector, which
implies either that the reallocation count is lying, or that the
previously failed sectors are being re-used when the data sticks to
them within the limit of the ECC's ability to recover.

So the smart numbers don't really give you an answer, unless the disk
manufacturer told you exactly what's actually going on.

SMART numbers SHOULD give you the answer, unless the manufacturer has deliberately made the firmware lie about it in the interest of reducing warranty return rates.

I was thinking of errors detected by ZFS, though.  What if you see a
few:  Do you replace the disk, or do you wait until there are so many?

Depends on the rate at which they are showing up. If every week the same disk throws a few errors, then yes, it is a good candidate for replacing. But usually there are other indications in SMART and syslog, e.g. command timeouts, bus resets, and similar.

or does ZFS keep a list of sectors not to use anymore?

As I said before, disks handle their own defects, and have done so for
the past decade or two. File systems have long had no place keeping
track of duff sectors on disks.

So ZFS may silently lose redundancy (and in a bad case, data), depending
on what the disks do.  And there isn't any way around that, other than
increasing redundancy.

How do you define "silently"? How would you detect disk failure with any traditional (hardware or software) RAID arrangement? You have to configure some kind of monitoring, appropriate to your system. ZFS is no different.
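The monitoring point above can be sketched as a tiny shell check. This only greps the summary line that `zpool status` prints; the function name is made up for illustration, and in practice you would feed it the output of `zpool status <pool>` from cron, or use the ZFS event daemon (zed) instead:

```shell
# Returns 0 if a `zpool status` report (on stdin) shows no known data errors.
check_pool_health() {
    grep -q 'errors: No known data errors'
}
```

Usage would be something like `zpool status tank | check_pool_health || mail -s "pool degraded" admin`, the same alert-on-failure pattern you would wire up for mdadm or a hardware RAID controller.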

Xen-users mailing list


