
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance


  • To: xen-users@xxxxxxxxxxxxx
  • From: Gordan Bobic <gordan@xxxxxxxxxx>
  • Date: Sat, 05 Jul 2014 10:57:06 +0100
  • Delivery-date: Sat, 05 Jul 2014 09:58:47 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 07/04/2014 06:11 PM, lee wrote:
Kuba <kuba.0000@xxxxx> writes:

On 2014-07-03 00:45, lee wrote:

It's tempting to try it out, and I really like the checksumming
it does, and it's also confusing: There's (at least) ZFS and OpenZFS,
and Debian requires you to use fuse if you want ZFS, adding more
complexity.

You haven't done your research thoroughly enough.

No, I haven't looked into it thoroughly at all.

On Linux there is for all intents and purposes one implementation.

Where is this implementation?  Is it available by default?  I only saw
that there's a Debian package for ZFS which involves fuse.


In case you'd like to try it out, follow these steps:
http://zfsonlinux.org/debian.html

and just have a few minutes of fun. I'm pretty sure a livecd will
do. You can also use files instead of real disks.
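
The "files instead of real disks" suggestion can be sketched as follows. This is a minimal illustration, not part of the original instructions: the helper names, the pool name "testpool", the file names and the 128 MiB size are all made up for the example. It only creates sparse backing files and prints the zpool create command; running that command needs root and an installed ZFS. File vdevs have to be given as absolute paths.

```python
import os
import tempfile


def make_sparse_files(directory, count=2, size_mib=128):
    """Create `count` sparse files to stand in for disks; return absolute paths."""
    paths = []
    for i in range(count):
        # Hypothetical file names, chosen only for this example.
        path = os.path.abspath(os.path.join(directory, f"zfs-disk{i}.img"))
        with open(path, "wb") as f:
            # Extending an empty file with truncate() normally yields a sparse file.
            f.truncate(size_mib * 1024 * 1024)
        paths.append(path)
    return paths


def zpool_create_cmd(pool, paths, layout="mirror"):
    """Build (but do not run) the argv for creating a pool on the given files."""
    return ["zpool", "create", pool, layout, *paths]


# Demo: create two throwaway files and show the command one would run as root.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_paths = make_sparse_files(demo_dir)
    print(" ".join(zpool_create_cmd("testpool", demo_paths)))
```

For a real experiment the files should live somewhere that outlives the script, and the pool can be removed afterwards with zpool destroy.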

Thanks!  Sooner or later I'll try it out.  How come there are no
packages in the Debian repos other than the fuse package?

Is this some kind of Debian/Ubuntu brain damage that demands that everything be pre-chewed and served on a plate via the distro attached repositories? That's a very solipsistic view.

A very long time ago, I lost data with xfs once.  It probably was my own
fault, using some mount parameters wrongly.  That taught me to be very
careful with file systems and to prefer file systems that are easy to
use, that don't have many or any parameters that need to be considered
and basically just do what they are supposed to right out of the box.

Does ZFS do that?  Since it's about keeping the data safe, it might have
a good deal of protection against user errors.

Destructive operations are usually called accordingly: zfs destroy,
zfs rollback, so they quite clearly express the intention.

"Rollback" doesn't sound very destructive.

How can a file system protect you from executing a destructive
operation?

It can try by warning you.

You could argue the same about rm. It's not the *nix way. If you want hand holding, use a more handholding OS.

Snapshots protect you from most user errors. Off-site backups protect
you from su errors. To some extent.

Off-site would be good, but it's a hassle because I'd have to carry the
disks back and forth.  And how are snapshots better than copying the
data?  What if I need to access a file that's in the snapshot:  Do I
need to restore the snapshot first?

Go read how zfs send | receive with incremental snapshots works. This is getting way off topic for the Xen mailing list.

It seems that ZFS isn't sufficiently mature yet to use it.  I haven't
learned much about it yet, but that's my impression so far.

As I said above - you haven't done your research very thoroughly.

I haven't, yet all I've been reading so far makes me very careful.  When
you search for "zfs linux mature", you find more sources saying
something like "it is not really mature" and not many, if any, that
would say something like "of course you should use it, it works
perfectly".

"Mature" means different things to different people in different
circumstances. Is Linux mature? Is Linux 3.15 mature? If not, is 2.6
mature? Does it mean it has no bugs? If ZoL is not mature enough for
you, you can use FreeBSD or Solaris. Or you can use hardware RAID +
any other FS. I have the same feeling about ZFS as Gordan - once you
start using it, you cannot imagine making do without it.

Why exactly is that?  Are you modifying your storage system all the time
or making snapshots all the time?

Since snapshots in ZFS are "free" in terms of performance, they are much more useful for everyday use. They also make incremental backups easier because you can use send/receive commands to transfer incrementally only the delta between the snapshots. Between that and extra integrity-preserving features it makes reaching for backups much less frequent.

Checksumming is sure good to have, being able to fully use the disk
caches is, too, as well as not wasting space through fixed block sizes.

Fixed block sizes don't waste space on traditional RAID. Variable block sizes are a performance feature that allows ZFS to work around the parity RAID problem of performance dropping to 50% of the performance of a single disk on partial stripe writes.

I've never made a snapshot and don't know what I would make one for
other than perhaps making a snapshot of the dom0 and the VMs --- which
would require booting from ZFS, figuring out how to make snapshots and
where to put them and how to restore them.

It sounds like your FS usage isn't advanced enough.

The biggest advantage would be checksumming.  I'd be trading that
against ease of use and great complexity.

Not to mention resistance to learning something new.

So you can see how it is not
understandable to me what makes ZFS so great that I wouldn't be able to
do without anymore.

Then don't use it.

So you would be running ZFS on unreliable disks, with the errors being
corrected and going unnoticed, until either, without TLER, the system
goes down or, with TLER, until the errors aren't recoverable anymore and
become noticeable only when it's too late.

ZFS tells you it had problems ("zpool status"). ZFS can also check
entire pool for defects ("zpool scrub", you should do that
periodically).

You're silently losing more and more redundancy.  How do you know when
a disk needs to be replaced?

Same way you know with any disk failure - appropriate monitoring. Surely that is obvious.
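
One possible shape for such monitoring, as a rough sketch rather than a recommendation: "zpool status -x" prints "all pools are healthy" when nothing is wrong, so a periodic job can compare its output against that string and alert otherwise. The function names are invented for this example, and how the alert is delivered (mail, Nagios, etc.) is left out.

```python
import subprocess

# Message printed by "zpool status -x" when no pool has problems.
HEALTHY_MESSAGE = "all pools are healthy"


def pools_are_healthy(status_output):
    """Return True only if the output is exactly the 'healthy' message."""
    return status_output.strip() == HEALTHY_MESSAGE


def check_pools():
    """Run "zpool status -x" and return (healthy, raw_output).

    Needs the zpool binary on the PATH; not called in the demo below.
    """
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=False,
    )
    return pools_are_healthy(result.stdout), result.stdout


# Demo on canned strings, so it runs without ZFS installed.
print(pools_are_healthy("all pools are healthy\n"))
print(pools_are_healthy("  pool: tank\n state: DEGRADED\n"))
```

Anything other than the healthy message is treated as a reason to look, including the case where no pools are imported at all. SMART monitoring of the individual disks would complement this.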

Does ZFS maintain a list of bad sectors which are not to be used again?

By the fact that you are asking this question, I dare say you need to go and read up more on how modern disks work. Modern disks manage their defects themselves. When a sector fails and cannot be read, they return an error on the read, and mark the sector as pending. Next time that sector is written, they will write it to one of the spare, hidden sectors, and map the LBA for the failed sector to the new sector. There has been no need for the file system to keep track of physical disk defects in decades.

It's also quite difficult to corrupt the file system
itself:
https://blogs.oracle.com/timc/entry/demonstrating_zfs_self_healing

It shows that there are more checksum errors after the errors were
supposedly corrected.

Not all errors were caught by the first operation. Import only checks the pool metadata. The find command that was used cats all the files, which demonstrates that corrupted data gets silently repaired when it is read. If you want the most thorough, full check of all the data and metadata, use the zpool scrub command (which should be run reasonably regularly, appropriate to the pool size and scrub times).

Using ZFS does not mean you don't have to do backups. File system type
won't make a difference for a fire inside your enclosure:) But ZFS
makes it easy to create backups by replicating your pool or datasets
("zfs send" lets you create full or incremental backups) to another
set of disks or machine(s).

As another ZFS or as files or archives or as what?  I'm using rsync now,
and restoring a file is as simple as copying it from the backup.

zfs send produces a data stream that can be applied to another pool using zfs receive. You can pipe this over ssh or netcat to a different machine, or you can pipe it to a different pool locally.
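
To make the pipelines concrete, here is a small sketch that only builds and prints the command lines; it does not run them. The dataset, snapshot and host names (tank/data, monday, tuesday, backup/data, backuphost) are hypothetical, and the helper name is made up for the example. The zfs send -i form sends only the delta between two snapshots.

```python
import shlex


def send_receive_cmd(dataset, snap, target, prev_snap=None, ssh_host=None):
    """Build a shell pipeline that replicates dataset@snap into target.

    With prev_snap set, only the changes since that snapshot are sent
    (zfs send -i). With ssh_host set, the receiving side runs over ssh.
    """
    send = ["zfs", "send"]
    if prev_snap is not None:
        send += ["-i", f"{dataset}@{prev_snap}"]
    send.append(f"{dataset}@{snap}")
    receive = ["zfs", "receive", target]

    send_s = " ".join(shlex.quote(arg) for arg in send)
    recv_s = " ".join(shlex.quote(arg) for arg in receive)
    if ssh_host is not None:
        # The remote command is passed to ssh as a single quoted argument.
        recv_s = f"ssh {shlex.quote(ssh_host)} {shlex.quote(recv_s)}"
    return f"{send_s} | {recv_s}"


# Full copy of the first snapshot to a second local pool.
print(send_receive_cmd("tank/data", "monday", "backup/data"))

# Incremental follow-up sent to another machine over ssh.
print(send_receive_cmd("tank/data", "tuesday", "backup/data",
                       prev_snap="monday", ssh_host="backuphost"))
```

The snapshots themselves are created beforehand with zfs snapshot, for example zfs snapshot tank/data@monday. The receiving side must already hold the earlier snapshot for an incremental stream to apply.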

http://blog.backblaze.com/2014/01/21/what-hard-drive-should-i-buy/

Those guys don't use ZFS.  They must have very good reasons not to.

They do:
http://www.youtube.com/watch?v=c5ASf53v4lI
http://zfsonlinux.org/docs/LUG11_ZFS_on_Linux_for_Lustre.pdf

And I believe they have lots of good reasons to do so :)

That's some laboratory experimenting with ZFS.  Backblaze uses ext4,
though ZFS would seem to be a very good choice for what they're doing.
How can they store so much data without checksumming, without using ECC
RAM and not experience a significant amount of data corruption?

You are asking the wrong question - how would they know if they are experiencing data corruption? The vast majority of backups are write-only. If 4KB of data (one sector) goes bad for every 10TB read, and only 1% of the backups ever need to get retrieved, that's one detected broken file per 1 petabyte of data stored.

What is the actual rate of data corruption or loss prevented or
corrected by ZFS due to its checksumming in daily usage?

According to disk manufacturers' own specifications for their own disks (i.e. assume it's worse), one unrecoverable error in 10^14 bits read. This doesn't include complete disk failures.
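
As a rough worked example of that figure, the sketch below converts one unrecoverable error per 10^14 bits into terabytes read per error, then applies it to the scenario mentioned earlier of 1 PB stored with 1% ever read back. Decimal units and an evenly spread error rate are simplifying assumptions.

```python
# Back-of-the-envelope estimate. Assumes decimal units (1 TB = 10**12 bytes)
# and that errors are spread evenly over the data read.
BITS_PER_ERROR = 10**14  # manufacturer specification quoted above
TB = 10**12
PB = 10**15

bytes_per_error = BITS_PER_ERROR // 8    # bytes read per expected error
tb_per_error = bytes_per_error / TB      # the same figure in terabytes

stored_bytes = 1 * PB                    # data held in backups
retrieved_percent = 1                    # share that is ever read back
bytes_read = stored_bytes * retrieved_percent // 100
expected_errors = bytes_read / bytes_per_error

print(f"One unrecoverable error per {tb_per_error} TB read")
print(f"Expected errors when reading back 1% of 1 PB: {expected_errors}")
```

This works out to one error per 12.5 TB read and about 0.8 expected errors for the 10 TB read back, which is in line with the "one detected broken file per petabyte stored" estimate.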




 

