
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance

To: xen-users@xxxxxxxxxxxxx
From: Kuba <kuba.0000@xxxxx>
Date: Sat, 05 Jul 2014 00:45:31 +0200
Delivery-date: Fri, 04 Jul 2014 22:46:52 +0000
List-id: Xen user discussion <xen-users.lists.xen.org>

On 2014-07-04 19:11, lee wrote:

Kuba <kuba.0000@xxxxx> writes:

On 2014-07-03 00:45, lee wrote:

It's tempting to try it out, and I really like the checksumming
it does, and it's also confusing: There's (at least) ZFS and OpenZFS,
and Debian requires you to use fuse if you want ZFS, adding more
complexity.


You haven't done your research thoroughly enough.


No, I haven't looked into it thoroughly at all.

On Linux there is for all intents and purposes one implementation.


Where is this implementation?  Is it available by default?  I only saw
that there's a Debian package for ZFS which involves fuse.


In case you'd like to try it out, follow these steps:
http://zfsonlinux.org/debian.html

and just have a few minutes of fun. I'm pretty sure a live CD will
do. You can also use files instead of real disks.


Thanks!  Sooner or later I'll try it out.  How come there are no
packages in the Debian repos other than the fuse package?


Sorry, that's way beyond my knowledge.

A very long time ago, I lost data with xfs once.  It probably was my own
fault, using some mount parameters wrongly.  That taught me to be very
careful with file systems and to prefer file systems that are easy to
use, that don't have many or any parameters that need to be considered
and basically just do what they are supposed to right out of the box.

Does ZFS do that?  Since it's about keeping the data safe, it might have
a good deal of protection against user errors.


Destructive operations are usually called accordingly: zfs destroy,
zfs rollback, so they quite clearly express the intention.


"Rollback" doesn't sound very destructive.

For me "rollback" always meant "revert to some previous state", and to
me it sounds very destructive, at least for the "current state" from
which you are reverting.

How can a file system protect you from executing a destructive
operation?


It can try by warning you.

Does "rm" sound destructive or try to warn you? It just does what you
tell it to do.

Snapshots protect you from most user errors. Off-site backups protect
you from su errors. To some extent.


Off-site would be good, but it's a hassle because I'd have to carry the
disks back and forth.

You can do that over the network. And it's always pros vs cons. It's
your data, your requirements, your decisions and your responsibility.

And how are snapshots better than copying the
data?

Snapshots are just snapshots; making them does not copy your data (well,
in fact, ZFS is a COW file system, so making a snapshot may result in
actually copying your data later on, if it's needed, but it's not
copying as in "making a backup"). Replicating a snapshot results in the
creation of another dataset identical to the original snapshot. It's
just one more way of making full or incremental backups.
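
To illustrate why taking a snapshot does not copy data, here is a toy
model in Python. It is only a sketch of the copy-on-write idea, not how
ZFS is implemented; the class and method names (ToyDataset,
shared_blocks, and so on) are made up for this example.

```python
# Toy model of copy-on-write snapshots; NOT how ZFS works internally.
# All names here are hypothetical and exist only for illustration.

class ToyDataset:
    def __init__(self):
        self.blocks = {}      # block number -> bytes object (the live view)
        self.snapshots = {}   # snapshot name -> frozen map of block references

    def write(self, block_no, data):
        # A write never changes an existing bytes object. It binds the block
        # number to a new object, so snapshots keep seeing the old one.
        self.blocks[block_no] = bytes(data)

    def snapshot(self, name):
        # Copies only the block map (references), not the data itself.
        self.snapshots[name] = dict(self.blocks)

    def shared_blocks(self, name):
        # Count blocks whose data object is still shared with the snapshot.
        snap = self.snapshots[name]
        return sum(1 for n, d in snap.items() if self.blocks.get(n) is d)


ds = ToyDataset()
ds.write(0, b"aaaa")
ds.write(1, b"bbbb")
ds.snapshot("before")
print(ds.shared_blocks("before"))   # 2: everything is shared right after the snapshot
ds.write(1, b"BBBB")
print(ds.shared_blocks("before"))   # 1: only the untouched block is still shared
print(ds.snapshots["before"][1])    # b'bbbb': the snapshot still sees the old data
```

Extra space is consumed only for blocks that change after the snapshot
was taken, which is why snapshots are cheap to create.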

What if I need to access a file that's in the snapshot:  Do I
need to restore the snapshot first?

Usually you can "cd" into the ".zfs" directory, whose "snapshot"
subdirectory contains directories named after your snapshots, and inside
those directories you have complete datasets just like the ones you took
the snapshots of. No rollback/restoring/mounting is necessary.
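
As a small sketch of where such a file would appear, the Python helper
below only builds the path; it does not touch any file system. The
mount point, snapshot name and file name are invented for the example,
and snapshot_path is a hypothetical helper, not part of any ZFS tool.

```python
from pathlib import PurePosixPath

def snapshot_path(mountpoint, snapshot, relative_file):
    """Return where a file from the given snapshot appears under .zfs/snapshot."""
    return PurePosixPath(mountpoint) / ".zfs" / "snapshot" / snapshot / relative_file

# Hypothetical dataset mounted at /tank/vms with a snapshot called "before-upgrade".
print(snapshot_path("/tank/vms", "before-upgrade", "web01/disk.img"))
# /tank/vms/.zfs/snapshot/before-upgrade/web01/disk.img
```

Restoring a single file is then an ordinary copy from that path back
into the live dataset.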

It seems that ZFS isn't sufficiently mature yet to use it.  I haven't
learned much about it yet, but that's my impression so far.


As I said above - you haven't done your research very thoroughly.


I haven't, yet all I've been reading so far makes me very careful.  When
you search for "zfs linux mature", you find more sources saying
something like "it is not really mature" and not many, if any, that
would say something like "of course you should use it, it works
perfectly".


"Mature" means different things to different people in different
circumstances. Is Linux mature? Is Linux 3.15 mature? If not, is 2.6
mature? Does it mean it has no bugs? If ZoL is not mature enough for
you, you can use FreeBSD or Solaris. Or you can use hardware RAID +
any other FS. I have the same feeling about ZFS as Gordan - once you
start using it, you cannot imagine making do without it.


Why exactly is that?  Are you modifying your storage system all the time
or making snapshots all the time?

Yes, I take snapshots all the time. This way it's easy for me to revert
VMs to previous states, clone them, etc. Same goes with my regular data.
And I replicate them a lot.

Checksumming is sure good to have, being able to fully use the disk
caches is, too, as well as not wasting space through fixed block sizes.
I've never made a snapshot and don't know what I would make one for
other than perhaps making a snapshot of the dom0 and the VMs --- which
would require booting from ZFS, figuring out how to make snapshots and
where to put them and how to restore them.

The biggest advantage would be checksumming.  I'd be trading that
against ease of use and great complexity.  So you can see how it is not
understandable to me what makes ZFS so great that I wouldn't be able to
do without anymore.

I'm not saying you will feel about ZFS as I do after you try it out. It
presents you with a certain set of features, advantages and
disadvantages, and it is up to you, and you only, to decide whether you
can benefit from it or not. All I'm saying is that I personally believe
ZFS is worth taking into consideration.

Does it mean you have to use it too? Of course not:) Is it wrong not
to use it? Of course not! You should do what _you_ believe is the
right thing to do. But try it out nonetheless :) Or try HAMMER
(Dragonfly BSD). Or btrfs (although this one probably really is not
mature enough).


Btrfs still needs some time, and it seems to have disadvantages compared
to ZFS (which may not even be relevant for what I'm doing).  I never
tried BSD; that would be something new to learn.

Anyway, I'll try it out.  That doesn't mean I'll jump to it right away,
especially not while I still can't tell whether the server finally runs
stable or not.  Give it some time without crashing.  I don't even know
if the disks would work as JBOD.

So you would be running ZFS on unreliable disks, with the errors being
corrected and going unnoticed, until either, without TLER, the system
goes down or, with TLER, until the errors aren't recoverable anymore and
become noticeable only when it's too late.


ZFS tells you it had problems ("zpool status"). ZFS can also check the
entire pool for defects ("zpool scrub"; you should do that
periodically).


You're silently losing more and more redundancy.


I'm not sure what you mean by losing redundancy.

How do you know when
a disk needs to be replaced?

ZFS tells you it had IO or checksum failures. It may also put your pool
into a degraded state (with one or more disks disconnected from the
pool) with reduced redundancy (just like a regular RAID would do). SMART
also tells you something wrong has happened (or is going to, probably).
And, additionally, when you replace a disk and resilver (ZFS term for
rebuilding) the pool, you know whether all your data was read and
restored without errors.

Does ZFS maintain a list of bad sectors which are not to be used again?

Don't know, but I've never heard of it. I always thought it's the
storage device's job. Does any file system do that?

It's also quite difficult to corrupt the file system
itself:
https://blogs.oracle.com/timc/entry/demonstrating_zfs_self_healing


It shows that there are more checksum errors after the errors were
supposedly corrected.

Not "supposedly". The increasing number only shows the count of
encountered checksum errors. If ZFS could not correct the error, it
would say so.
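
The difference between "errors encountered" and "errors left
uncorrected" can be shown with a toy two-way mirror in Python. This is
a sketch under simplifying assumptions (SHA-256 per block, two copies,
repair on read), not ZFS's actual on-disk logic; ToyMirror and its
attributes are invented names.

```python
import hashlib

class ToyMirror:
    """Two copies of each block plus a checksum stored apart from the data."""

    def __init__(self):
        self.copies = [{}, {}]    # two "disks": block number -> bytes
        self.checksums = {}       # block number -> expected SHA-256 hex digest
        self.cksum_errors = 0     # cumulative counter; a repair does not reset it

    def write(self, block_no, data):
        for disk in self.copies:
            disk[block_no] = data
        self.checksums[block_no] = hashlib.sha256(data).hexdigest()

    def read(self, block_no):
        expected = self.checksums[block_no]
        good = None
        bad_disks = []
        for disk in self.copies:
            data = disk[block_no]
            if hashlib.sha256(data).hexdigest() == expected:
                good = data
            else:
                self.cksum_errors += 1      # the error was encountered
                bad_disks.append(disk)
        if good is None:
            # Only here would the error be reported as uncorrectable.
            raise IOError("no copy of block %d matches its checksum" % block_no)
        for disk in bad_disks:
            disk[block_no] = good           # heal the bad copy from the good one
        return good


m = ToyMirror()
m.write(0, b"important data")
m.copies[0][0] = b"imp0rtant data"          # simulate silent corruption on one disk
print(m.read(0))                            # b'important data'
print(m.cksum_errors)                       # 1: counted, although it was repaired
print(m.copies[0][0])                       # b'important data': the copy is healed
```

The counter goes up even though the caller got correct data, which is
the same reading as above: a growing count means errors were seen, not
that they were left in place.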

Using ZFS does not mean you don't have to do backups. File system type
won't make a difference for a fire inside your enclosure:) But ZFS
makes it easy to create backups by replicating your pool or datasets
("zfs send" lets you create full or incremental backups) to another
set of disks or machine(s).


As another ZFS or as files or archives or as what?  I'm using rsync now,
and restoring a file is as simple as copying it from the backup.

Typically as another ZFS dataset. Replicating ZFS snapshots has one big
advantage for me (besides checksumming, so you know you've made your
backup correctly) - it's atomic, so it either happens or not. It doesn't
mean it's supposed to replace rsync, though. It depends on the task at
hand.
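
For a rough idea of what such a replication looks like, the Python
sketch below only assembles the command line and prints it; it does not
run anything. It assumes the usual "zfs send" (with "-i" for an
incremental stream) piped into "zfs receive", optionally through ssh.
The pool, dataset, snapshot and host names are invented, and
replication_command is a hypothetical helper.

```python
import shlex

def replication_command(dataset, new_snap, target, prev_snap=None, ssh_host=None):
    """Build (but do not run) a pipeline that replicates one snapshot."""
    send = ["zfs", "send"]
    if prev_snap is not None:
        # Incremental stream: only the changes between prev_snap and new_snap.
        send += ["-i", f"{dataset}@{prev_snap}"]
    send.append(f"{dataset}@{new_snap}")
    recv = ["zfs", "receive", target]
    if ssh_host is not None:
        # Receive on another machine instead of a local pool.
        recv = ["ssh", ssh_host] + recv
    return " | ".join(shlex.join(part) for part in (send, recv))

# Full backup to a second local pool (all names are hypothetical):
print(replication_command("tank/vms", "monday", "backup/vms"))
# zfs send tank/vms@monday | zfs receive backup/vms

# Incremental backup to another machine:
print(replication_command("tank/vms", "tuesday", "backup/vms",
                          prev_snap="monday", ssh_host="backuphost"))
# zfs send -i tank/vms@monday tank/vms@tuesday | ssh backuphost zfs receive backup/vms
```

Check the zfs man page on your system before running anything like
this for real.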

http://blog.backblaze.com/2014/01/21/what-hard-drive-should-i-buy/


Those guys don't use ZFS.  They must have very good reasons not to.


They do:
http://www.youtube.com/watch?v=c5ASf53v4lI
http://zfsonlinux.org/docs/LUG11_ZFS_on_Linux_for_Lustre.pdf

And I believe they have lots of good reasons to do so :)


That's some laboratory experimenting with ZFS.  Backblaze uses ext4,
though ZFS would seem to be a very good choice for what they're doing.
How can they store so much data without checksumming, without using ECC
RAM and not experience a significant amount of data corruption?


That's what I found about Backblaze and ZFS (22-07-2011):

We are intrigued by it, as it would replace RAID & LVM as well. But
native ZFS is not available on Linux and we're not looking to switch to
OpenSolaris or FreeBSD, as our current system works great for us. For
someone starting from scratch, ZFS on one of these OSes might work and
we would be interested to know if someone tries it. We're more likely to
switch to btrfs in the future if anything.


http://www.smallnetbuilder.com/nas/nas-features/31541-how-to-build-a-cheap-petabyte-server-revisited

That's just two organizations with similarly sized storage and different
approaches. One uses standard solutions, the other one ported ZFS to
Linux, so they could use it. It's up to you to define your goals,
solutions and level of assurance. My personal approach is "hope for the
best, plan for the worst".

The corruption wouldn't go unnoticed because they won't be able to
decrypt the data. They'd have to store everything at least twice, and
if they could cut their costs in half or less by not having to do that
through simply using ZFS, why wouldn't they?

Data redundancy is not implied by ZFS itself. You either want redundancy
or not, ZFS is just one way of providing it.

What is the actual rate of data corruption or loss prevented or
corrected by ZFS due to its checksumming in daily usage?

I have experienced data corruption due to hardware failures in the past.
Once is often enough for me and it happened more than once. If I hadn't
done the checksumming myself, I probably wouldn't even have known about
it. Since I started using it, ZFS detected data corruption several times
for me (within a few years). But I don't own a data center :) Actual
error rates might depend on your workload, hardware, probabilities and
lots of other things. Here's something you might find interesting:

http://www.zdnet.com/blog/storage/dram-error-rates-nightmare-on-dimm-street/638
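
For anyone not on a checksumming file system, doing the checksumming
yourself, as mentioned above, can be as simple as keeping a manifest of
file hashes and comparing against it later. The Python sketch below is
one way to do that, not the method actually used back then; the
function names and the demo file are made up.

```python
import hashlib
import os
import tempfile

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            manifest[os.path.relpath(path, root)] = digest
    return manifest

def changed_files(root, manifest):
    """Return files whose current digest differs from the manifest (or are gone)."""
    current = build_manifest(root)
    return sorted(p for p, digest in manifest.items() if current.get(p) != digest)

# Demo on a throwaway directory: flip one byte and see it detected.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "photo.raw"), "wb") as fh:
        fh.write(b"\x00" * 1024)
    manifest = build_manifest(root)
    clean = changed_files(root, manifest)
    with open(os.path.join(root, "photo.raw"), "r+b") as fh:
        fh.seek(100)
        fh.write(b"\x01")                   # simulate silent corruption
    damaged = changed_files(root, manifest)

print(clean)      # []
print(damaged)    # ['photo.raw']
```

Unlike ZFS, this only detects the damage; repairing it still needs a
good copy from a backup, and a legitimately edited file shows up the
same way as a corrupted one.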


Kuba


