
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance



Gordan Bobic <gordan@xxxxxxxxxx> writes:

>> On 07/04/2014 06:11 PM, lee wrote:
>>
>> Thanks!  Sooner or later I'll try it out.  How come there are no
>> packages in the Debian repos other than the fuse package?
>
> Is this some kind of Debian/Ubuntu brain damage that demands that
> everything be pre-chewed and served on a plate via the distro attached
> repositories? That's a very solipsistic view.

If you think it's an indication of brain damage to prefer using software
that is included in the distribution you're using and to wonder why a
particular software isn't included, then it must be brain damage.

> This is getting way off topic for the Xen mailing list.

It has been for quite a while.

>> Why exactly is that?  Are you modifying your storage system all the time
>> or making snapshots all the time?
>
> Since snapshots in ZFS are "free" in terms of performance, they are
> much more useful for everyday use. They also make incremental backups
> easier because you can use send/receive commands to transfer
> incrementally only the delta between the snapshots. Between that and
> extra integrity-preserving features it makes reaching for backups much
> less frequent.
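The incremental step described above amounts to two things: finding the newest snapshot that exists on both the source and the backup, then sending only the delta from that snapshot. Here is a minimal Python sketch of that logic. The dataset and snapshot names are hypothetical, snapshots are assumed to be listed oldest first, and the function only builds the `zfs send` argument list without running anything.

```python
# Minimal sketch: choose the incremental base for "zfs send -i".
# Assumptions: snapshot lists are ordered oldest-first, the source has at
# least one snapshot, and "tank/home" is a hypothetical dataset name.

def latest_common_snapshot(source_snaps, target_snaps):
    """Return the newest snapshot name present on both sides, or None."""
    on_target = set(target_snaps)
    for name in reversed(source_snaps):
        if name in on_target:
            return name
    return None


def incremental_send_argv(dataset, source_snaps, target_snaps):
    """Build argv that sends only the delta since the last common snapshot."""
    newest = source_snaps[-1]
    base = latest_common_snapshot(source_snaps, target_snaps)
    if base is None:
        # No shared snapshot: only a full stream can be sent.
        return ["zfs", "send", f"{dataset}@{newest}"]
    if base == newest:
        return None  # the target is already up to date
    return ["zfs", "send", "-i", f"{dataset}@{base}", f"{dataset}@{newest}"]


src = ["2014-07-01", "2014-07-02", "2014-07-03"]
dst = ["2014-07-01", "2014-07-02"]
print(incremental_send_argv("tank/home", src, dst))
```

`zfs send -i` sends the difference between exactly two snapshots. `-I` would also include any intermediate snapshots between them.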

So for example, before I start working on some source code ~/src/test.c,
I make a snapshot, and when I'm unhappy with the result, I revert to
what I made the snapshot of?  What about emails that have been received
in ~/Mail in the meantime?

>> Checksumming is sure good to have, being able to fully use the disk
>> caches is, too, as well as not wasting space through fixed block sizes.
>
> Fixed block sizes don't waste space on traditional RAID. Variable
> block sizes are a performance feature that allows ZFS to work around
> the parity RAID problem of performance dropping down to 50% of
> performance of a single disk on partial stripe writes.
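The toy model below illustrates the partial-stripe problem. It assumes a RAID-5 style layout in which parity is the XOR of the data chunks in a stripe. Updating one chunk means reading the old data and the old parity before writing the new data and the new parity (read-modify-write). A write that covers the whole stripe, which is the case the quoted variable-block-size design aims for, can compute parity from the new data alone. This is a simplified model, not a benchmark of any real array.

```python
# Toy model of a partial-stripe write on single-parity (RAID-5 style) storage.
# Assumption: parity is the byte-wise XOR of all data chunks in the stripe.
from functools import reduce


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(chunks):
    """Parity of a whole stripe: XOR of all data chunks."""
    return reduce(xor_bytes, chunks)


def partial_write(chunks, parity, index, new_data):
    """Replace one chunk and return (new_parity, disk_io_count).

    Only one chunk changes, but the old data and old parity must be read
    before the new data and new parity are written: 2 reads + 2 writes.
    """
    old_data = chunks[index]                 # read 1
    old_parity = parity                      # read 2
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    chunks[index] = new_data                 # write 1 (data)
    return new_parity, 4                     # write 2 is the new parity


stripe = [b"\x01\x02", b"\x10\x20", b"\xa0\x0b"]
p = full_parity(stripe)
p, ios = partial_write(stripe, p, 1, b"\xff\x00")
assert p == full_parity(stripe)  # incremental update matches a full recompute
print(ios)  # 4 disk operations for what the caller sees as one write
```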

When every file occupies at least 4k because that's the block size the
FS is using, you can waste a lot of space.
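You can estimate the space cost of a fixed block size by rounding each file up to whole blocks. This sketch assumes a 4 KiB block and no tail packing or inline storage of small files. Some file systems do store small files that way, which would reduce the waste. The file sizes used are made up.

```python
# Sketch: space lost to rounding file sizes up to a fixed block size.
# Assumption: every non-empty file occupies whole blocks (no tail packing
# or inline data), and empty files use no data blocks.

def allocated(size, block=4096):
    """Bytes actually allocated for a file of `size` bytes."""
    if size == 0:
        return 0
    return -(-size // block) * block  # ceiling division, then whole blocks


def slack(sizes, block=4096):
    """Total bytes wasted across all files."""
    return sum(allocated(s, block) - s for s in sizes)


# Hypothetical example: 10,000 files of 100 bytes each.
print(slack([100] * 10_000))  # 39960000 bytes, roughly 38 MiB
```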

>> I've never made a snapshot and don't know what I would make one for
>> other than perhaps making a snapshot of the dom0 and the VMs --- which
>> would require booting from ZFS, figuring out how to make snapshots and
>> where to put them and how to restore them.
>
> It sounds like your FS usage isn't advanced enough.

I don't care too much about file systems and try to keep things
simple.

>> The biggest advantage would be checksumming.  I'd be trading that
>> against ease of use and great complexity.
>
> Not to mention resistance to learning something new.

Not to mention the risks involved ...

>> So you can see why I don't understand what makes ZFS so great that I
>> wouldn't be able to do without it anymore.
>
> Then don't use it.

Maybe, maybe not --- learning about it doesn't hurt.

>>>> So you would be running ZFS on unreliable disks, with the errors being
>>>> corrected and going unnoticed, until either, without TLER, the system
>>>> goes down or, with TLER, until the errors aren't recoverable anymore and
>>>> become noticeable only when it's too late.
>>>
>>> ZFS tells you it had problems ("zpool status"). ZFS can also check
>>> the entire pool for defects ("zpool scrub"; you should do that
>>> periodically).
>>
>> You're silently losing more and more redundancy.  How do you know when
>> a disk needs to be replaced?
>
> Same way you know with any disk failure - appropriate
> monitoring. Surely that is obvious.
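One minimal form of the monitoring mentioned above is sketched below. It runs `zpool status -x`, which reports only pools that have problems, and treats anything other than the healthy message as an alert. It assumes the `zpool` command is on the PATH and that the healthy case prints exactly `all pools are healthy`. Adapt the check if your version words it differently.

```python
# Sketch of basic ZFS health monitoring via `zpool status -x`.
# Assumptions: zpool is on PATH and prints "all pools are healthy" when
# nothing is wrong; alert delivery is left to the caller.
import subprocess

HEALTHY = "all pools are healthy"


def pools_healthy(status_output):
    """True if `zpool status -x` output indicates no problems."""
    return status_output.strip() == HEALTHY


def check_pools():
    """Run `zpool status -x` and return (healthy, raw_output)."""
    try:
        result = subprocess.run(
            ["zpool", "status", "-x"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return False, "zpool command not found"
    output = result.stdout + result.stderr
    return pools_healthy(output), output
```

Calling `check_pools()` from cron, together with a periodic `zpool scrub <pool>`, would cover both points raised in the quoted reply. How and where to send the alert is left open.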

It's not obvious at all.  Do you replace a disk when ZFS has found 10
errors?

>> Does ZFS maintain a list of bad sectors which are not to be used again?
>
> Given that you are asking this question, I dare say you need to go
> and read up more on how modern disks work. Modern disks manage their
> defects themselves. When a sector fails and cannot be read, they
> return an error on the read, and mark the sector as pending. Next time
> that sector is written, they will write it to one of the spare, hidden
> sectors, and map the LBA for the failed sector to the new
> sector. There has been no need for the file system to keep track of
> physical disk defects in decades.
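The toy model below follows the drive firmware behaviour described in the quoted reply. A failed read returns an error and marks the LBA as pending, and the next write to that LBA is redirected to a spare sector. The class and sector numbers are purely illustrative. Real drives do this internally and expose only counters, such as pending and reallocated sector counts, through SMART.

```python
# Toy model of in-drive defect management: failed read -> sector pending,
# next write to that LBA -> remapped to a spare. All numbers are hypothetical.

class ToyDisk:
    def __init__(self, n_sectors, n_spares):
        self.data = {}        # physical sector -> payload
        self.bad = set()      # physical sectors whose reads fail
        self.pending = set()  # LBAs awaiting reallocation
        self.remap = {}       # LBA -> spare physical sector
        self.spares = list(range(n_sectors, n_sectors + n_spares))

    def _phys(self, lba):
        return self.remap.get(lba, lba)

    def read(self, lba):
        phys = self._phys(lba)
        if phys in self.bad:
            self.pending.add(lba)  # return an error, mark the sector pending
            raise OSError(f"unrecoverable read error at LBA {lba}")
        return self.data.get(phys, b"\x00")

    def write(self, lba, payload):
        if lba in self.pending:
            if not self.spares:
                raise OSError("no spare sectors left")
            self.remap[lba] = self.spares.pop(0)  # transparent reallocation
            self.pending.discard(lba)
        self.data[self._phys(lba)] = payload
```

In this model, the `no spare sectors left` branch is where silent remapping stops working.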

That's assuming that the disks reliably do what they are supposed to
do.  Can you guarantee that they always will?

> zfs send produces a data stream that can be applied to another pool
> using zfs receive. You can pipe this over ssh or netcat to a different
> machine, or you can pipe it to a different pool locally.
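The sketch below wires up the `zfs send | ssh ... zfs receive` pipeline with Python's `subprocess` module. The host name `backuphost` and the dataset names are hypothetical, and passwordless ssh is assumed. `build_pipeline` only constructs the commands; `replicate` actually runs them.

```python
# Sketch: `zfs send SNAP | [ssh HOST] zfs receive TARGET` via subprocess.
# Assumptions: passwordless ssh to a hypothetical host; hypothetical datasets.
import subprocess


def build_pipeline(snapshot, target_dataset, host=None):
    """Return (sender_argv, receiver_argv) for a full send of `snapshot`."""
    sender = ["zfs", "send", snapshot]
    receiver = ["zfs", "receive", target_dataset]
    if host is not None:
        receiver = ["ssh", host] + receiver  # run the receive remotely
    return sender, receiver


def replicate(snapshot, target_dataset, host=None):
    """Stream the snapshot into the target pool; raise if either side fails."""
    sender_argv, receiver_argv = build_pipeline(snapshot, target_dataset, host)
    sender = subprocess.Popen(sender_argv, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receiver_argv, stdin=sender.stdout)
    sender.stdout.close()  # so the sender gets SIGPIPE if the receiver exits
    receiver_rc = receiver.wait()
    sender_rc = sender.wait()
    if sender_rc or receiver_rc:
        raise RuntimeError(f"send rc={sender_rc}, receive rc={receiver_rc}")
```

For example, `replicate("tank/home@2014-07-04", "backup/home", host="backuphost")` would stream a full copy of that snapshot into a new dataset on the remote pool. Closing the sender's stdout in the parent follows the shell-pipeline pattern from the `subprocess` documentation.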

So I'd be required to also use ZFS on the receiving side for it to make
sense.

>> That's some laboratory experimenting with ZFS.  Backblaze uses ext4,
>> though ZFS would seem to be a very good choice for what they're doing.
>> How can they store so much data without checksumming, without using ECC
>> RAM and not experience a significant amount of data corruption?
>
> You are asking the wrong question - how would they know if they are
> experiencing data corruption? The vast majority of backups are
> write-only. If 4KB of data (one sector) goes bad for every 10TB read,
> and only 1% of the backups ever need to be retrieved, that's one
> detected broken file per petabyte of data stored.

They claim[1] that they are currently storing over 100 petabytes and
have restored 6.27 billion files.  They expect to store another
500 petabytes at another datacenter.  By that estimate, that's over a
hundred detected broken files already and, if they meet their plan, at
least 500 more, so they must know.
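The arithmetic behind the estimates above can be written down directly. This sketch reuses the quoted assumption of one bad 4 KB sector per 10 TB read and expresses the share of stored data that is ever restored as a percentage. It assumes decimal units (1 PB = 1000 TB). The inputs are figures from this thread, not measured Backblaze data.

```python
# Back-of-envelope model from the quoted argument: one bad sector per 10 TB
# read, and only a percentage of stored data is ever read back.
# Assumption: decimal units, 1 PB = 1000 TB.

TB_PER_BAD_SECTOR = 10


def detected_bad_files(stored_pb, retrieved_percent):
    """Expected number of corrupted files noticed on restore."""
    tb_read = stored_pb * 1000 * retrieved_percent / 100
    return tb_read / TB_PER_BAD_SECTOR


print(detected_bad_files(1, 1))    # quoted example: 1 PB stored, 1% retrieved
print(detected_bad_files(100, 1))  # 100 PB stored, 1% retrieved
print(detected_bad_files(500, 1))  # planned additional 500 PB, 1% retrieved
```

With the 20% restore rate guessed further down, the 100 PB case rises from 100 to 2000 expected detected files.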

And I would guess that the number of retrieved files is far greater than
1%.  You get unlimited storage for $5/month and are able to retrieve a
particular single file without significant delays.  When you're using
their service, why would you even keep files you don't access frequently
on your own disks?  At some point, it's cheaper to have them in backups
and to just retrieve them when you need them.  You think you have use
for a NAS or something similar?  Why throw the money at it when you can
have something very similar for $5/month?

How many people have large amounts of data which must be available right
away and couldn't be stored remotely (leaving security issues aside)?
Can you store the part of your data which you do not need to have
available right away for a total cost of only $5/month yourself while
that data is readily accessible at any time?

Considering that, the rate of data restored may well be 20%--50% or even
more.  And with only a single file they are unable to restore, their
service would have failed.

So how can they afford not to use ECC RAM and to use a file system that
allows for data corruption?


[1]: http://blog.backblaze.com/category/behind-backblaze/

>> What is the actual rate of data corruption or loss prevented or
>> corrected by ZFS due to its checksumming in daily usage?
>
> According to disk manufacturers' own specifications for their own
> disks (i.e. assume it's worse), one unrecoverable error in 10^14 bits
> read. This doesn't include complete disk failures.
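Converting the quoted spec figure into bytes shows what it means in terms of data read. This assumes decimal units and treats the spec rate as a simple average, which is a simplification. The 4 TB read below is a hypothetical example. The result, about 12.5 TB per expected error, is of the same order as the 10 TB figure used earlier in the thread.

```python
# Converting "one unrecoverable read error (URE) per 1e14 bits read" into
# bytes. Assumptions: decimal units (1 TB = 1e12 bytes), spec rate as average.

BITS_PER_URE = 10**14


def bytes_per_ure(bits=BITS_PER_URE):
    """Bytes read per expected unrecoverable error."""
    return bits // 8


def expected_ures(bytes_read, bits=BITS_PER_URE):
    """Expected unrecoverable read errors for a given amount of data read."""
    return bytes_read * 8 / bits


print(bytes_per_ure() / 10**12)   # 12.5 (TB per expected error)
print(expected_ures(4 * 10**12))  # reading a hypothetical full 4 TB disk once
```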

That still doesn't answer the question ...


-- 
Knowledge is volatile and fluid.  Software is power.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
