
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance



Gordan Bobic <gordan@xxxxxxxxxx> writes:

>> On 07/06/2014 04:38 PM, lee wrote:
>>> Gordan Bobic <gordan@xxxxxxxxxx> writes:
>>
>>>>> On 07/04/2014 06:11 PM, lee wrote:
>
> IMO, the problem is in a distribution teaching its users that what
> doesn't ship with the distribution might as well not exist. That kind
> of conditioning is what I am referring to.

Yet no one taught me that.

>> So for example, before I start working on some source code ~/src/test.c,
>> I make a snapshot, and when I'm unhappy with the result, I revert to
>> what I made the snapshot of?  What about emails that have been received
>> in ~/Mail in the meantime?
>
> Don't keep ~/Mail and src on the same volume.

So I can't do it with directories?  That would be useful.
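
From what I've read so far, the idea seems to be to give each such
directory its own dataset, something like this (just a sketch; the pool
name "tank" and the paths are made up):

    # one dataset per directory, mounted where the directory used to live
    zfs create -o mountpoint=/home/lee/src  tank/src
    zfs create -o mountpoint=/home/lee/Mail tank/mail

    # snapshot only the source tree before hacking on it ...
    zfs snapshot tank/src@before-test

    # ... and roll back just that dataset; ~/Mail is not touched
    zfs rollback tank/src@before-test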

>> When every file occupies at least 4k because that's the block size the
>> FS is using, you can waste a lot of space.
>
> ZFS cannot use stripes smaller than (sector size) + (redundancy).
>
> i.e. if you use disks with 4KB sectors, and you are writing a 10 byte
> file on RAIDZ2 (n+2 redundancy, similar to RAID6), that will use 3
> sectors (one for the data, plus two for n+2 redundancy), i.e. 12KB.
>
> Variable stripe width is there to improve write performance of partial
> writes.

And the checksums go into the same sector?  So for writing a file that's
4k, two sectors would be used, plus redundancy?

If that is so, wouldn't the space used appear inflated, or the usable
capacity appear to vary, depending on file size?  I'm confused now ...
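
To put the quoted example into numbers (a rough estimate that only
covers small files and ignores RAIDZ padding and metadata overhead):

    # space taken by a tiny file on RAIDZ2 with 4 KiB sectors:
    # data sectors, rounded up, plus two parity sectors
    awk -v bytes=10 'BEGIN {
        sector = 4096
        data   = int((bytes + sector - 1) / sector)   # ceil(bytes / sector)
        total  = data + 2                             # n+2 redundancy
        printf "%d-byte file -> %d data + 2 parity = %d sectors (%d KiB)\n",
               bytes, data, total, total * sector / 1024
    }'
    # prints: 10-byte file -> 1 data + 2 parity = 3 sectors (12 KiB)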

>>>> The biggest advantage would be checksumming.  I'd be trading that
>>>> against ease of use and great complexity.
>>>
>>> Not to mention resistance to learning something new.
>>
>> Not mentioning the risks involved ...
>
> Perhaps our experiences differ - mine shows that lying and dying disks
> pose a sufficiently high risk of data loss that a traditional RAID and
> file system cannot be trusted with keeping the data safe.

That doesn't eliminate the risks.  Perhaps I've been lucky --- the more
I learn about it, the more I think I should do something.

>>>> So you can see how it is not
>>>> understandable to me what makes ZFS so great that I wouldn't be able to
>>>> do without anymore.
>>>
>>> Then don't use it.
>>
>> Maybe, maybe not --- learning about it doesn't hurt.
>
> Then you better stop coming up with reasons to not use it. :)

The more I find that the reasons not to use it are no good and the more
good reasons I find to use it, the more I will be inclined to use it.

>>> Same way you know with any disk failure - appropriate
>>> monitoring. Surely that is obvious.
>>
>> It's not obvious at all.  Do you replace a disk when ZFS has found 10
>> errors?
>
> Do you replace a disk when SMART is reporting 10 reallocated sectors?

No, I'm not using SMART.

> Can you even get to information of that granularity with most hardware
> RAID controllers?

You can probably see the SMART values, depending on the controller, or
it might use them by itself.  I never bothered to find out.

> You have to exercise some reasonable judgement there, and apply
> monitoring, just like you would with any other disk/RAID.

It's simple with RAID because a disk either fails or it doesn't, and
the same usually goes for individual disks.  Introducing another
indicator which may mean that a disk has "failed a little" doesn't
make things simpler.
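
For reference, the error counters being talked about would presumably
come from something like this (assuming a pool called "tank" and a disk
at /dev/sda; both names are made up):

    # ZFS side: per-device read/write/checksum error counters
    zpool status -v tank

    # full verification pass; anything it finds shows up in the counters
    zpool scrub tank

    # SMART side: reallocated sectors and friends (smartmontools)
    smartctl -A /dev/sda | grep -i reallocated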

>>> zfs send produces a data stream that can be applied to another pool
>>> using zfs receive. You can pipe this over ssh or netcat to a different
>>> machine, or you can pipe it to a different pool locally.
>>
>> So I'd be required to also use ZFS on the receiving side for it to make
>> sense.
>
> Indeed.

That's kinda evil: if I run into problems with ZFS, it might be good
to have used a different file system for the backups, but then, when I
have to restore from such a backup, I might get corrupted data back
because it hasn't been checksummed.
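
For the record, as I understand the quoted suggestion, the pipeline
would look roughly like this (a sketch; the pool and snapshot names and
the host "backuphost" are made up):

    # initial full copy of a snapshot to a pool on another machine
    zfs snapshot tank/data@2014-07-07
    zfs send tank/data@2014-07-07 | ssh backuphost zfs receive backup/data

    # later, send only what changed since the previous snapshot
    zfs snapshot tank/data@2014-07-08
    zfs send -i @2014-07-07 tank/data@2014-07-08 \
        | ssh backuphost zfs receive backup/data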

>> They claim[1] that they are currently storing over 100 petabytes and
>> have restored 6.27 billion files.  They are expecting to store another
>> 500 petabyte at another datacenter.  That's over a hundred, and if they
>> meet their plan, at least 500 detected broken files, so they must know.
>
> Mentioning such a thing occurs could be considered bad for business.

There's no good way to hide it because people would notice when
restoring.  But they do say their software does checksumming, just not
how.

>> And I would guess that the number of retrieved files is far greater than
>> 1%.  You get unlimited storage for $5/month and are able to retrieve a
>> particular single file without significant delays.  When you're using
>> their service, why would you even keep files you don't access frequently
>> on your own disks?
>
> Because their software only lets you back up files you store on your
> disk

You can't very well back up a file that you don't have.

> - last I checked there are restrictions in place to prevent abuse of
> the system by using it as unlimited cloud storage rather than backups.

It seems they have some client software that goes through your files
and backs them up, with the option to exclude some.  You could simply
disconnect a disk which has been backed up, or use it for something
else, perhaps in a different machine, then restore a file that used to
be on that disk, perhaps to a different disk/machine.  If you can't do
that, their service is rather useless.  If you can, what could they do
to prevent you from keeping files in the backup that are no longer on
any of your disks?

That a file isn't available to their client software anymore doesn't
mean that you have deleted it or that you never want to have it
available again.

> remember from last time I checked. There's also no Linux support, so I
> don't use it, so I cannot tell you any more details.

It seemed so --- and I won't use it as long as their software isn't
open source, and probably not for a long time after that either,
because I'd have no way of knowing who could access my data.

>>>> What is the actual rate of data corruption or loss prevented or
>>>> corrected by ZFS due to its checksumming in daily usage?
>>>
>>> According to disk manufacturers' own specifications for their own
>>> disks (i.e. assume it's worse), one unrecoverable error in 10^14 bits
>>> read. This doesn't include complete disk failures.
>>
>> That still doesn't answer the question ...
>
> If you define what "daily usage" is in TB/day, you will be able to
> work out how many errors per day you can expect from the numbers I
> mentioned above.
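
Taking the quoted figure at face value, the arithmetic would come out
roughly like this (the 0.1 TB/day workload is just a made-up example):

    awk -v tb_per_day=0.1 'BEGIN {
        bits_per_day = tb_per_day * 1e12 * 8   # terabytes read -> bits read
        ure_rate     = 1e-14                   # one error per 10^14 bits
        printf "%.4f expected UREs/day, about one every %.0f days\n",
               bits_per_day * ure_rate, 1 / (bits_per_day * ure_rate)
    }'
    # prints: 0.0080 expected UREs/day, about one every 125 days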

Those would be theoretical numbers.  That so many errors /can/ occur
doesn't mean that they /do/ occur.  ZFS is not involved in determining
such numbers, either, and it may detect more or fewer errors than the
specification suggests.  I'm asking what the rate actually is.

I have disks for which the specification says that the MTBF is one
million hours (or maybe even two million).  That would mean I'd see
only a single failure in over a hundred years, and with over a hundred
such disks, I'd see only one failure per year.  Both my experience and
studies with large numbers of disks indicate that this is BS and that
the actual failure rate is quite different.
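
For reference, the back-of-the-envelope numbers behind that MTBF claim,
taken at face value:

    awk 'BEGIN {
        mtbf_hours        = 1e6          # one million hours MTBF
        hours_per_year    = 24 * 365.25
        years_per_failure = mtbf_hours / hours_per_year
        printf "one disk:  a failure every %.0f years on average\n",
               years_per_failure
        printf "100 disks: about %.1f failures per year\n",
               100 / years_per_failure
    }'
    # prints: one disk:  a failure every 114 years on average
    #         100 disks: about 0.9 failures per year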


-- 
Knowledge is volatile and fluid.  Software is power.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

