Re: [Xen-users] Cheap IOMMU hardware and ECC support importance

Gordan Bobic <gordan@xxxxxxxxxx> writes:

>> On 07/06/2014 02:42 PM, lee wrote:
>>> Gordan Bobic <gordan@xxxxxxxxxx> writes:
>>>> On 07/05/2014 03:57 AM, lee wrote:
>>>>> Kuba <kuba.0000@xxxxx> writes:
>>>>>> On 2014-07-04 19:11, lee wrote:
>>>>>>> Kuba <kuba.0000@xxxxx> writes:
>>>>>> "Rollback" doesn't sound very destructive.
> Rolling back your file system state to a previous point in time
> doesn't sound destructive to you? It doesn't convey the meaning that
> you will lose changes to the file system made since the point you are
> rolling back to?

No, it doesn't sound destructive.  It sounds more like "repairing" (the
currently broken state with a previous state known to be good).

>> Because ZFS makes a point of keeping data safe.  And there are some
>> warnings you get from xfs.
> Not from user error. Backups keep data safe from user error.

xfs sometimes warns you when you err.  And backups keep data only so

>> But then, dom0 and the VMs are
>> on a RAID-1, so I'd have to make backups of everything, change to JBOD,
>> figure out how to boot from ZFS and how to restore from the backup.  Any
>> idea how to do that?  Does ZFS provide swap partitions?  If not, I'd
>> have to put them on RAID devices, but I wouldn't have any.
> Swap for dom0 or for domUs?

All have swap partitions.

> For dom0, as I said before, I use RAID1 for the /boot and rootfs.

And leave the rest of the disks unused?

> I typically put this on a RAID1 disk set, where the dom0's swap could
> also live (when I absolutely need swap, I use a relatively small zram,
> because most of the time swapping is a great way to cripple your
> machine into a completely unusable state that is often worse than a
> process OOM-ing and dying).

That depends --- when the wrong process is killed, the whole system may
become unstable or go down.  When it takes a while to fill the swap
space, that may give you time to intervene before it's too late.

> For domU, you put it on whatever volume the rest of the domU
> filesystems are on.

Without swap partitions?

>>>> The more errors there are, the more redundancy you lose
>>>> because you have more data that can be read from only a part of the
>>>> disks.
>>> ZFS immediately repairs all errors it encounters, so this is not a
>>> valid point.
>> It may not encounter all the errors there are (until it's too late to
>> repair them), so the point remains valid.
> No it doesn't, because the point is relative to other available
> solutions, all of which fare far worse under the circumstances
> discussed.

It's not relative to anything, it's about ZFS.  Whether other things do
better or worse is a different question.

> A hardware RAID controller will typically kick out disks based on
> relatively low error thresholds. ZFS will try to hold onto disks as
> long as they are responsive to the kernel (within SCSI command
> timeouts), which means that it will try to maintain redundancy much
> better, and will keep fixing all the errors it encounters in the
> meantime.

Which is better?  In both cases, another disk could fail shortly after
the first one has.

> How often does your RAID controller scrub the array to check for
> errors? If it finds that in a particular RAID5 stripe the data doesn't
> match the parity, but none of the disks return an error, does it trust
> that the data is correct or the parity is correct? If parity, which
> combination of data blocks does it assume are correct, and which block
> needs to be repaired? ZFS can recover from this even with n+1
> redundancy because each data stripe has a checksum independent of the
> parity, so it is possible to establish which combination of surviving
> data+parity blocks is the correct one, and which blocks need to be
> re-built.

Interesting question.  Are you saying the hardware RAID controller has
no way of knowing which data is good, because it uses parity information
merely to reconstruct data when a part of that data is no longer
available?  And that ZFS, by keeping checksums on each part of the data,
can not only reconstruct the data when a part of it is unavailable but
also know which part of the data is good, because it assumes that the
data for which the checksums match is good?
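
To make the distinction concrete, here is a minimal Python sketch of the
idea as described above.  It is a toy model, not how ZFS or any RAID
controller is actually implemented: the block contents, the use of
SHA-256, and the names xor_blocks, checksum and locate_and_repair are
all assumptions made for illustration.  It shows that a parity mismatch
alone only says "something is wrong", whereas a checksum stored
independently of the parity lets you test each candidate reconstruction
and find the one that is correct.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID5-style single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def checksum(blocks):
    """Checksum of the data stripe, kept independently of the parity."""
    return hashlib.sha256(b"".join(blocks)).digest()


def locate_and_repair(data, parity, expected):
    """Find a single silently corrupted block and rebuild it.

    Returns (data, parity, culprit), where culprit is None (all good),
    "parity", or the index of the bad data block.
    """
    if checksum(data) == expected:
        # Data is verified good, so any mismatch must be in the parity.
        good_parity = xor_blocks(data)
        culprit = None if good_parity == parity else "parity"
        return list(data), good_parity, culprit
    for i in range(len(data)):
        # Hypothesis: block i is bad. Rebuild it from the rest plus parity.
        others = data[:i] + data[i + 1:]
        candidate = xor_blocks(others + [parity])
        trial = data[:i] + [candidate] + data[i + 1:]
        if checksum(trial) == expected:
            return trial, parity, i
    raise ValueError("more damage than single parity can repair")


# Hypothetical stripe of three data blocks plus one parity block.
original = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(original)
stored_sum = checksum(original)

# One disk silently returns wrong data without reporting an error.
damaged = [original[0], b"BxBB", original[2]]

# Parity alone notices the mismatch but cannot say which block is wrong.
parity_mismatch = xor_blocks(damaged) != parity

repaired, _, culprit = locate_and_repair(damaged, parity, stored_sum)
print(parity_mismatch, culprit, repaired == original)
```

With parity only, every one of the four blocks is an equally plausible
culprit; the independent checksum is what breaks the tie.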

>>>>>>> https://blogs.oracle.com/timc/entry/demonstrating_zfs_self_healing
>> You can see that the data can still
>> be read and that the number of errors has gone up.  That the number of
>> errors has increased contradicts that the errors have been fixed.
> Only if you have no clue how file systems, RAID, and disk accesses
> work. In which case you should be using an OS designed for people with
> that level of interest in understanding.

That's what I said: When you don't know ZFS, you see the contradiction.
Common sense makes you at least suspicious when you are supposed to
assume that an error has been fixed and see more errors showing up.

>>> Solutions that aren't optimal are used all the time for all kinds of
>>> spurious and non-technical reasons.
>> I don't find that convincing.  Companies aren't willing to pay their
>> employees salaries that would even cover the costs of living, and they
>> are always trying to save money in any way they can.
> Which is why they are getting the quality of the solutions described.

Good point :)

>> When you look at
>> the lengths Backblaze claims to have gone to in order to keep costs low, it
>> is entirely inconceivable that they would skip out on something that
>> would save them half their costs for spurious or non-technical reasons.
> You'd think so.

I got an email from them, and they're saying they are considering using
ZFS and that the software they're using does checksumming.  How exactly
it does it wasn't said.  They also said their encryption software is
closed source and that it's up to you to trust them or not.

So we can only guess who has access to all the data they store.

>>>>> assurance. My personal approach is "hope for the best, plan for the
>>>>> worst".
>>>> The problem is that you can plan whatever you want and things turn out
>>>> otherwise regardless.  My grandma already knew that.
>>> Expect the unexpected and be prepared. Every boy scout already knows that.
>> When they grow up, they find out that it doesn't work.
> So your view is to not bother taking precautions?

No, I'm merely saying that no matter what your plans are, things turn
out in whatever way they do.  I'm also assuming that the more
complicated a plan is, the more chances it has to fail.

>> There are lots of reasons for not being prepared for everything, and
>> being prepared for everything when things go wrong nonetheless can be
>> difficult to justify.
> Depends on what's at stake. If you know what you are doing the
> overheads of ensuring timely recoverability are not particularly
> significant.

That depends on what you consider significant.

>>>>>> What is the actual rate of data corruption or loss prevented or
>>>>>> corrected by ZFS due to its checksumming in daily usage?
>>> The following articles provide some good info:
>>> http://static.googleusercontent.com/media/research.google.com/en//archive/disk_failures.pdf
>>> http://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.pdf
>> They don't answer the question, either.
> So you didn't read the articles, then.

I looked at them.

> Graph (b) in Figure 3 of the second article shows the number of
> latent sector errors per GB over 18 months of use, by disk model. So
> depending on your disk you could be getting a silent disk error as
> often as once per 100GB. Unrecoverable sector errors (i.e. non-latent
> disk errors) are on top of that.

It doesn't answer the question.

>> How much data can you, in daily
>> usage, read/write from/to a ZFS file system with how many errors
>> detected and corrected only due to the checksumming ZFS does?
> See above. Depending on disk make/model, potentially as high as one
> per 100GB on some disk models.

Potentially, theoretically, no ZFS involved ...

You are using ZFS, so do you see this one error per 100GB?  Or what do
you see?

Knowledge is volatile and fluid.  Software is power.
