Re: [Xen-users] Cheap IOMMU hardware and ECC support importance
On 2014-07-09 02:13, lee wrote:

> But then, dom0 and the VMs are on a RAID-1, so I'd have to make
> backups of everything, change to JBOD, figure out how to boot from
> ZFS and how to restore from the backup. Any idea how to do that?
> Does ZFS provide swap partitions? If not, I'd have to put them on
> RAID devices, but I wouldn't have any.

Swap for dom0 or for domUs?

> All have swap partitions.

For dom0, as I said before, I use RAID1 for the /boot and rootfs.

> And leave the rest of the disks unused?

Do whatever you want with the remaining space. ZFS can use both
partitions and block devices. For a domU, you put it on whatever
volume the rest of the domU filesystems are on.

> Without swap partitions?

No, partition the domU virtual disk inside the domU in any way you
like, including swap partitions.

A hardware RAID controller will typically kick out disks based on
relatively low error thresholds. ZFS will try to hold onto disks for
as long as they are responsive to the kernel (within SCSI command
timeouts), which means it will maintain redundancy much better, and
it will keep fixing all the errors it encounters in the meantime.

> Which is better? In both cases, another disk could fail shortly
> after the first one has.

Failure has degrees. Having two partially failing disks (failed
sectors) in an n+1 redundancy array may still yield a complete copy
of the data. ZFS will keep those disks for as long as they are
responsive while rebuilding data onto a new disk, and pick whatever
data is healthy on each of the old disks.

How often does your RAID controller scrub the array to check for
errors? If it finds that in a particular RAID5 stripe the data
doesn't match the parity, but none of the disks returns an error,
does it trust the data or the parity? If the parity, which
combination of data blocks does it assume is correct, and which
block needs to be repaired? ZFS can recover from this even with n+1
redundancy because each data stripe has a checksum independent of
the parity, so it is possible to establish which combination of
surviving data+parity blocks is the correct one, and which blocks
need to be rebuilt.

> Interesting question --- are you saying the hardware RAID
> controller has no way of knowing which data is good, because it
> uses parity information merely to reconstruct data when part of
> that data is no longer available, while ZFS uses checksums on each
> part of the data, which not only allow it to reconstruct the data
> when part of it is unavailable, but also let it know which part of
> the data is good, because it assumes that the data whose checksums
> match is good?

Yes, that is exactly what I'm saying.

https://blogs.oracle.com/timc/entry/demonstrating_zfs_self_healing

> You can see that the data can still be read and that the number of
> errors has gone up. That the number of errors has increased
> contradicts that the errors have been fixed.

Only if you have no clue how file systems, RAID, and disk accesses
work. In which case you should be using an OS designed for people
with that level of interest in understanding.

> That's what I said: when you don't know ZFS, you see the
> contradiction. Common sense makes you at least suspicious when you
> are supposed to assume that an error has been fixed and then see
> more errors showing up.

The entire premise is wrong - you cannot meaningfully gain
information from a test without understanding the test.
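The linked demo is easy to reproduce first-hand on scratch files if
you want to watch this happen. Roughly (a from-memory sketch along
the same lines as the article, not a copy of it; assumes a Linux box
with ZFS installed and root access, and the pool name "demo" and the
/tmp paths are arbitrary):

    # Build a throwaway mirror out of two file-backed vdevs.
    truncate -s 128M /tmp/vdev0 /tmp/vdev1
    zpool create demo mirror /tmp/vdev0 /tmp/vdev1
    cp /etc/services /demo/testfile

    # Corrupt one side of the mirror behind ZFS's back, well past
    # the vdev labels at the front of the file.
    dd if=/dev/urandom of=/tmp/vdev0 bs=1M seek=10 count=4 conv=notrunc

    # Re-read and verify every block in the pool.
    zpool scrub demo
    zpool status demo
    cmp /etc/services /demo/testfile && echo "data intact"

The file reads back intact, while the CKSUM column for /tmp/vdev0
climbs: every increment is an error that was detected and already
repaired from the healthy side of the mirror. The counter is a
running tally of repairs, not of outstanding damage, which is why
"errors going up" and "errors being fixed" are the same event. Once
you've noted the numbers, "zpool clear demo" resets them and "zpool
destroy demo" removes the scratch pool.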
When you look at the lengths Backblaze claims to have gone to in
order to keep costs low, it is entirely inconceivable that they
would skip out on something that would save them half their costs
for spurious or non-technical reasons.

> You'd think so.
>
> I got an email from them, and they're saying they are considering
> using ZFS and that the software they're using does checksumming.
> How exactly it does this wasn't said. They also said their
> encryption software is closed source and that it's up to you to
> trust them or not. So we can only guess who has access to all the
> data they store.

Or you could encrypt your data before backing it up. If you use
something like encfs, you can back up the underlying encrypted data
rather than the unencrypted data. That way it doesn't matter how
they store it.

> What is the actual rate of data corruption or loss prevented or
> corrected by ZFS due to its checksumming in daily usage?

The following articles provide some good info:

http://static.googleusercontent.com/media/research.google.com/en//archive/disk_failures.pdf
http://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.pdf

> They don't answer the question, either.

So you didn't read the articles, then.

> I looked at them.

Graph (b) in Figure 3 of the second article shows the number of
latent sector errors per GB over 18 months of use, by disk model. So
depending on your disk model, you could be getting a silent disk
error as often as once per 100GB. Unrecoverable sector errors (i.e.
non-latent disk errors) are on top of that.

> It doesn't answer the question. How much data can you, in daily
> usage, read/write from/to a ZFS file system, with how many errors
> detected and corrected only due to the checksumming ZFS does?

See above. Depending on disk make/model, potentially as many as one
per 100GB.

> Potentially, theoretically, no ZFS involved ... You are using ZFS,
> so do you see this one error per 100GB? Or what do you see?

It's not something I log/graph. Maybe I should add it to my zabbix
setup...
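For anyone wanting to do the same: the per-device READ/WRITE/CKSUM
counters are right there in "zpool status" output, so a crude agent
item only needs to scrape them. A rough, untested sketch; this is my
own throwaway parsing, not anything shipped with zabbix or ZFS:

    #!/bin/sh
    # Print the total READ+WRITE+CKSUM error count across all pools
    # (0 when everything is healthy). Device lines in `zpool status`
    # look like:
    #   NAME        STATE     READ WRITE CKSUM
    #   mirror-0    ONLINE       0     0     0
    zpool status | awk '
        $2 ~ /^(ONLINE|DEGRADED|FAULTED)$/ && NF >= 5 {
            total += $3 + $4 + $5
        }
        END { print total + 0 }'

Caveats: on some platforms large counters are abbreviated (e.g.
"1.2K"), which this arithmetic only registers as "nonzero", and the
pool-level row double-counts its devices. For a simple good/bad
check, "zpool status -x" (which prints "all pools are healthy" when
there is nothing to report) may be all you need.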
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users