Re: [Xen-users] Cheap IOMMU hardware and ECC support importance
On 2014-07-09 00:45, lee wrote:

>>> I was thinking of errors detected by ZFS, though. What if you see a
>>> few: Do you replace the disk, or do you wait until there are so many?
>>
>> Depends on the rate at which they are showing up. If every week the
>> same disk throws a few errors, then yes, it is a good candidate for
>> replacing. But usually there are other indications in SMART and
>> syslog, e.g. command timeouts, bus resets, and similar.
>
> Hm, interesting ... Would you say that there is a correlation between
> timeouts/bus resets and errors detected by ZFS? Like no significant
> numbers of ZFS-detected errors showing up before the timeouts/resets
> happen?

It depends on how the disk is failing. A lot of the time bus timeouts will result in checksum errors in ZFS as it was unable to retrieve the data off the disk, but that could be caused by a number of failures, some more critical than others. It could be something as trivial as a marginal SATA cable, or it could be the disk failing to read back a sector. Or it could be a disk becoming completely unresponsive (in which case the kernel will disconnect it and ZFS will show it as failed and put the vdev into degraded state).

>>>>> or does ZFS keep a list of sectors not to use anymore?
>>>>
>>>> As I said before, disks handle their own defects, and have done for
>>>> the past decade or two. File systems have long had no place in
>>>> keeping track of duff sectors on disks.
>>>
>>> So ZFS may silently lose redundancy (and in a bad case data),
>>> depending on what the disks do. And there isn't any way around that,
>>> other than increasing redundancy.
>>
>> How do you define "silently"?
>
> "Silently" as in "not noticed" because ZFS doesn't detect the errors
> before attempting to read. When a disk behaves badly, ZFS would have to
> assume that data has been written correctly while it hasn't. For that
> data, there is no redundancy because it has been "silently" lost (or
> never existed).

In that case, yes, without reading the data back, you can never be completely sure.
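As a rough illustration of what "reading the data back" involves at the application level, here is a minimal Python sketch. The function names write_and_verify and demo are made up for this example and are not part of ZFS or any disk firmware. It writes a buffer, asks the OS to flush it with os.fsync, re-reads the file and compares SHA-256 digests. One assumption is baked in: the re-read may be served from the OS page cache rather than from the disk, so a match is weaker evidence than a drive-level verify.

```python
import hashlib
import os
import tempfile


def write_and_verify(path: str, data: bytes) -> bool:
    """Write data, flush it, read it back and compare SHA-256 digests.

    Hypothetical helper for illustration only.
    """
    expected = hashlib.sha256(data).hexdigest()

    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device

    # Caveat: this read may be satisfied from the OS page cache, so a match
    # does not prove that the medium itself holds the correct data.
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()

    return actual == expected


def demo() -> bool:
    """Run the check once against a throwaway file in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "block.bin")
        return write_and_verify(target, b"payload" * 1024)
```

Because of that caching caveat, a check like this mostly exercises the software path; it is not a substitute for the drive verifying its own writes.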
If this is important to you, you will need to buy disks with the Write-Read-Verify feature. In practical terms, however, there is no way (nor reason) to distinguish between sectors that were written wrong (or not at all) and those that got corrupted after being written. The only metric that matters is whether the data is there when you want to access it.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users