Re: [Xen-users] Cheap IOMMU hardware and ECC support importance
On 07/06/2014 04:38 PM, lee wrote:
> Gordan Bobic <gordan@xxxxxxxxxx> writes:
>> On 07/04/2014 06:11 PM, lee wrote:
>>> Thanks! Sooner or later I'll try it out. How come there are no
>>> packages in the Debian repos other than the fuse package?
>> Is this some kind of Debian/Ubuntu brain damage that demands that
>> everything be pre-chewed and served on a plate via the
>> distro-attached repositories?
> That's a very solipsistic view. If you think it's an indication of
> brain damage to prefer using software that is included in the
> distribution you're using, and to wonder why a particular piece of
> software isn't included, then it must be brain damage.

Why? My preferred distribution is Enterprise Linux (RedHat, CentOS,
Scientific, or derivatives thereof). I maintain a derivative of it
(for ARM, because nobody else did). The EL package set is relatively
limited, and even if you include well-known external repositories
like EPEL and RPMforge, it is still easy to find relatively well-known
packages that are either not included or only available in ancient
versions. IMO, the problem is a distribution teaching its users that
what doesn't ship with the distribution might as well not exist. That
kind of conditioning is what I am referring to.

>>> Why exactly is that? Are you modifying your storage system all the
>>> time or making snapshots all the time?
>> Since snapshots in ZFS are "free" in terms of performance, they are
>> much more useful for everyday use. They also make incremental
>> backups easier, because you can use the send/receive commands to
>> transfer only the delta between two snapshots. Between that and the
>> extra integrity-preserving features, it makes reaching for backups
>> much less frequent.
> So for example, before I start working on some source code
> ~/src/test.c, I make a snapshot, and when I'm unhappy with the
> result, I revert to what I made the snapshot of? What about emails
> that have been received in ~/Mail in the meantime?

Don't keep ~/Mail and ~/src on the same volume.
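To illustrate (a minimal sketch; the pool/dataset names tank/src and
tank/mail are hypothetical):

    # Separate datasets mean separate rollback domains; this is why
    # ~/Mail should live on its own dataset
    zfs snapshot tank/src@before-hacking

    # ... edit ~/src/test.c, decide it was all a mistake ...

    # Revert tank/src to the snapshot; tank/mail is untouched, so
    # mail that arrived in the meantime is kept
    zfs rollback tank/src@before-hacking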
>>> Checksumming is sure good to have; being able to fully use the
>>> disk caches is, too, as well as not wasting space through fixed
>>> block sizes.
>> Fixed block sizes don't waste space on traditional RAID. Variable
>> block sizes are a performance feature that allows ZFS to work
>> around the parity RAID problem of performance dropping to 50% of
>> that of a single disk on partial stripe writes.
> When every file occupies at least 4k because that's the block size
> the FS is using, you can waste a lot of space.

ZFS cannot use stripes smaller than (sector size) + (redundancy),
i.e. if you use disks with 4KB sectors and you are writing a 10-byte
file on RAIDZ2 (n+2 redundancy, similar to RAID6), that will use 3
sectors (one for the data, plus two for the n+2 redundancy), i.e.
12KB. Variable stripe width is there to improve the write performance
of partial writes.

>>> The biggest advantage would be checksumming. I'd be trading that
>>> against ease of use and a great deal of complexity.
>> Not to mention resistance to learning something new.
> Not mentioning the risks involved ...

Perhaps our experiences differ - mine shows that lying and dying
disks pose a sufficiently high risk of data loss that a traditional
RAID and file system cannot be trusted with keeping the data safe.

>>> So you can see how it is not understandable to me what makes ZFS
>>> so great that I wouldn't be able to do without it anymore.
>> Then don't use it.
> Maybe, maybe not - learning about it doesn't hurt.

Then you had better stop coming up with reasons not to use it. :)

>>>>> So you would be running ZFS on unreliable disks, with the errors
>>>>> being corrected and going unnoticed, until either, without TLER,
>>>>> the system goes down or, with TLER, the errors aren't
>>>>> recoverable anymore and become noticeable only when it's too
>>>>> late.
>>>> ZFS tells you it had problems ("zpool status"). ZFS can also
>>>> check the entire pool for defects ("zpool scrub"; you should do
>>>> that periodically).
>>> You're silently losing more and more redundancy. How do you know
>>> when a disk needs to be replaced?
>> The same way you know with any disk failure - appropriate
>> monitoring. Surely that is obvious.
> It's not obvious at all. Do you replace a disk when ZFS has found 10
> errors? Do you replace a disk when SMART is reporting 10 reallocated
> sectors?

Can you even get to information of that granularity with most
hardware RAID controllers? You have to exercise some reasonable
judgement there and apply monitoring, just like you would with any
other disk/RAID (see the monitoring sketch further below).

>>> Does ZFS maintain a list of bad sectors which are not to be used
>>> again?
>> By the fact that you are asking this question, I dare say you need
>> to go and read up more on how modern disks work. Modern disks
>> manage their defects themselves. When a sector fails and cannot be
>> read, they return an error on the read and mark the sector as
>> pending. The next time that sector is written, they write the data
>> to one of the spare, hidden sectors and map the LBA of the failed
>> sector onto it. There has been no need for the file system to keep
>> track of physical disk defects in decades.
> That's assuming that the disks reliably do what they are supposed to
> do. Can you guarantee that they always will?

Of course I can't - but I trust ZFS to mitigate the issue by
providing several additional layers that increase the chances that
the data will not get damaged.

>> zfs send produces a data stream that can be applied to another pool
>> using zfs receive. You can pipe this over ssh or netcat to a
>> different machine, or you can pipe it into a different pool
>> locally.
> So I'd be required to also use ZFS on the receiving side for it to
> make sense.

Indeed.
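For example, a sketch of that plumbing (host and dataset names are
hypothetical):

    # Initial full replication to a remote pool over ssh
    zfs snapshot tank/data@mon
    zfs send tank/data@mon | ssh backuphost zfs receive backup/data

    # Later, transfer only the delta between the two snapshots
    zfs snapshot tank/data@tue
    zfs send -i tank/data@mon tank/data@tue | \
        ssh backuphost zfs receive backup/data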
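And here is roughly what the monitoring mentioned above can look like
- a sketch, with hypothetical pool and device names:

    # Per-device read/write/checksum error counters for the pool
    zpool status tank

    # Verify every block's checksum; run periodically, e.g. weekly
    # from cron
    zpool scrub tank

    # The drive's own view: attribute 5 (Reallocated_Sector_Ct) and
    # 197 (Current_Pending_Sector) cover the remapping discussed above
    smartctl -A /dev/sda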
>>> That's some laboratory experimenting with ZFS. Backblaze uses
>>> ext4, though ZFS would seem to be a very good choice for what
>>> they're doing. How can they store so much data without
>>> checksumming, without using ECC RAM, and not experience a
>>> significant amount of data corruption?
>> You are asking the wrong question - how would they know whether
>> they are experiencing data corruption? The vast majority of backups
>> are write-only. If 4KB of data (one sector) goes bad for every 10TB
>> read, and only 1% of the backups ever need to be retrieved, that's
>> one detected broken file per petabyte of data stored.
> They claim[1] that they are currently storing over 100 petabytes and
> have restored 6.27 billion files. They are expecting to store
> another 500 petabytes at another datacenter. That's over a hundred,
> and, if they meet their plan, at least 500 detected broken files, so
> they must know. Mentioning that such a thing occurs could be
> considered bad for business. And I would guess that the number of
> retrieved files is far greater than 1%. You get unlimited storage
> for $5/month and are able to retrieve a particular single file
> without significant delays. When you're using their service, why
> would you even keep files you don't access frequently on your own
> disks?

Because their software only lets you back up files you store on your
own disk - last I checked, there are restrictions in place to prevent
abuse of the system by using it as unlimited cloud storage rather
than backups.

> At some point, it's cheaper to have them in backups and to just
> retrieve them when you need them. You think you have use for a NAS
> or something similar? Why throw money at it when you can have
> something very similar for $5/month? How many people have large
> amounts of data which must be available right away and couldn't be
> stored remotely (leaving security issues aside)? Can you store the
> part of your data which you do not need to have available right away
> for a total cost of only $5/month yourself, while that data remains
> readily accessible at any time? Considering that, the rate of data
> restored may well be 20-50% or even more. And with only a single
> file they are unable to restore, their service would have failed. So
> how can they afford not to use ECC RAM and to use a file system that
> allows for data corruption?
>
> [1]: http://blog.backblaze.com/category/behind-backblaze/

See above - it can only be used via their closed-source backup
software, and there are features that get in the way of abusing the
system as plain offline storage, at least from what I remember from
the last time I checked. There's also no Linux support, so I don't
use it, and I cannot tell you any more details.

>>> What is the actual rate of data corruption or loss prevented or
>>> corrected by ZFS due to its checksumming in daily usage?
>> According to disk manufacturers' own specifications for their own
>> disks (i.e. assume it's worse), one unrecoverable error in 10^14
>> bits read. This doesn't include complete disk failures.
> That still doesn't answer the question ...

If you define what "daily usage" is in TB/day, you can work out how
many errors per day to expect from the numbers above.
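A worked example (the TB/day figure is purely illustrative):

    10^14 bits = 1.25 * 10^13 bytes = 12.5 TB
    expected errors/day = (TB read per day) / 12.5
    at 1 TB/day: 1 / 12.5 = 0.08 errors/day,
    i.e. roughly one unrecoverable read error every two weeks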
Gordan

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users