Re: [Xen-users] Cheap IOMMU hardware and ECC support importance
On 2014-07-06 14:20, lee wrote:

Kuba <kuba.0000@xxxxx> writes:

Does "rm" sound destructive or try to warn you? It just does what you tell it to do.

It's not a file system and has options to warn you. The options aren't enabled by default because it won't make much sense. For a file system, it would make sense to get a warning like "this will destroy your current data" when you issue a command that would perform a rollback and to have an option to disable the warning.

I believe one should know exactly what hitting [enter] is going to do when the line you're typing on starts with a #.

I think that it doesn't matter what the line starts with and that people do not always know what they are doing. Besides, what a line starts with is configurable.

I'd change it to "?" for users and "!" for root.

Snapshots are just snapshots; making them does not copy your data (well, in fact, ZFS is a COW file system, so making a snapshot may result in actually copying your data later on, if it's needed, but it's not copying as in "making a backup"). Replicating a snapshot results in creation of another dataset identical to the original snapshot. It's just one more way of making full or incremental backups.

So it's making a backup and not making a backup? What are snapshots good for when I can't restore from them, i.e. use them for backups?

Snapshots are not backups. I believe it holds true for anything that lets you make a snapshot.

Hm. I have a camera and I can make snapshots with it, and they are not backups but pictures. I don't know what I would make a picture of a file system for. It's not like making pictures of the brakes on your car in different states of disassembly to refer to later when you put them back together, is it? It would be like trying to assemble the brakes without having any of the parts. I suppose I could give a snapshot of the brakes to someone for some purpose, like making it easier to get the right replacement part.
But a snapshot of a file system that holds my data? Nobody would have a replacement part for that.

If you can't do it with ZFS, please try taking an LVM snapshot to get the general idea of what a snapshot is.

What if I need to access a file that's in the snapshot: Do I need to restore the snapshot first?

Usually you can "cd" into the ".zfs/snapshot" directory, which contains subdirectories named after your snapshots, and inside those directories you have complete datasets just like the ones you took the snapshots of. No rollback/restoring/mounting is necessary.

And that also works when the file system the snapshot was created from doesn't exist anymore, or when the disks with the FS the snapshot was made from have become inaccessible, provided that the snapshot was made to different disks?

Oversimplifying: yes.

So it's as good as a backup? What's the difference then? Is it like the difference between a picture and a picture?

A snapshot replicated to another set of disks on another machine qualifies for me as a backup. When you take a snapshot, it's stored on the same pool, so if you lose the pool, you lose the snapshots too.

Yes, I take snapshots all the time. This way it's easy for me to revert VMs to previous states, clone them, etc. Same goes with my regular data. And I replicate them a lot.

Hm, what for? The VMs I have are all different, so there's no point in cloning them. And why would I clone my data? I don't even have the disk capacity for that and am glad that I can make a backup.

I tend to clone "production" VMs before I start fiddling with them, so that I can test potentially dangerous ideas without any consequences. Clones are "free" - they only start using more space when you introduce some difference between the clone and the original dataset. You can always 'promote' them so they become independent from the original dataset (using more space as required). Cloning is just a tool that you might or might not find useful.

I see --- and I'd find that useful.
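
For reference, the snapshot, replication, clone and promote steps discussed above map onto a handful of `zfs` subcommands. What follows is only a minimal Python sketch that builds and prints the command lines without running anything. The dataset names (`tank/vm/web`, `backup/vm/web`, `tank/vm/web-test`) and the snapshot name are made up for illustration, and the helper function names are hypothetical.

```python
import shlex


def snapshot_cmd(dataset, name):
    """Command that takes a snapshot of a dataset (no data is copied)."""
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def replicate_cmds(dataset, name, target):
    """Pair of commands meant to be piped: zfs send ... | zfs receive ..."""
    send = ["zfs", "send", f"{dataset}@{name}"]
    receive = ["zfs", "receive", target]
    return send, receive


def clone_cmd(dataset, name, clone):
    """Command that creates a writable clone from an existing snapshot."""
    return ["zfs", "clone", f"{dataset}@{name}", clone]


def promote_cmd(clone):
    """Command that makes a clone independent of its origin dataset."""
    return ["zfs", "promote", clone]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    dataset, snap = "tank/vm/web", "before-upgrade"
    send, receive = replicate_cmds(dataset, snap, "backup/vm/web")
    print(shlex.join(snapshot_cmd(dataset, snap)))
    print(shlex.join(send) + " | " + shlex.join(receive))
    print(shlex.join(clone_cmd(dataset, snap, "tank/vm/web-test")))
    print(shlex.join(promote_cmd("tank/vm/web-test")))
```

Only the replicated copy on another pool would count as a backup in the sense used above; the snapshot and the clone live on the same pool as the original.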
I have the VMs in an LVM volume group with one logical volume for each VM. Each VM has two partitions, one for a root file system and another one for swap. How would that translate to ZFS? Where's this additional space taken from?

From the pool.

I suppose it's all relative. A couple of years ago I switched to FreeBSD (unknown to me before) for my storage VMs only because it had ZFS, which I had found to be the only solution to the problems I had at that time. That really meant a lot of learning, experimentation and uncertainties. It paid off for me. I'm not saying it will pay off for you. All I'm saying is 'look, here's this ZFS thing, there's a chance you might find it interesting'. By no means am I saying 'this is ZFS, it will solve all your problems and you have to use it'.

If it wasn't interesting, I wouldn't be writing all these postings.

Please don't get me wrong, but if you find that interesting, maybe it would benefit you more to read some docs or howtos and try it for yourself instead of trying to learn about a thing you've never touched by asking questions? I'm no expert on the subject, so chances are I'd unintentionally provide you with information that's incorrect.

So you would be running ZFS on unreliable disks, with the errors being corrected and going unnoticed, until either, without TLER, the system goes down or, with TLER, until the errors aren't recoverable anymore and become noticeable only when it's too late.

ZFS tells you it had problems ("zpool status"). ZFS can also check the entire pool for defects ("zpool scrub"; you should do that periodically).

You're silently losing more and more redundancy.

I'm not sure what you mean by losing redundancy.

You don't know whether the data has been written correctly before you read it. The more errors there are, the more redundancy you lose because you have more data that can be read from only a part of the disks.
If there is an error on another disk with that same data, you don't know until you try to read it and perhaps find out that you can't. How many errors for that data it takes depends on the level of redundancy.

I don't understand your point here. Do you know all your data had been written correctly with any other form of RAID without reading it back?

No, but I know that the RAID controller does scrubbing and fails a disk eventually. There is no in-between like there seems to be with ZFS. My point is that you can silently lose redundancy with ZFS. RAID controllers aren't exactly known to silently lose redundancy, are they?

And how do you know when to replace a disk? When there's one error or when there are 50 or 50000 or when the disk has been disconnected?

I believe it's up to you to interpret the data you're presented with and make the right decision. I really wish I could formulate a condition that evaluates to true or false telling me what I should do with a disk.

RAID controllers make that easy for you --- not necessarily better, but easier.

As I said, it's up to you to make the choices.

What is the actual rate of data corruption or loss prevented or corrected by ZFS due to its checksumming in daily usage?

I have experienced data corruption due to hardware failures in the past.

Hardware failures like?

The typical ones. Bad sectors, failed flash memory banks, failed RAM modules.

And only ZFS detected them?

Making an image of a hard drive on a new laptop only to find out the source and destination checksums are different isn't the best possible way to discover damaged RAM (just one random example of how I found out about it once). Copying data without any form of checksumming (even done manually) is like rolling dice for me.

http://www.zdnet.com/blog/storage/dram-error-rates-nightmare-on-dimm-street/638

Yes, I've seen that. It's for RAM, not disk errors detected through ZFS checksumming.

And RAM has nothing to do with the data on the disks.

That depends.

Exactly.
And what if a bit flips not in the data buffer itself, but in a place where a simple pointer value happens to be stored, so it no longer points to the data that is to be copied to the disk? If you happen not to cause an access violation, you end up with completely different data stored on the disk (or copied from it). Or some other pointer is changed so a loop writing to a disk gets executed 10^10 times instead of 10. Or some machine code gets modified, so your kernel goes berserk and names itself Skynet?

Kuba

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users
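
As an illustration of the "checksumming done manually" mentioned in the message, here is a minimal Python sketch of a copy that hashes the source, copies it, and then hashes the destination. The function names `sha256_of` and `copy_verified` are hypothetical, and SHA-256 is just one reasonable choice of checksum. Note the limits: the read-back may be served from the page cache rather than the disk, and if the RAM itself is faulty, the hash computation can be affected too, so a mismatch tells you something is wrong but a match is not a guarantee.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src, dst):
    """Copy src to dst and raise OSError if the checksums differ."""
    expected = sha256_of(src)      # checksum of the source before copying
    shutil.copyfile(src, dst)      # plain copy, no verification of its own
    actual = sha256_of(dst)        # checksum of what was actually written
    if actual != expected:
        raise OSError(
            f"checksum mismatch: {src} is {expected}, {dst} is {actual}"
        )
    return actual
```

A single flipped bit anywhere in the destination file changes the digest, which is how the kind of silent corruption described in the message would show up.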