
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance



Kuba <kuba.0000@xxxxx> writes:

>>> Does "rm" sound destructive or try to warn you? It just does what you
>>> tell it to do.
>>
>> It's not a file system and has options to warn you.  The options aren't
>> enabled by default because that wouldn't make much sense.  For a file
>> system, it would make sense to get a warning like "this will destroy
>> your current data" when you issue a command that would perform a
>> rollback, and to have an option to disable the warning.
>
> I believe one should know exactly what hitting [enter] is going to do
> when the line you're typing on starts with a #.

I think that it doesn't matter what the line starts with and that people
do not always know what they are doing.  Besides, what a line starts
with is configurable.

>>> Snapshots are just snapshots, making them does not copy your data
>>> (well, in fact, ZFS is a COW file system, so making a snapshot may
>>> result in actually copying your data later on, if it's needed, but
>>> it's not copying as in "making a backup"). Replicating a snapshot
>>> results in creation of another dataset identical to the original
>>> snapshot. It's just one more way of making full or incremental
>>> backups.
>>
>> So it's making a backup and not making a backup?  What are snapshots
>> good for when I can't restore from them, i.e., use them for backups?
>
> Snapshots are not backups. I believe it holds true for anything that
> lets you make a snapshot.

Hm.  I have a camera and I can make snapshots with it, and they are not
backups but pictures.  I don't know what I would make a picture of a
file system for.  It's not like making pictures of the brakes on your
car in different states of disassembly to refer to later when you put
them back together, is it?  It would be like trying to assemble the
brakes without having any of the parts.

I suppose I could give a snapshot of the brakes to someone for some
purpose, like making it easier to get the right replacement part.  But a
snapshot of a file system that holds my data?  Nobody would have a
replacement part for that.
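For reference, if I follow the description above, "taking a snapshot" in ZFS amounts to something like this (dataset and snapshot names invented, untested):

```shell
# Create a snapshot of the dataset "tank/data".  This is instant and
# copies nothing; blocks are only duplicated later, on write (COW):
zfs snapshot tank/data@before-upgrade

# List snapshots and the space each one actually consumes:
zfs list -t snapshot

# Roll the dataset back to the snapshot, discarding newer changes --
# this is the destructive operation discussed above:
zfs rollback tank/data@before-upgrade
```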

>>>> What if I need to access a file that's in the snapshot:  Do I
>>>> need to restore the snapshot first?
>>>
>>> Usually you can "cd .zfs" directory, which contains subdirectories
>>> named after your snapshots, and inside those directories you have
>>> complete datasets just like the ones you took the snapshots of. No
>>> rollback/restoring/mounting is necessary.
>>
>> And that also works when the file system the snapshot was created from
>> doesn't exist anymore, or when the disks with the FS the snapshot was
>> made from have become inaccessible, provided that the snapshot was made
>> to different disks?
>
> Oversimplifying: yes.

So it's as good as a backup?  What's the difference then?  Is it like
the difference between a picture and a picture?
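As I understand the replication being described, it would go roughly like this (pool and snapshot names made up):

```shell
# Full replication of a snapshot to a pool on different disks --
# this really is a complete, independent copy of the data:
zfs send tank/data@monday | zfs recv backup/data

# Incremental replication: send only the changes between two snapshots:
zfs send -i tank/data@monday tank/data@tuesday | zfs recv backup/data

# Files inside a snapshot can be read directly, with no restore step:
ls /tank/data/.zfs/snapshot/monday/
```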

>>> Yes, I take snapshots all the time. This way it's easy for me to
>>> revert VMs to previous states, clone them, etc. Same goes with my
>>> regular data. And I replicate them a lot.
>>
>> Hm, what for?  The VMs I have are all different, so there's no point in
>> cloning them.  And why would I clone my data?  I don't even have the
>> disk capacity for that and am glad that I can make a backup.
>
> I tend to clone "production" VMs before I start fiddling with them, so
> that I can test potentially dangerous ideas without any
> consequences. Clones are "free" - they only start using more space
> when you introduce some difference between the clone and the original
> dataset. You can always 'promote' them so they become independent from
> the original dataset (using more space as required). Cloning is just a
> tool that you might or might not find useful.

I see --- and I'd find that useful.  I have the VMs in an LVM volume
group with one logical volume for each VM.  Each VM has two partitions,
one for a root file system and another one for swap.  How would that
translate to ZFS?

Where's this additional space taken from?
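From what I've read, the LVM setup above would translate to one zvol per VM, and the cloning described earlier would look something like this (sizes and names invented, untested):

```shell
# One 20 GB zvol per VM, analogous to an LVM logical volume; the guest
# would still partition it into root and swap itself:
zfs create -V 20G tank/vms/webserver

# Snapshot the "production" VM, then clone it for experiments.  The
# clone is "free": it consumes additional space from the pool's free
# space only as it diverges from the original dataset:
zfs snapshot tank/vms/webserver@stable
zfs clone tank/vms/webserver@stable tank/vms/webserver-test

# If the experiment works out, make the clone independent of the
# original snapshot:
zfs promote tank/vms/webserver-test
```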

> I suppose it's all relative. Couple of years ago I switched to FreeBSD
> (unknown to me before) for my storage VMs only because it had ZFS
> which I had found to be the only solution to the problems I had at
> that time. That really meant a lot of learning, experimentation and
> uncertainties. It paid off for me. I'm not saying it will pay off for
> you. All I'm saying is 'look, here's this ZFS thing, there's a chance
> you might find it interesting'. By all means I'm not saying 'this is
> ZFS, it will solve all your problems and you have to use it'.

If it weren't interesting, I wouldn't be writing all these postings.

>>>>>> So you would be running ZFS on unreliable disks, with the errors being
>>>>>> corrected and going unnoticed, until either, without TLER, the system
>>>>>> goes down or, with TLER, until the errors aren't recoverable anymore and
>>>>>> become noticeable only when it's too late.
>>>>>
>>>>> ZFS tells you it had problems ("zpool status"). ZFS can also check
>>>>> entire pool for defects ("zpool scrub", you should do that
>>>>> periodically).
>>>>
>>>> You're silently losing more and more redundancy.
>>>
>>> I'm not sure what you mean by losing redundancy.
>>
>> You don't know whether the data has been written correctly before you
>> read it.  The more errors there are, the more redundancy you lose
>> because you have more data that can be read from only a part of the
>> disks.  If there is an error on another disk with that same data, you
>> don't know until you try to read it and perhaps find out that you can't.
>> How many errors it takes for that data depends on the level of
>> redundancy.
>
> I don't understand your point here. Do you know all your data had been
> written correctly with any other form of RAID without reading it back?

No, but I know that the raid controller does scrubbing and fails a disk
eventually.  There is no in-between like there seems to be with ZFS.

My point is that you can silently lose redundancy with ZFS.  RAID
controllers aren't exactly known to silently lose redundancy, are they?

>> And how do you know when to replace a disk?  When there's one error or
>> when there are 50 or 50000 or when the disk has been disconnected?
>
> I believe it's up to you to interpret the data you're presented with
> and make the right decision. I really wish I could formulate a
> condition that evaluates to true or false telling me what I should do
> with a disk.

RAID controllers make that easy for you --- not necessarily better, but
easier.
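For the record, the checking that was described earlier seems to amount to the following (pool name invented):

```shell
# Ask ZFS what read, write, and checksum errors it has seen so far,
# per device, with the affected files listed:
zpool status -v tank

# Walk every allocated block in the pool and verify its checksum
# against all redundant copies; typically scheduled from cron:
zpool scrub tank

# Reset the error counters after repairing or replacing a device:
zpool clear tank
```

So the error counts are there, but interpreting them --- one error, 50, or 50000 --- is indeed left to the administrator.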

>>>> What is the actual rate of data corruption or loss prevented or
>>>> corrected by ZFS due to its checksumming in daily usage?
>>>
>>> I have experienced data corruption due to hardware failures in the
>>> past.
>>
>> Hardware failures like?
>
> The typical ones. Bad sectors, failed flash memory banks, failed ram
> modules.

And only ZFS detected them?

>>> http://www.zdnet.com/blog/storage/dram-error-rates-nightmare-on-dimm-street/638
>>
>> Yes, I've seen that.  It's for RAM, not disk errors detected through ZFS
>> checksumming.
>
> And RAM has nothing to do with the data on the disks.

That depends.


-- 
Knowledge is volatile and fluid.  Software is power.

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

