
Re: [Xen-API] How snapshots work on LVMoISCSI SR



Thanks Julian, it's much clearer to me now how snapshots work.


- Anthony

On Wed, 2010-01-27 at 02:37 -0800, Julian Chesterfield wrote:
> Ian Pratt wrote:
> >> That means if the guest Linux is executing "yum install kernel" when
> >> creating a snapshot, the VM created from this snapshot might not be
> >> bootable.
> >>     
> >
> > Because xen issues write completions to the guest only when IO is 
> > completed, the snapshot will at least be crash consistent from a filesystem 
> > point of view (just like a physical system losing power).
> >
> > Linux doesn't have a generic mechanism for doing higher-level 'freeze' 
> > operations (see Windows VSS) so there's no way to notify yum that we'd like 
> > to take a snapshot. Some linux filesystems do support a freeze operation, 
> > but it's not clear this buys a great deal.  
> >   
> Ack. Without application signalling (as provided by VSS) it's unclear 
> whether there's any real benefit since the application data may still be 
> internally inconsistent.
> 
> FYI - for Windows VMs XCP includes a VSS quiesced snapshot option 
> (VM.snapshot_with_quiesce) which utilises the agent running in the guest 
> as a VSS requestor to quiesce the apps, flush the local cache to disk, 
> and then trigger a snapshot for all the VM's disks.
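For anyone who wants to script this, here is a minimal sketch using the
XenAPI Python bindings. The host address, credentials and VM name are
placeholders, and it assumes the Windows guest has the XCP agent / VSS
requestor installed as described above:

    import XenAPI

    # Placeholder host and credentials -- adjust for your pool master.
    session = XenAPI.Session("https://xcp-host.example.com")
    session.xenapi.login_with_password("root", "password")
    try:
        vm = session.xenapi.VM.get_by_name_label("win2008-db")[0]
        # Quiesce applications via VSS, flush the guest cache, then
        # snapshot all of the VM's disks in one go.
        snap = session.xenapi.VM.snapshot_with_quiesce(vm, "db-quiesced")
        print("snapshot ref: %s" % snap)
    finally:
        session.xenapi.session.logout()

The xe CLI exposes the same call as vm-snapshot-with-quiesce.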
> 
> - Julian
> > 99 times out of 100 you'll get away with just taking a snapshot of a VM. If 
> > you want to use the snapshot as a template for creating other clones 
> > you'd be best advised to shut the guest down and get a clean filesystem 
> > though. Any snapshot should be fine for general file backup purposes.
> >
> > Ian
> >
> > PS: I'd be surprised if "yum install kernel" didn't actually go to some 
> > lengths to be reasonably atomic as regards switching grub over to using the 
> > new kernel, otherwise you'd have the same problem on a physical machine 
> > crashing or losing power.
> >
> >   
> >> - Anthony
> >>
> >>
> >>
> >>
> >>     
> >>> Daniel
> >>>
> >>>       
> >>>> How does XCP make sure this snapshot is usable, say, virtual disk
> >>>> metadata is consistent?
> >>>>
> >>>> Thanks
> >>>> - Anthony
> >>>>
> >>>>
> >>>> On Tue, 2010-01-26 at 13:56 -0800, Ian Pratt wrote:
> >>>>         
> >>>>>> I still have below questions.
> >>>>>>
> >>>>>> 1. if a non-leaf node is coalesce-able, it will be coalesced later
> >>>>>> on regardless of how big the physical size of this node is?
> >>>>>>
> >>>>> Yes: it's always good to coalesce the chain to improve access
> >>>>> performance.
> >>>>>
> >>>>>> 2. there is one leaf node for a snapshot; actually it may be
> >>>>>> empty. does it exist only because it can prevent coalesce?
> >>>>>>
> >>>>> Not quite sure what you're referring to here. The current code has a
> >>>>> limitation whereby it is unable to coalesce a leaf into its parent, so
> >>>>> after you've created one snapshot you'll always have a chain length of
> >>>>> 2 even if you delete the snapshot (if you create a second snapshot it
> >>>>> can be coalesced).
> >>>>>
> >>>>> Coalescing a leaf into its parent is on the todo list: it's a little
> >>>>> bit different from the other cases because it requires synchronization
> >>>>> if the leaf is in active use. It's not a big deal from a performance
> >>>>> point of view to have the slightly longer chain length, but it will be
> >>>>> good to get this fixed for cleanliness.
> >>>>>
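If you want to check the actual chain length of a VDI from a script,
something along these lines works with the XenAPI Python bindings. It
assumes the VHD-based SR backend records the parent in each VDI's
sm-config under a "vhd-parent" key; treat that key name, plus the host,
credentials and VDI uuid, as assumptions to verify on your own install:

    import XenAPI

    session = XenAPI.Session("https://xcp-host.example.com")
    session.xenapi.login_with_password("root", "password")
    try:
        # Start from the active VDI and walk up through the hidden parents.
        uuid, depth = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", 1
        while True:
            vdi = session.xenapi.VDI.get_by_uuid(uuid)
            parent = session.xenapi.VDI.get_sm_config(vdi).get("vhd-parent")
            if not parent:
                break            # reached the root of the chain
            uuid, depth = parent, depth + 1
        print("chain depth: %d" % depth)
    finally:
        session.xenapi.session.logout()

After one snapshot plus delete you would expect this to report 2, matching
the leaf-coalesce limitation described above.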
> >>>>>> 3. a clone will introduce a writable snapshot, it will prevent
> >>>>>> coalesce
> >>>>>>
> >>>>> A clone will produce a new writeable leaf linked to the parent. It
> >>>>> will prevent the linked snapshot from being coalesced, but any other
> >>>>> snapshots above or below on the chain can still be coalesced by the
> >>>>> garbage collector if the snapshots are deleted.
> >>>>>
> >>>>> The XCP storage management stuff is pretty cool IMO...
> >>>>>
> >>>>> Ian
> >>>>>
> >>>>>           
> >>>>>> - Anthony
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, 2010-01-26 at 02:34 -0800, Julian Chesterfield wrote:
> >>>>>>             
> >>>>>>> Hi Anthony,
> >>>>>>>
> >>>>>>> Anthony Xu wrote:
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> Basically snapshot on LVMoISCSI SR works well; it provides thin
> >>>>>>>> provisioning, so it is fast and disk space efficient.
> >>>>>>>>
> >>>>>>>> But I still have the concern below. There is one more vhd in the
> >>>>>>>> chain when creating a snapshot. If I create 16 snapshots, there
> >>>>>>>> are 16 vhds in the chain, which means that when one VM accesses a
> >>>>>>>> disk block, it may need to access 16 vhd lvms one by one to get
> >>>>>>>> the right block, and that makes VM disk access slow. However,
> >>>>>>>> that is understandable; it is part of snapshotting IMO.
> >>>>>>>
> >>>>>>> The depth and speed of access will depend on the write pattern to
> >>>>>>> the disk. In XCP we add an optimisation called a BATmap which
> >>>>>>> stores one bit per BAT entry. This is a fast lookup table that is
> >>>>>>> cached in memory while the VHD is open, and tells the block device
> >>>>>>> handler whether a block has been fully allocated. Once the block is
> >>>>>>> fully allocated (all logical 2MB written) the block handler knows
> >>>>>>> that it doesn't need to read or write the bitmap that corresponds
> >>>>>>> to the data block; it can go directly to the disk offset. Scanning
> >>>>>>> through the VHD chain can therefore be very quick, i.e. the block
> >>>>>>> handler reads down the chain of BAT tables for each node until it
> >>>>>>> detects a node that is allocated, with hopefully the BATmap value
> >>>>>>> set. The worst case is a random disk write workload which causes
> >>>>>>> the disk to be fragmented and partially allocated. Every read or
> >>>>>>> write will therefore potentially incur a bitmap check at every
> >>>>>>> level of the chain.
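To make that lookup path concrete, here is a rough illustrative model in
plain Python (not the real blktap/VHD code) of how a read for one sector
walks down such a chain: the BAT says whether a node has the block at all,
the cached BATmap bit short-circuits the per-block bitmap check once the
block is fully allocated, and otherwise the handler falls through to the
parent:

    class VhdNode(object):
        """One node in a VHD chain (illustrative model only)."""
        def __init__(self, bat, bitmaps, batmap, parent=None):
            self.bat = bat          # {block: physical offset or None}
            self.bitmaps = bitmaps  # {block: set of sectors written here}
            self.batmap = batmap    # {block: True if fully allocated}
            self.parent = parent    # next node down the chain

    def resolve_read(node, block, sector):
        """Return (node, offset) holding the sector, or None if unwritten."""
        while node is not None:
            offset = node.bat.get(block)
            if offset is not None:
                if node.batmap.get(block):
                    # Fully allocated block: skip the bitmap check entirely.
                    return node, offset
                if sector in node.bitmaps.get(block, set()):
                    # Partially allocated, but this sector was written here.
                    return node, offset
            node = node.parent      # not here -- try the parent VHD
        return None                 # unallocated all the way down

The worst case Julian describes is the last branch repeating at every
level: a fragmented, partially allocated disk where each node's bitmap has
to be consulted before falling through to its parent.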
> >>>>>>>> But after I delete all these 16 snapshots, the 16-vhd chain is
> >>>>>>>> still there and disk access is still slow, which is not
> >>>>>>>> understandable or reasonable, even though there may be only a few
> >>>>>>>> KB of difference between each snapshot.
> >>>>>>>
> >>>>>>> There is a mechanism in XCP called the GC coalesce thread which
> >>>>>>> gets kicked asynchronously following a VDI deletion event. It
> >>>>>>> queries the VHD tree, and determines whether there is any
> >>>>>>> coalescable work to do. Coalescable work is defined as:
> >>>>>>>
> >>>>>>> 'a hidden child node that has no siblings'
> >>>>>>>
> >>>>>>> Hidden nodes are non-leaf nodes that reside within a chain. When
> >>>>>>> the snapshot leaf node is deleted, therefore, it will leave
> >>>>>>> redundant links in the chain that can be safely coalesced. You can
> >>>>>>> kick off a coalesce by issuing an SR scan, although it should kick
> >>>>>>> off automatically within 30 seconds of deleting the snapshot node,
> >>>>>>> handled by XAPI. If you look in the /var/log/SMlog file you'll see
> >>>>>>> a lot of debug information, including tree dependencies, which
> >>>>>>> will tell you a) whether the GC thread is running, and b) whether
> >>>>>>> there is coalescable work to do. Note that deleting snapshot nodes
> >>>>>>> does not always mean that there is coalescable work to do, since
> >>>>>>> there may be other siblings, e.g. VDI clones.
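As a small illustration of that rule (again just a model, not the actual
GC code in the sm backend), a node is candidate work when it is hidden,
has a parent to fold into, and is its parent's only child:

    class Node(object):
        """Toy VHD tree node: hidden nodes are the read-only interior ones."""
        def __init__(self, name, hidden=False, parent=None):
            self.name, self.hidden, self.parent = name, hidden, parent
            self.children = []
            if parent is not None:
                parent.children.append(self)

    def coalescable(node):
        # 'a hidden child node that has no siblings'
        return (node.hidden and node.parent is not None
                and len(node.parent.children) == 1)

    # Case 1: a snapshot was deleted, leaving base <- mid <- active.
    # 'mid' is a hidden only child, so the GC can fold it into 'base'.
    base = Node("base", hidden=True)
    mid = Node("mid", hidden=True, parent=base)
    active = Node("active-vdi", parent=mid)
    print([n.name for n in (base, mid, active) if coalescable(n)])  # ['mid']

    # Case 2: a clone also hangs off 'base', so 'mid' now has a sibling and
    # deleting the snapshot produced no coalescable work.
    clone = Node("clone-vdi", parent=base)
    print([n.name for n in (base, mid, active, clone) if coalescable(n)])  # []

Case 2 is the clone situation Ian mentioned earlier in the thread.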
> >>>>>>>               
> >>>>>>>> is there any way we can reduce the depth of the vhd chain after
> >>>>>>>> deleting snapshots, and get the VM back to normal disk
> >>>>>>>> performance?
> >>>>>>>>
> >>>>>>> The coalesce thread handles this, see above.
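If you do not want to wait for the automatic kick mentioned above, the
scan can also be requested over the API. A minimal sketch with the Python
bindings follows (host, credentials and SR name-label are placeholders);
afterwards /var/log/SMlog on the host shows what the GC decided to do.
The CLI equivalent is xe sr-scan uuid=<sr-uuid>.

    import XenAPI

    session = XenAPI.Session("https://xcp-host.example.com")
    session.xenapi.login_with_password("root", "password")
    try:
        sr = session.xenapi.SR.get_by_name_label("LVMoISCSI SR")[0]
        # Rescanning the SR gives the GC a chance to pick up any
        # coalescable work left behind by deleted snapshots.
        session.xenapi.SR.scan(sr)
    finally:
        session.xenapi.session.logout()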
> >>>>>>>               
> >>>>>>>> And, I notice there are useless vhd volumes left after deleting
> >>>>>>>> snapshots; can we delete them automatically?
> >>>>>>>>
> >>>>>>>>                 
> >>>>>>> No. I do not recommend deleting VHDs manually since they are
> >>>>>>> almost certainly referenced by something else in the chain. If you
> >>>>>>> delete them manually you will break the chain, it will become
> >>>>>>> unreadable, and you potentially lose critical data. VHD chains
> >>>>>>> must be correctly coalesced in order to maintain data integrity.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Julian
> >>>>>>>               
> >>>>>>>> - Anthony
> >
> >   
> 


_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api


 

