
Re: [Xen-API] How snapshots work on LVMoISCSI SR



Thanks Julian, it's much clearer to me now how snapshots work.


- Anthony

On Wed, 2010-01-27 at 02:37 -0800, Julian Chesterfield wrote:
> Ian Pratt wrote:
> >> That means if the guest Linux is executing "yum install kernel" when
> >> creating a snapshot, the VM created from this snapshot might not be
> >> bootable.
> >>     
> >
> > Because xen issues write completions to the guest only when IO is 
> > completed, the snapshot will at least be crash consistent from a filesystem 
> > point of view (just like a physical system losing power).
> >
> > Linux doesn't have a generic mechanism for doing higher-level 'freeze' 
> > operations (see Windows VSS) so there's no way to notify yum that we'd like 
> > to take a snapshot. Some linux filesystems do support a freeze operation, 
> > but it's not clear this buys a great deal.  
> >   
> Ack. Without application signalling (as provided by VSS) it's unclear 
> whether there's any real benefit since the application data may still be 
> internally inconsistent.
> 
> FYI - for Windows VMs XCP includes a VSS quiesced snapshot option 
> (VM.snapshot_with_quiesce) which utilises the agent running in the guest 
> as a VSS requestor to quiesce the apps, flush the local cache to disk, 
> and then trigger a snapshot for all the VM's disks.
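For anyone who wants to script this, here is a minimal sketch using the
XenAPI Python bindings. The host address, credentials and VM name are
placeholders, and it assumes the Windows guest has the XCP agent / VSS
requestor installed as described above:

    import XenAPI

    # Placeholder host and credentials -- adjust for your pool master.
    session = XenAPI.Session("https://xcp-host.example.com")
    session.xenapi.login_with_password("root", "password")
    try:
        vm = session.xenapi.VM.get_by_name_label("win2008-db")[0]
        # Quiesce applications via VSS, flush the guest cache, then
        # snapshot all of the VM's disks in one go.
        snap = session.xenapi.VM.snapshot_with_quiesce(vm, "db-quiesced")
        print("snapshot ref: %s" % snap)
    finally:
        session.xenapi.session.logout()

The xe CLI exposes the same call as vm-snapshot-with-quiesce.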
> 
> - Julian
> > 99 times out of 100 you'll get away with just taking a snapshot of a VM. If 
> > you want to use the snapshot as a template for creating other clones 
> > you'd be best advised to shut the guest down and get a clean filesystem 
> > though. Any snapshot should be fine for general file backup purposes.
> >
> > Ian
> >
> > PS: I'd be surprised if "yum install kernel" didn't actually go to some 
> > lengths to be reasonably atomic as regards switching grub over to using the 
> > new kernel, otherwise you'd have the same problem on a physical machine 
> > crashing or losing power.
> >
> >   
> >> - Anthony
> >>
> >>
> >>
> >>
> >>     
> >>> Daniel
> >>>
> >>>       
> >>>> How does XCP make sure this snapshot is usable, say, virtual disk
> >>>> metadata is consistent?
> >>>>
> >>>> Thanks
> >>>> - Anthony
> >>>>
> >>>>
> >>>> On Tue, 2010-01-26 at 13:56 -0800, Ian Pratt wrote:
> >>>>         
> >>>>>> I still have below questions.
> >>>>>>
> >>>>>> 1. if a non-leaf node is coalesce-able, it will be coalesced later
> >>>>>> on regardless of how big the physical size of this node is?
> >>>>>>
> >>>>> Yes: it's always good to coalesce the chain to improve access
> >>>>> performance.
> >>>>>
> >>>>>> 2. there is one leaf node for a snapshot; actually it may be
> >>>>>> empty. does it exist only because it can prevent coalesce?
> >>>>>>
> >>>>> Not quite sure what you're referring to here. The current code has a
> >>>>> limitation whereby it is unable to coalesce a leaf into its parent, so
> >>>>> after you've created one snapshot you'll always have a chain length of
> >>>>> 2 even if you delete the snapshot (if you create a second snapshot it
> >>>>> can be coalesced).
> >>>>>
> >>>>> Coalescing a leaf into its parent is on the todo list: it's a little
> >>>>> bit different from the other cases because it requires synchronization
> >>>>> if the leaf is in active use. It's not a big deal from a performance
> >>>>> point of view to have the slightly longer chain length, but it will be
> >>>>> good to get this fixed for cleanliness.
> >>>>>
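If you want to check the actual chain length of a VDI from a script,
something along these lines works with the XenAPI Python bindings. It
assumes the VHD-based SR backend records the parent in each VDI's
sm-config under a "vhd-parent" key; treat that key name, plus the host,
credentials and VDI uuid, as assumptions to verify on your own install:

    import XenAPI

    session = XenAPI.Session("https://xcp-host.example.com")
    session.xenapi.login_with_password("root", "password")
    try:
        # Start from the active VDI and walk up through the hidden parents.
        uuid, depth = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", 1
        while True:
            vdi = session.xenapi.VDI.get_by_uuid(uuid)
            parent = session.xenapi.VDI.get_sm_config(vdi).get("vhd-parent")
            if not parent:
                break            # reached the root of the chain
            uuid, depth = parent, depth + 1
        print("chain depth: %d" % depth)
    finally:
        session.xenapi.session.logout()

After one snapshot plus delete you would expect this to report 2, matching
the leaf-coalesce limitation described above.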
> >>>>>> 3. a clone will introduce a writable snapshot, it will prevent
> >>>>>> coalesce
> >>>>>>
> >>>>> A clone will produce a new writeable leaf linked to the parent. It
> >>>>> will prevent the linked snapshot from being coalesced, but any other
> >>>>> snapshots above or below on the chain can still be coalesced by the
> >>>>> garbage collector if the snapshots are deleted.
> >>>>>
> >>>>> The XCP storage management stuff is pretty cool IMO...
> >>>>>
> >>>>> Ian
> >>>>>
> >>>>>           
> >>>>>> - Anthony
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, 2010-01-26 at 02:34 -0800, Julian Chesterfield wrote:
> >>>>>>             
> >>>>>>> Hi Anthony,
> >>>>>>>
> >>>>>>> Anthony Xu wrote:
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> Basically snapshot on LVMoISCSI SR works well; it provides thin
> >>>>>>>> provisioning, so it is fast and disk space efficient.
> >>>>>>>>
> >>>>>>>> But I still have the concern below. There is one more vhd in the
> >>>>>>>> chain when creating a snapshot. If I create 16 snapshots, there
> >>>>>>>> are 16 vhds in the chain, which means that when one VM accesses a
> >>>>>>>> disk block, it may need to access 16 vhd lvms one by one to get
> >>>>>>>> the right block, and that makes VM disk access slow. However,
> >>>>>>>> that is understandable; it is part of snapshotting IMO.
> >>>>>>>
> >>>>>>> The depth and speed of access will depend on the write pattern to
> >>>>>>> the disk. In XCP we add an optimisation called a BATmap which
> >>>>>>> stores one bit per BAT entry. This is a fast lookup table that is
> >>>>>>> cached in memory while the VHD is open, and tells the block device
> >>>>>>> handler whether a block has been fully allocated. Once the block is
> >>>>>>> fully allocated (all logical 2MB written) the block handler knows
> >>>>>>> that it doesn't need to read or write the bitmap that corresponds
> >>>>>>> to the data block; it can go directly to the disk offset. Scanning
> >>>>>>> through the VHD chain can therefore be very quick, i.e. the block
> >>>>>>> handler reads down the chain of BAT tables for each node until it
> >>>>>>> detects a node that is allocated, with hopefully the BATmap value
> >>>>>>> set. The worst case is a random disk write workload which causes
> >>>>>>> the disk to be fragmented and partially allocated. Every read or
> >>>>>>> write will therefore potentially incur a bitmap check at every
> >>>>>>> level of the chain.
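To make that lookup path concrete, here is a rough illustrative model in
plain Python (not the real blktap/VHD code) of how a read for one sector
walks down such a chain: the BAT says whether a node has the block at all,
the cached BATmap bit short-circuits the per-block bitmap check once the
block is fully allocated, and otherwise the handler falls through to the
parent:

    class VhdNode(object):
        """One node in a VHD chain (illustrative model only)."""
        def __init__(self, bat, bitmaps, batmap, parent=None):
            self.bat = bat          # {block: physical offset or None}
            self.bitmaps = bitmaps  # {block: set of sectors written here}
            self.batmap = batmap    # {block: True if fully allocated}
            self.parent = parent    # next node down the chain

    def resolve_read(node, block, sector):
        """Return (node, offset) holding the sector, or None if unwritten."""
        while node is not None:
            offset = node.bat.get(block)
            if offset is not None:
                if node.batmap.get(block):
                    # Fully allocated block: skip the bitmap check entirely.
                    return node, offset
                if sector in node.bitmaps.get(block, set()):
                    # Partially allocated, but this sector was written here.
                    return node, offset
            node = node.parent      # not here -- try the parent VHD
        return None                 # unallocated all the way down

The worst case Julian describes is the last branch repeating at every
level: a fragmented, partially allocated disk where each node's bitmap has
to be consulted before falling through to its parent.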
> >>>>>>>> But after I delete all these 16 snapshots, the 16-vhd chain is
> >>>>>>>> still there and disk access is still slow, which is not
> >>>>>>>> understandable or reasonable, even though there may be only a few
> >>>>>>>> KB of difference between each snapshot.
> >>>>>>>
> >>>>>>> There is a mechanism in XCP called the GC coalesce thread which
> >>>>>>> gets kicked asynchronously following a VDI deletion event. It
> >>>>>>> queries the VHD tree, and determines whether there is any
> >>>>>>> coalescable work to do. Coalescable work is defined as:
> >>>>>>>
> >>>>>>> 'a hidden child node that has no siblings'
> >>>>>>>
> >>>>>>> Hidden nodes are non-leaf nodes that reside within a chain. When
> >>>>>>> the snapshot leaf node is deleted, therefore, it will leave
> >>>>>>> redundant links in the chain that can be safely coalesced. You can
> >>>>>>> kick off a coalesce by issuing an SR scan, although it should kick
> >>>>>>> off automatically within 30 seconds of deleting the snapshot node,
> >>>>>>> handled by XAPI. If you look in the /var/log/SMlog file you'll see
> >>>>>>> a lot of debug information, including tree dependencies, which
> >>>>>>> will tell you a) whether the GC thread is running, and b) whether
> >>>>>>> there is coalescable work to do. Note that deleting snapshot nodes
> >>>>>>> does not always mean that there is coalescable work to do, since
> >>>>>>> there may be other siblings, e.g. VDI clones.
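As a small illustration of that rule (again just a model, not the actual
GC code in the sm backend), a node is candidate work when it is hidden,
has a parent to fold into, and is its parent's only child:

    class Node(object):
        """Toy VHD tree node: hidden nodes are the read-only interior ones."""
        def __init__(self, name, hidden=False, parent=None):
            self.name, self.hidden, self.parent = name, hidden, parent
            self.children = []
            if parent is not None:
                parent.children.append(self)

    def coalescable(node):
        # 'a hidden child node that has no siblings'
        return (node.hidden and node.parent is not None
                and len(node.parent.children) == 1)

    # Case 1: a snapshot was deleted, leaving base <- mid <- active.
    # 'mid' is a hidden only child, so the GC can fold it into 'base'.
    base = Node("base", hidden=True)
    mid = Node("mid", hidden=True, parent=base)
    active = Node("active-vdi", parent=mid)
    print([n.name for n in (base, mid, active) if coalescable(n)])  # ['mid']

    # Case 2: a clone also hangs off 'base', so 'mid' now has a sibling and
    # deleting the snapshot produced no coalescable work.
    clone = Node("clone-vdi", parent=base)
    print([n.name for n in (base, mid, active, clone) if coalescable(n)])  # []

Case 2 is the clone situation Ian mentioned earlier in the thread.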
> >>>>>>>               
> >>>>>>>> is there any way we can reduce the depth of the vhd chain after
> >>>>>>>> deleting snapshots, and get the VM back to normal disk
> >>>>>>>> performance?
> >>>>>>>>
> >>>>>>> The coalesce thread handles this, see above.
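If you do not want to wait for the automatic kick mentioned above, the
scan can also be requested over the API. A minimal sketch with the Python
bindings follows (host, credentials and SR name-label are placeholders);
afterwards /var/log/SMlog on the host shows what the GC decided to do.
The CLI equivalent is xe sr-scan uuid=<sr-uuid>.

    import XenAPI

    session = XenAPI.Session("https://xcp-host.example.com")
    session.xenapi.login_with_password("root", "password")
    try:
        sr = session.xenapi.SR.get_by_name_label("LVMoISCSI SR")[0]
        # Rescanning the SR gives the GC a chance to pick up any
        # coalescable work left behind by deleted snapshots.
        session.xenapi.SR.scan(sr)
    finally:
        session.xenapi.session.logout()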
> >>>>>>>               
> >>>>>>>> And, I notice there are useless vhd volumes left after deleting
> >>>>>>>> snapshots; can we delete them automatically?
> >>>>>>>>
> >>>>>>>>                 
> >>>>>>> No. I do not recommend deleting VHDs manually since they are
> >>>>>>> almost certainly referenced by something else in the chain. If you
> >>>>>>> delete them manually you will break the chain, it will become
> >>>>>>> unreadable, and you potentially lose critical data. VHD chains
> >>>>>>> must be correctly coalesced in order to maintain data integrity.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Julian
> >>>>>>>               
> >>>>>>>> - Anthony
> >
> >   
> 


_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api


 

