
RE: [Xen-API] How snapshots work on LVMoISCSI SR



On Tue, 2010-01-26 at 14:18 -0800, Daniel Stodden wrote:
> On Tue, 2010-01-26 at 17:07 -0500, Anthony Xu wrote:
> > It is clear now, thanks.
> > 
> > The other thing I'd like to know is how XCP handles the disk cache inside
> > the VM when creating a snapshot. From XenCenter it seems the VM is paused
> > temporarily while the snapshot is created.
> > 
> > Does the VM flush its dirty disk cache when creating a snapshot?
> 
> It depends on what you mean by disk caches. All I/O performed by the backend
> is non-buffered, so there's presently no need to flush. As soon as a guest
> I/O request is processed, it essentially goes directly to the disk.
> 
> The snapshot is created while the VBD is paused, i.e. guest accesses
> which haven't yet been issued to the disk are held back. Next, requests which
> have already been sent to the disk are waited on until they complete. Then blktap
> closes the handle to the physical disk node.
> 
> Before resuming guest access, we then reopen the newly created snapshot
> node, as the new leaf node.
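
For illustration, the pause/drain/reopen sequence described above could be
sketched roughly as below. The class and function names are invented for the
example and are not the real blktap or XAPI interfaces.

# Rough sketch of the pause/drain/reopen sequence described above. All
# names are invented for illustration; this is not blktap or XAPI code.

class ToyVbd:
    def __init__(self, leaf):
        self.leaf = leaf        # VHD leaf node currently open
        self.paused = False
        self.inflight = []      # requests already issued to the disk

    def pause(self):
        # Hold back guest requests that have not been issued yet.
        self.paused = True

    def drain(self):
        # Wait for requests already sent to the disk to complete.
        while self.inflight:
            self.inflight.pop().wait()

    def reopen(self, new_leaf):
        # Close the old physical node, open the new leaf, resume I/O.
        self.leaf = new_leaf
        self.paused = False

def take_snapshot(vbd, create_snapshot_leaf):
    # create_snapshot_leaf() stands in for the SR backend turning the
    # current leaf into a read-only snapshot node and handing back a
    # new writable leaf on top of it.
    vbd.pause()
    vbd.drain()
    new_leaf = create_snapshot_leaf(vbd.leaf)
    vbd.reopen(new_leaf)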



That means that if the guest Linux is executing "yum install kernel" when
the snapshot is created, a VM created from this snapshot might not be
bootable.


- Anthony




> 
> Daniel
> 
> > How does XCP make sure this snapshot is usable, say, that the virtual disk
> > metadata is consistent?
> > 
> > Thanks
> > - Anthony
> > 
> > 
> > On Tue, 2010-01-26 at 13:56 -0800, Ian Pratt wrote:
> > > > I still have the questions below.
> > > > 
> > > > 1. If a non-leaf node is coalesceable, will it be coalesced later on
> > > > regardless of how big the physical size of this node is?
> > > 
> > > Yes: it's always good to coalesce the chain to improve access performance.
> > >  
> > > > 2. There is one leaf node for a snapshot, which may actually be empty; does
> > > > it exist only because it can prevent coalescing?
> > > 
> > > Not quite sure what you're referring to here. The current code has a 
> > > limitation whereby it is unable to coalesce a leaf into its parent, so 
> > > after you've created one snapshot you'll always have a chain length of 2 
> > > even if you delete the snapshot (if you create a second snapshot it can 
> > > be coalesced). 
> > > 
> > > Coalescing a leaf into its parent is on the todo list: it's a little bit 
> > > different from the other cases because it requires synchronization if the 
> > > leaf is in active use. It's not a big deal from a performance point of 
> > > view to have the slightly longer chain length, but it will be good to get 
> > > this fixed for cleanliness.  
> > > 
> > > > 3. A clone will introduce a writable snapshot; will it prevent coalescing?
> > > 
> > > A clone will produce a new writeable leaf linked to the parent.  It will 
> > > prevent the linked snapshot from being coalesced, but any other snapshots 
> > > above or below on the chain can still be coalesced by the garbage 
> > > collector if the snapshots are deleted. 
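
To make the coalescing behaviour above concrete, here is a toy model of a
VHD tree. It only illustrates the rule Julian quotes further down the thread
("a hidden child node that has no siblings"); it is not XCP's actual
garbage-collector code, and the names are invented.

# Toy model of a VHD tree, only to illustrate the behaviour above and the
# rule quoted further down ("a hidden child node that has no siblings").
# It is not XCP's actual garbage-collector code.

class Vhd:
    def __init__(self, name, parent=None, hidden=False):
        self.name, self.parent, self.hidden = name, parent, hidden
        self.children = []
        if parent is not None:
            parent.children.append(self)

def coalesceable(node):
    # A hidden node can be folded into its parent only if the parent has
    # no other children that the merge would corrupt.
    return (node.hidden
            and node.parent is not None
            and len(node.parent.children) == 1)

# base <- hidden snapshot node <- active leaf: a chain left behind after
# deleting a snapshot VDI.
base = Vhd("base")
hidden = Vhd("snapshot node", parent=base, hidden=True)
active = Vhd("active leaf", parent=hidden)
print(coalesceable(hidden))   # True: the GC thread can fold it into base
print(coalesceable(active))   # False: leaf-coalesce is not implemented yet,
                              # hence the chain length stays at 2

# A clone adds a writable leaf linked to the same parent, giving the hidden
# node a sibling, so it can no longer be coalesced while the clone exists.
clone = Vhd("clone leaf", parent=base)
print(coalesceable(hidden))   # False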
> > > 
> > > The XCP storage management stuff is pretty cool IMO...
> > > 
> > > Ian
> > > 
> > > > 
> > > > - Anthony
> > > > 
> > > > 
> > > > 
> > > > On Tue, 2010-01-26 at 02:34 -0800, Julian Chesterfield wrote:
> > > > > Hi Anthony,
> > > > >
> > > > > Anthony Xu wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > Basically snapshot on LVMoISCSI SR works well; it provides thin
> > > > > > provisioning, so it is fast and disk-space efficient.
> > > > > >
> > > > > > But I still have the concern below.
> > > > > >
> > > > > > There is one more vhd chain when creating a snapshot, so if I create
> > > > > > 16 snapshots there are 16 vhd chains. That means that when one VM
> > > > > > accesses a disk block, it may need to walk 16 vhd lvms one by one to
> > > > > > get the right block, which makes VM disk access slow. However, that
> > > > > > is understandable; it is part of snapshotting IMO.
> > > > >
> > > > > The depth and speed of access will depend on the write pattern to the
> > > > > disk. In XCP we add an optimisation called a BATmap which stores one
> > > > > bit per BAT entry. This is a fast lookup table that is cached in
> > > > > memory while the VHD is open, and tells the block device handler
> > > > > whether a block has been fully allocated. Once the block is fully
> > > > > allocated (all logical 2MB written) the block handler knows that it
> > > > > doesn't need to read or write the bitmap that corresponds to the data
> > > > > block; it can go directly to the disk offset. Scanning through the
> > > > > VHD chain can therefore be very quick, i.e. the block handler reads
> > > > > down the chain of BAT tables for each node until it detects a node
> > > > > that is allocated, hopefully with the BATmap value set. The worst
> > > > > case is a random disk write workload which causes the disk to be
> > > > > fragmented and partially allocated. Every read or write will
> > > > > therefore potentially incur a bitmap check at every level of the
> > > > > chain.
> > > > >
> > > > > > But after I delete all these 16 snapshots there are still 16 vhd
> > > > > > chains, and disk access is still slow, which is not understandable
> > > > > > or reasonable, even though there may be only several KB of
> > > > > > difference between each snapshot.
> > > > >
> > > > > There is a mechanism in XCP called the GC coalesce thread which gets
> > > > > kicked asynchronously following a VDI deletion event. It queries the
> > > > > VHD tree, and determines whether there is any coalescable work to do.
> > > > > Coalescable work is defined as:
> > > > >
> > > > > 'a hidden child node that has no siblings'
> > > > >
> > > > > Hidden nodes are non-leaf nodes that reside within a chain. When the
> > > > > snapshot leaf node is deleted therefore, it will leave redundant links
> > > > > in the chain that can be safely coalesced. You can kick off a coalesce
> > > > > by issuing an SR scan, although it should kick off automatically 
> > > > > within
> > > > > 30 seconds of deleting the snapshot node, handled by XAPI. If you look
> > > > > in the /var/log/SMlog file you'll see a lot of debug information
> > > > > including tree dependencies which will tell you a) whether the GC 
> > > > > thread
> > > > > is running, and b) whether there is coalescable work to do. Note that
> > > > > deleting snapshot nodes does not always mean that there is coalescable
> > > > > work to do since there may be other siblings, e.g. VDI clones.
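
As a rough way to trigger and observe this from dom0, something like the
sketch below could be used. Only the xe sr-scan call and the /var/log/SMlog
path come from the description above; the SR UUID is a placeholder.

# Minimal sketch of kicking the coalesce off by hand and watching it, per
# the description above. Assumes dom0 with the xe CLI available.
import subprocess

SR_UUID = "<sr-uuid>"   # placeholder: fill in the UUID of the SR to scan

# An SR scan triggers the GC/coalesce pass (it also fires on its own within
# ~30 seconds of the snapshot deletion, handled by XAPI).
subprocess.check_call(["xe", "sr-scan", "uuid=" + SR_UUID])

# Tree dependencies and GC activity are logged to SMlog; inspect the tail
# of the log to see whether there is coalescable work to do.
subprocess.check_call(["tail", "-n", "100", "/var/log/SMlog"])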
> > > > > > is there any way we can reduce depth of vhd chain after deleting
> > > > > > snapshots? get VM back to normal disk performance.
> > > > > >
> > > > > The coalesce thread handles this, see above.
> > > > > > And, I notice there are useless vhd volumes left over after deleting
> > > > > > snapshots; can we delete them automatically?
> > > > > >
> > > > > No. I do not recommend deleting VHDs manually since they are almost
> > > > > certainly referenced by something else in the chain. If you delete 
> > > > > them
> > > > > manually you will break the chain, it will become unreadable, and you
> > > > > potentially lose critical data. VHD chains must be correctly coalesced
> > > > > in order to maintain data integrity.
> > > > >
> > > > > Thanks,
> > > > > Julian
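
For illustration, the BATmap lookup path Julian describes might be sketched
as below. The data structures are invented toy stand-ins, not the real
blktap structures.

# Toy sketch of the lookup path described above: per-node BAT, per-block
# bitmap, plus the in-memory BATmap bit meaning "block fully allocated".
from dataclasses import dataclass, field

@dataclass
class ToyVhdNode:
    parent: object = None
    bat: set = field(default_factory=set)       # blocks allocated in this node
    batmap: dict = field(default_factory=dict)  # block -> fully allocated?
    bitmap: dict = field(default_factory=dict)  # block -> {sector: written?}
    data: dict = field(default_factory=dict)    # block -> {sector: bytes}

def read_sector(node, block, sector):
    # Walk the chain from the leaf towards the root.
    while node is not None:
        if block in node.bat:
            if node.batmap.get(block):
                # BATmap hit: the whole 2MB block is allocated in this node,
                # so skip the bitmap and go straight to the data.
                return node.data[block][sector]
            if node.bitmap[block].get(sector):
                # Partially allocated block: the bitmap says whether this
                # particular sector was written in this node.
                return node.data[block][sector]
        node = node.parent        # fall through to the parent VHD
    return b"\x00" * 512          # unallocated everywhere: reads as zeros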
> > > > > >
> > > > > > - Anthony
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > 
> > > > 
> > 
> > 
> 
> 


_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api


 

