
Re: [Xen-API] How snapshots work on LVMoISCSI SR



Hi Julian/Dave,

Thanks for your detailed explanation.

I still have the questions below.

1. If a non-leaf node is coalesceable, will it be coalesced later on
regardless of how big the physical size of this node is?

2. There is one leaf node for a snapshot, which may actually be empty.
Does it exist only because it can prevent a coalesce?

3. A clone will introduce a writable snapshot; will it prevent a
coalesce?


- Anthony



On Tue, 2010-01-26 at 02:34 -0800, Julian Chesterfield wrote:
> Hi Anthony,
> 
> Anthony Xu wrote:
> > Hi all,
> >
> > Basically snapshot on LVMoISCSI SR works well, it provides thin
> > provisioning, so it is fast and disk space efficient.
> >
> > But I still have the below concern.
> >
> > There is one more vhd chain when creating a snapshot, if I create 16
> > snapshots, there are 16 vhd chains, that means when one VM accesses a
> > disk block, it may need to access 16 vhd LVM volumes one by one, then
> > get the right block, it makes VM disk access slow. However, it is
> > understandable, it is part of snapshot IMO.
> >
> The depth and speed of access will depend on the write pattern to the
> disk. In XCP we add an optimisation called a BATmap which stores one
> bit per BAT entry. This is a fast lookup table that is cached in
> memory while the VHD is open, and tells the block device handler
> whether a block has been fully allocated. Once the block is fully
> allocated (all logical 2MB written) the block handler knows that it
> doesn't need to read or write the bitmap that corresponds to the data
> block, it can go directly to the disk offset. Scanning through the VHD
> chain can therefore be very quick, i.e. the block handler reads down
> the chain of BAT tables for each node until it detects a node that is
> allocated, hopefully with the BATmap value set. The worst case is a
> random disk write workload which causes the disk to be fragmented and
> partially allocated. Every read or write will therefore potentially
> incur a bitmap check at every level of the chain.
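
Just to check my understanding of the BATmap fast path, the read logic
would look roughly like the sketch below -- the function and data
structure names are mine for illustration, not the actual blktap/vhd
code:

    # Rough sketch only: my mental model of the BATmap fast path,
    # not the real blktap/vhd implementation.

    ZERO_SECTOR = b"\x00" * 512     # unallocated data reads back as zeros

    class VhdNode(object):
        def __init__(self, bat, batmap, bitmaps, data):
            self.bat = bat          # virtual block -> physical offset, or None
            self.batmap = batmap    # one bit per BAT entry: fully allocated?
            self.bitmaps = bitmaps  # per-block sector bitmaps: block -> set(sector)
            self.data = data        # (block, sector) -> 512-byte payload

    def read_sector(chain, block, sector):
        """Walk the chain child-first and return the sector from the
        first level that has it, otherwise fall through to the parent."""
        for node in chain:
            if node.bat.get(block) is None:
                continue                  # block not allocated at this level
            if node.batmap.get(block):
                # Fully allocated (all logical 2MB written): skip the
                # sector bitmap and go straight to the data offset.
                return node.data[(block, sector)]
            if sector in node.bitmaps.get(block, set()):
                # Partially allocated block: the bitmap says this sector
                # was written at this level.
                return node.data[(block, sector)]
            # Not written here: bitmap check paid, keep walking the chain.
        return ZERO_SECTOR

If that is right, only a fragmented, partially allocated disk pays the
per-level bitmap cost on every read or write.
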
> > But after I delete all these 16 snapshots, there are still 16 vhd
> > chains, the disk access is still slow, which is not understandable
> > and reasonable, even though there may be only several KB of
> > difference between each snapshot,
> >
> There is a mechanism in XCP called the GC coalesce thread which gets
> kicked asynchronously following a VDI deletion event. It queries the
> VHD tree, and determines whether there is any coalesceable work to do.
> Coalesceable work is defined as:
> 
> 'a hidden child node that has no siblings'
> 
> Hidden nodes are non-leaf nodes that reside within a chain. When the 
> snapshot leaf node is deleted therefore, it will leave redundant links 
> in the chain that can be safely coalesced. You can kick off a coalesce 
> by issuing an SR scan, although it should kick off automatically within 
> 30 seconds of deleting the snapshot node, handled by XAPI. If you look 
> in the /var/log/SMlog file you'll see a lot of debug information 
> including tree dependencies which will tell you a) whether the GC thread 
> is running, and b) whether there is coalescable work to do. Note that 
> deleting snapshot nodes does not always mean that there is coalescable 
> work to do since there may be other siblings, e.g. VDI clones.
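
If I read the coalesce rule right, the GC thread's check amounts to
something like the sketch below (again, the classes and names are made
up for illustration, not the actual storage manager code):

    # Rough sketch of "a hidden child node that has no siblings" --
    # illustrative only, not the real SM/GC implementation.

    class VdiNode(object):
        def __init__(self, uuid, parent=None, hidden=False):
            self.uuid = uuid
            self.parent = parent
            self.hidden = hidden       # non-leaf nodes inside a chain are hidden
            self.children = []
            if parent is not None:
                parent.children.append(self)

    def coalesceable(node):
        """A hidden node can be folded into its parent only if it is
        the parent's sole child (no snapshot/clone siblings remain)."""
        return (node.hidden
                and node.parent is not None
                and len(node.parent.children) == 1)

    def find_coalesce_work(nodes):
        return [n for n in nodes if coalesceable(n)]

So deleting a snapshot leaf leaves a hidden node behind that only
becomes coalesceable once no other leaf (e.g. a clone) hangs off the
same parent -- which I think also answers my question 3. I will kick
off an SR scan and watch /var/log/SMlog to confirm.
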
> > Is there any way we can reduce the depth of the vhd chain after
> > deleting snapshots, and get the VM back to normal disk performance?
> >
> The coalesce thread handles this, see above.
> > And I notice there are useless vhd volumes left after deleting
> > snapshots, can we delete them automatically?
> >
> No. I do not recommend deleting VHDs manually since they are almost 
> certainly referenced by something else in the chain. If you delete them 
> manually you will break the chain, it will become unreadable, and you 
> potentially lose critical data. VHD chains must be correctly coalesced 
> in order to maintain data integrity.
> 
> Thanks,
> Julian
> >
> > - Anthony


_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api


 

