On Mon, Jan 18, 2010 at 10:36:29AM +0000, Julian Chesterfield wrote:
> Pasi Kärkkäinen wrote:
> >On Fri, Jan 15, 2010 at 02:06:30PM +0200, Pasi Kärkkäinen wrote:
> >
> >>On Fri, Jan 15, 2010 at 11:56:46AM +0000, Julian Chesterfield wrote:
> >>
> >>>Dave Scott wrote:
> >>>
> >>>>Hi Pasi,
> >>>>
> >>>>[cc:d Julian who is responsible for storage in XCP]
> >>>>
> >>>>
> >>>>
> >>>>>I haven't looked at the XCP code yet, but are there some special
> >>>>>patches for
> >>>>>LVM to make it work in a shared environment on multiple hosts?
> >>>>>
> >>>>>I guess it's not CLVM, since you support snapshots.. so xapi is doing
> >>>>>some coordination of management commands and making sure only one LVM
> >>>>>command is issued at a time?
> >>>>>
> >>>>>
> >>>>Julian could describe the detail better than me but my high-level
> >>>>understanding is:
> >>>>
> >>>>* xapi nominates one host to be the 'SR master': all LVM
> >>>>metadata-changing commands are run here
> >>>>
> >>>>* all hosts are allowed to map/unmap LVs so the LVM commands were
> >>>>patched to make absolutely sure they didn't attempt to change any
> >>>>metadata
> >>>>
> >>>>* unless you request a special "raw" LV, vhd metadata is added to the
> >>>>LV: this is how we handle snapshots
> >>>>
> >>>>
> >>>Yep, this is correct. We use XAPI as the "Cluster lock manager"
> >>>essentially. There is a strict notion of ordering of events, and XAPI
> >>>always ensures that there is a single SRMaster for any shared SR. The SR
> >>>master is the only entity that modifies LVM metadata, and it
> >>>strategically refreshes slaves as necessary. Typically slaves only
> >>>operate in an LVM Read-only mode, so the LVM metadata is refreshed when
> >>>a slave needs to access a new logical volume, and the slave is only
> >>>allowed to create device-mapper nodes, never to modify the LVM metadata.
> >>>There are patches to LVM to add an explicit 'master' flag, this ensures
> >>>that non-masters never attempt to repair LVM metadata if ever it is read
> >>>and found to be inconsistent. In practice this would never happen due to
> >>>the way LVM updates its metadata and the fact that we do not allow
> >>>shared Volume Groups that span more than one LUN, however it's an
> >>>important safety catch.
> >>>
> >>>
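To check my understanding of the master/slave rules described above, here is a rough sketch in Python. It is only a toy model, not XCP or xapi code: the names SharedSR, Host, NotMasterError, refresh and map_volume are all made up for illustration, and the real system patches the LVM tools rather than using anything like these classes.

```python
class NotMasterError(Exception):
    """Raised when a non-master host tries to change volume metadata."""


class SharedSR:
    """Toy shared storage repository (hypothetical, not the xapi API)."""

    def __init__(self, master):
        self.master = master
        self.volumes = set()   # stands in for the on-disk LVM metadata

    def create_volume(self, host, name):
        # Only the SR master may run metadata-changing operations.
        if host is not self.master:
            raise NotMasterError(f"{host.name} is not the SR master")
        self.volumes.add(name)


class Host:
    def __init__(self, name):
        self.name = name
        self.known = set()     # this host's cached view of the metadata
        self.mapped = set()    # stands in for local device-mapper nodes

    def refresh(self, sr):
        # Read-only: re-read the metadata, never write or repair it.
        self.known = set(sr.volumes)

    def map_volume(self, sr, name):
        # A slave refreshes its view when it needs a volume it has not seen.
        if name not in self.known:
            self.refresh(sr)
        if name not in self.known:
            raise KeyError(name)
        self.mapped.add(name)


master = Host("host1")
slave = Host("host2")
sr = SharedSR(master)

sr.create_volume(master, "vm-disk")   # allowed: host1 is the SR master
slave.map_volume(sr, "vm-disk")       # allowed: mapping changes no metadata

try:
    sr.create_volume(slave, "other-disk")
    slave_was_rejected = False
except NotMasterError:
    slave_was_rejected = True
```

So the slave can see and map the new volume after a refresh, but any attempt by the slave to change the metadata is refused.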
> >>Thank you both for the answers! This is a good explanation of how it works.
> >>
> >>
> >>>Snapshot and clone support is provided via the VHD layer that resides
> >>>above raw Logical Volumes. i.e. we create VHD Copy-on-write instances in
> >>>the same way as the file-based VHD support (e.g. NFS or local Ext3
> >>>partitions).
> >>>
> >>>
> >>Oh, so XenServer/XCP doesn't use LVM snapshots at all? That's good to
> >>know.
> >>
> >>
> >
> >So where/how does it store the deltas between the original volume and
> >the snapshot?
> >
> A VHD format disk is created on top of the raw Logical Volume. VHD has a
> metadata format that provides sparseness and also allows chains of
> dependencies to be created as differencing Copy-on-write disks via the
> parent locator fields.
>
Ok.
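If I read the description above correctly, the deltas live in the child disk and anything not written there is looked up through the parent. A minimal sketch in Python of that idea follows. It is a toy model only: the class SparseDisk and its parent attribute are hypothetical and do not reflect the real VHD on-disk layout or the vhd-util code.

```python
class SparseDisk:
    """Toy copy-on-write disk that stores only the blocks written to it."""

    def __init__(self, parent=None):
        self.parent = parent   # plays the role of the VHD parent locator
        self.blocks = {}       # block number -> data; absent = unallocated

    def write(self, block, data):
        # Writes always land in this disk, never in the parent.
        self.blocks[block] = data

    def read(self, block):
        # Walk up the chain until some disk has the block allocated.
        disk = self
        while disk is not None:
            if block in disk.blocks:
                return disk.blocks[block]
            disk = disk.parent
        return b"\0"           # never written anywhere in the chain


base = SparseDisk()
base.write(0, b"A")
base.write(1, b"B")

# Taking a snapshot: the base stays untouched, new writes go to a child.
child = SparseDisk(parent=base)
child.write(1, b"C")
```

Here child.blocks holds only block 1, which is the delta against the base. Reading block 0 through the child falls back to the base, and the base still returns its original data for block 1.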
> >
> >>Is there some commandline tool to control the VHD snapshots?
> >>
> >>
> >
> >Other than 'xe', of course.
> >
> Yes. There is a vhd-util commandline tool that allows you to query VHD
> headers, generate new VHD images, create CoW VHD dependency images etc...
>
Great. I have to try these some day.
Thanks!
-- Pasi
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-users