
Re: [Xen-devel] [RFC V9 2/4] domain snapshot overview



On Tue, 2014-12-16 at 14:32 +0800, Chunyan Liu wrote:
> Changes to V8:
>   * add an overview document, so that one can have an overall look
>     at the whole domain snapshot work: limits, requirements,
>     how to do it, etc.
> 
> =====================================================================
> Domain snapshot overview

I don't see a similar section for disk snapshots. Are you not
considering those here except as part of a domain snapshot, or is this
an oversight?

There are three main use cases (that I know of at least) for
snapshot-like behaviour.

One is as you've mentioned below for "backup", i.e. to preserve the VM
at a certain point in time in order to be able to roll back to it. Is
this the only use case you are considering?

A second use case is to support "gold image" type deployments, i.e.
where you create one baseline single disk image and then clone it
multiple times to deploy lots of guests. I think this is usually a "disk
snapshot" type thing, but maybe it can be implemented as restoring a
gold domain snapshot multiple times (e.g. for start of day performance
reasons).
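
To make the disk-only variant of the gold image case concrete, here is a
minimal sketch, assuming each clone is a qcow2 overlay backed by one
shared baseline image. The helper name clone_commands and all paths are
hypothetical, and the commands are only built, not run.

```python
from pathlib import PurePosixPath


def clone_commands(base_image, base_format, clone_dir, count):
    """Build one `qemu-img create` command per clone.

    Each clone is a qcow2 overlay whose backing file is the shared
    baseline image, so the baseline itself is never written to.
    """
    commands = []
    for index in range(count):
        overlay = PurePosixPath(clone_dir) / f"guest{index}.qcow2"
        commands.append([
            "qemu-img", "create",
            "-f", "qcow2",           # format of the new overlay
            "-b", str(base_image),   # backing file (the gold image)
            "-F", base_format,       # format of the backing file
            str(overlay),
        ])
    return commands
```

Restoring a gold domain snapshot several times would additionally need
the memory image, which this sketch deliberately leaves out.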

The third case, which is similar to the first, is taking a disk
snapshot in order to be able to run your usual backup software on the
snapshot (which is now unchanging, which is handy) and then deleting the
disk snapshot. This differs from the first case in which disk is active
after the snapshot, and in that there is no memory part.

Are you considering all three use cases here or are you explicitly
ruling out anything but the first? I think there might be some subtle
differences in the requirements, wrt which operations need to consider
the possibility of an active domain etc, depending on which cases are
considered. It would be good to be explicit about the use cases you are
not trying to address here so we are all on the same page.

If you are ruling these other use cases out then I think it would be
useful to briefly describe them and then note that they are out of scope
for this design, so that we have an agreed understanding of what is in
or out of scope and/or can debate to what extent such use cases ought to
be considered in the design if not the implementation.

> 1. Purpose
> 
> Domain snapshot is a system checkpoint of a domain. Later, one can
> roll back the domain to that checkpoint. It's a very useful backup
> function. A domain snapshot contains the memory status at the
> checkpoint and the disk status (which we call a disk snapshot).


> Domain snapshot functionality usually includes:
> a) create a domain snapshot
> b) roll back (or called "revert") to a domain snapshot
> c) delete a domain snapshot
> d) list all domain snapshots
> 
> But following the existing xl idioms of managing storage and saved
> VM images via existing CLI command (qemu-img, lvcreate, ls, mv,
> cp etc), xl snapshot functionality would be kept as simple as
> possible:
> * xl will do a) and b), creating a snapshot and reverting a
>   domain to a snapshot.
> * xl will NOT do c) and d), xl won't manage snapshots, as xl
>   doesn't maintain saved images created by 'xl save'. So xl
>   will have no idea of the existence of domain snapshots and
>   the chain relationship between snapshots. It will depend on the
>   user to take care of the snapshots, know the snapshot chain
>   info, and delete snapshots.

This is a case where the use cases being considered might apply. If the
third case I outlined above is in scope then xl may need to somehow
support deleting a snapshot from under the feet of an active domain etc
(which need not necessarily imply knowledge of snapshot chains or
snapshot management, but might involve a notification to the backend for
example).

> Domain Snapshot Support and Not Support:
> * support live snapshot
> * support internal disk snapshot and external disk snapshot
> * support different disk backend types.
>   (Basic goal is to support 'raw' and 'qcow2' only).
> 
> * not support snapshot when domain is shutting down or dying.
> * not support disk-only snapshot [1].
> 
>  [1] xl only concerns itself with active domains, and even when a
>  domain is paused, no data is flushed to disk. So taking a
>  disk-only snapshot and then resuming is as if the guest
>  had crashed. For this reason, a disk-only snapshot is meaningless
>  to xl and should not be supported.
> 
> 
> 2. Requirements
> 
> General Requirements:
> * ability to save/restore domain memory
> * ability to create/delete/apply disk snapshot [2]

Is "apply" the same as "revert to"? Worth adding to the terminology
section and using consistently.

> * ability to parse user config file
> 
>   [2] Disk snapshot requirements:
>   - external tools: qemu-img, lvcreate, vhd-util, etc.
>   - for basic goal, we support 'raw' and 'qcow2' backend types
>     only. Then it requires:
>     libxl qmp command or "qemu-img" (when qemu process does not
>     exist)
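
The choice described in [2] could be sketched roughly as below, for an
internal qcow2 snapshot only. This is an illustration, not the proposed
libxl code: the function name, its parameters and the returned tuple
shape are hypothetical. The QMP command blockdev-snapshot-internal-sync
and "qemu-img snapshot -c" are existing QEMU interfaces; how libxl would
actually send the QMP message is left out.

```python
import json


def disk_snapshot_request(image_path, snapshot_name, qemu_running, device=None):
    """Describe how to take an internal qcow2 snapshot of one disk.

    If a QEMU process owns the image, the snapshot goes through QMP;
    otherwise fall back to the offline `qemu-img snapshot -c` command.
    """
    if qemu_running:
        if device is None:
            raise ValueError("a QMP block device name is required")
        message = {
            "execute": "blockdev-snapshot-internal-sync",
            "arguments": {"device": device, "name": snapshot_name},
        }
        # Serialised QMP message; the transport is not shown here.
        return ("qmp", json.dumps(message, sort_keys=True))
    # No QEMU process: it is safe to operate on the image file directly.
    return ("exec", ["qemu-img", "snapshot", "-c", snapshot_name, image_path])
```

A 'raw' image has no internal snapshot support, so it would need a
different path (e.g. an external snapshot), which is not covered here.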
> 
> 
> 3. Interaction with other operations:
> 
> No.

What about shutdown/dying as you noted above? What about migration or
regular save/restore?

> 
> 4. General workflow
> 
> Create a snapshot:
>   * parse user cfg file if passed in
>   * check snapshot operation is allowed or not
>   * save domain, saving memory status to file (refer to: save_domain)
>   * take disk snapshot (e.g. call qmp command)
>   * unpause domain
> 
> Revert to snapshot:
>   * parse user cfg file (xl doesn't manage snapshots, so it has no
>     idea of snapshot existence. User MUST supply configuration file)
>   * destroy this domain
>   * create a new domain from snapshot info
>     - apply disk snapshot (e.g. call qemu-img)
>     - a process like restore domain
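
To check my understanding of the two workflows, here they are as ordered
step lists built from commands which exist today. This is only a sketch:
the function names, the state strings and the ("exec"/"qmp", ...) tuple
shape are hypothetical, it assumes internal qcow2 snapshots, and it
assumes "xl save -p" leaves the domain paused as the xl man page
describes. It is not meant as the proposed implementation.

```python
def create_snapshot_plan(domain, state, memory_file, devices, snapshot_name):
    """Ordered steps for 'create a snapshot', following the quoted list."""
    # "check snapshot operation is allowed or not"
    if state in ("shutdown", "dying"):
        raise RuntimeError(f"domain {domain} is {state}; snapshot refused")
    # Save memory state to a file, leaving the domain paused afterwards.
    steps = [("exec", ["xl", "save", "-p", domain, memory_file])]
    # Take a disk snapshot of every disk while the domain is paused.
    for device in devices:
        steps.append(("qmp", {
            "execute": "blockdev-snapshot-internal-sync",
            "arguments": {"device": device, "name": snapshot_name},
        }))
    steps.append(("exec", ["xl", "unpause", domain]))
    return steps


def revert_snapshot_plan(domain, config_file, memory_file, images, snapshot_name):
    """Ordered steps for 'revert to snapshot', following the quoted list."""
    # Destroy the running domain first, so no QEMU process owns the images.
    steps = [("exec", ["xl", "destroy", domain])]
    # Apply the disk snapshot offline on each image.
    for image in images:
        steps.append(("exec", ["qemu-img", "snapshot", "-a", snapshot_name, image]))
    # Recreate the domain from the saved memory image and user config.
    steps.append(("exec", ["xl", "restore", config_file, memory_file]))
    return steps
```

Written out like this, the window between the save and the unpause is
where a concurrent shutdown, migration or save/restore would need to be
excluded.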


