
Re: [Xen-devel] [RFC V8 2/3] libxl domain snapshot API design



Hi, Ian,

According to the previous discussion, snapshot delete and revert are
likely to be done by the high-level application itself, so libxl won't
supply an API for them. I'm wondering whether snapshot create needs a
new common API. In fact, its main work is to save the domain and take
a disk snapshot, and xl can do that too.

I have just written down an overview of the snapshot work (see below).
The question is: do we need to export an API, and if so, what kind?
While updating Bamvor's code, I came to think that xl can do all the
work, and libvirt can do it too, even without libxl's help.

Of course, some things would be easier to use if they were put in
libxl. For example, for the domain snapshot info structure, gentype.py
would directly generate useful init/dispose/to_json/from_json functions.
Alternatively, the disk snapshot part could be extracted and placed in
libxl or libxlu.

Any suggestions about which parts would be better extracted as libxl
API, and which would not?

Thanks,
Chunyan

------------------------------------------------------------------------------------------------------
libxl domain snapshot overview

0. Glossary
* Active domain: a domain that has been created and started
* Inactive domain: a domain that has been created but not started
* Domain snapshot:
  A domain snapshot is a system checkpoint of a domain. It contains
  the memory status at the checkpoint and the disk status.
* Disk-only snapshot:
  A disk-only snapshot keeps only the disk status and does not save
  the memory status. It is a special kind of domain snapshot. It is
  valid when the domain is inactive, or when the domain is paused and
  all cached data has been flushed to disk. Otherwise, a disk-only
  snapshot captures a useless, inconsistent state.

1. Purpose

A domain snapshot is a system checkpoint of a domain. Later, one can
roll back the domain to that checkpoint. It's a very useful backup
function. A domain snapshot contains the memory status at the
checkpoint and the disk status (which we call the disk snapshot).

Domain snapshot functionality should include:
* create a domain snapshot
* roll back (also called "revert") to a domain snapshot
* delete a domain snapshot
* list all domain snapshots

What domain snapshot supports and does not support:
* supports live snapshot
* supports internal and external disk snapshots
* supports different disk backend types
* supports snapshot chains

* does not support snapshot while the domain is shutting down or dying
* does not support disk-only snapshot [1]
                                       
[1]
 This is different from libvirt.
 xl only concerns itself with active domains, and even when a domain
 is paused, no data is flushed to disk. So if one takes a disk-only
 snapshot and later resumes from it, it is as if the guest had
 crashed. For this reason, a disk-only snapshot is meaningless
 to xl and should not be supported.

 libvirt has both active and inactive domains. For active domains,
 as with xl, taking a disk-only snapshot is meaningless, but for
 inactive domains a disk-only snapshot is valid and should be
 supported.
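
To make the rules above concrete, here is a minimal sketch of the
"is this snapshot allowed" check. The names DomainState and
snapshot_allowed are hypothetical and are not part of libxl or libvirt.
It also assumes that a full (memory plus disk) snapshot only makes
sense for an active domain, which the overview implies but does not
state outright.

```python
from enum import Enum


class DomainState(Enum):
    # Hypothetical state names used only for this illustration.
    INACTIVE = "inactive"            # defined but not started (libvirt only)
    RUNNING = "running"
    PAUSED = "paused"
    SHUTTING_DOWN = "shutting-down"
    DYING = "dying"


def snapshot_allowed(state: DomainState, disk_only: bool, toolstack: str) -> bool:
    """Return True if the requested snapshot is permitted.

    toolstack is "xl" or "libvirt"; xl only ever sees active domains.
    """
    if toolstack not in ("xl", "libvirt"):
        raise ValueError(f"unknown toolstack: {toolstack!r}")
    active = state in (DomainState.RUNNING, DomainState.PAUSED)
    if disk_only:
        # Only meaningful for an inactive domain, which only libvirt has.
        return toolstack == "libvirt" and state is DomainState.INACTIVE
    # Assumption: a full snapshot needs memory state, so the domain must
    # be active. Shutting-down and dying domains are rejected here too.
    return active
```

With this sketch, xl always refuses disk-only snapshots, while libvirt
accepts them only for inactive domains.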

2. Requirements

General Requirements:
* ability to save/restore domain memory
* ability to create/delete/apply disk snapshot [2]
* ability to parse user config file
* ability to save/load/update domain snapshot metadata (also called
  domain snapshot info). The metadata includes at least:
  snapshot name, creation time, description, memory state file,
  disk snapshot info, parent (in the snapshot chain), and current
  (whether the snapshot is currently applied)
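
As an illustration of the metadata listed above, here is a rough sketch
of what the snapshot info might look like with JSON save/load. This is
not the libxl IDL; the class names and the fields of DiskSnapshotInfo
are assumptions, since the overview only says "disk snapshot info".

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class DiskSnapshotInfo:
    # Hypothetical fields, chosen only for illustration.
    device: str                  # e.g. "xvda"
    fmt: str                     # "raw" or "qcow2"
    external: bool = False       # external vs. internal disk snapshot
    file: Optional[str] = None   # overlay file for an external snapshot


@dataclass
class DomainSnapshotInfo:
    name: str
    create_time: int                       # seconds since the epoch
    description: str = ""
    memory_state_file: Optional[str] = None
    disks: List[DiskSnapshotInfo] = field(default_factory=list)
    parent: Optional[str] = None           # parent in the snapshot chain
    current: bool = False                  # is the domain on this snapshot?

    def to_json(self) -> str:
        # asdict() also converts the nested DiskSnapshotInfo entries.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DomainSnapshotInfo":
        raw = json.loads(text)
        raw["disks"] = [DiskSnapshotInfo(**d) for d in raw.get("disks", [])]
        return cls(**raw)
```

One metadata file per snapshot, written with to_json() and read back
with from_json(), would be enough to rebuild the chain from the
'parent' fields.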

[2] Disk snapshot requirements:
* external tools: qemu-img, lvcreate, vhd-util, etc.
* As a basic goal, we support the 'raw' and 'qcow2' backend types only.
  That only requires qemu:
    use a libxl qmp command (better) or "qemu-img"
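
For the "qemu-img" route, the command lines could be built roughly as
follows. This is a sketch only: the helper names are hypothetical, and
it only constructs argument lists without running anything. It relies
on the qemu-img "snapshot -c/-a/-d" and "create -f/-b/-F" options.
Internal snapshots apply to qcow2 images; for a raw disk the sketch
assumes an external qcow2 overlay is used instead. Running qemu-img
against an image that a live qemu still has open is generally unsafe,
which is one reason the qmp route is preferred for active domains.

```python
from typing import List


def internal_snapshot_argv(op: str, name: str, image: str) -> List[str]:
    """Build a qemu-img command for an internal (qcow2) disk snapshot."""
    flags = {"create": "-c", "apply": "-a", "delete": "-d"}
    if op not in flags:
        raise ValueError(f"unsupported operation: {op!r}")
    return ["qemu-img", "snapshot", flags[op], name, image]


def external_snapshot_argv(backing: str, backing_fmt: str, overlay: str) -> List[str]:
    """Build a qemu-img command creating a qcow2 overlay on top of 'backing'."""
    if backing_fmt not in ("raw", "qcow2"):
        raise ValueError(f"unsupported backing format: {backing_fmt!r}")
    return ["qemu-img", "create", "-f", "qcow2",
            "-b", backing, "-F", backing_fmt, overlay]
```

The returned lists can be passed to subprocess.run() by a caller that
has already made sure the image is not in use.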

3. Interaction with other operations
Generally, when a domain is deleted, all of its snapshots should be
deleted first.

4. General workflow
Create a snapshot:
  * parse the user cfg file if one is passed in
  * validate the parameters
  * check that the snapshot operation is allowed
  * save the domain, writing the memory status to a file (refer to: save_domain)
  * take the disk snapshot (call qmp command)
  * snapshot chain info:
     - get the domain snapshot list (this retrieves all snapshot
       metadata files and returns a list)
     - check whether the domain is currently on some snapshot; if so,
       that snapshot is the 'parent' of our snapshot
  * save the snapshot metadata to a json file
    (saving/loading/retrieving snapshot metadata files is similar to
     saving/loading libxl domain config files)
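
The create steps above could be sketched like this. Everything here is
hypothetical: save_memory and snapshot_disks stand in for the real
domain save and qmp disk snapshot steps, and metadata is kept as plain
dicts. The sketch also assumes that a newly created snapshot becomes
the 'current' one, which the overview does not say explicitly.

```python
import time
from typing import Callable, Dict, List, Optional

Meta = Dict[str, object]


def current_snapshot(existing: List[Meta]) -> Optional[str]:
    """Return the name of the snapshot the domain is currently on, if any."""
    for meta in existing:
        if meta.get("current"):
            return str(meta["name"])
    return None


def create_snapshot(
    name: str,
    existing: List[Meta],
    save_memory: Callable[[str], str],           # returns memory state file path
    snapshot_disks: Callable[[str], List[str]],  # returns disk snapshot ids
    now: Callable[[], float] = time.time,
) -> Meta:
    if not name or any(m["name"] == name for m in existing):
        raise ValueError(f"invalid or duplicate snapshot name: {name!r}")
    parent = current_snapshot(existing)   # chain info, before any change
    memory_file = save_memory(name)       # 1. save domain memory
    disks = snapshot_disks(name)          # 2. take disk snapshot
    for meta in existing:                 # assumption: new snapshot is current
        meta["current"] = False
    return {
        "name": name,
        "create_time": int(now()),
        "memory_state_file": memory_file,
        "disks": disks,
        "parent": parent,
        "current": True,
    }
```

The returned dict is what would be written to the json metadata file
as the last step.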

Delete a snapshot:
  * get snapshot info (retrieve corresponding snapshot metadata
    file and parse into snapshot info)
  * depending on the options, get snapshot chain info
    - get the domain snapshot list (retrieves all snapshot metadata files
      and returns a list)
    - find the parent and children of this snapshot
  * delete this snapshot, or this snapshot plus its children
    (depending on the options)
    - remove the memory state file (unlink)
    - delete the disk snapshot (call qmp command)
    - update the snapshot metadata files of the children (if not
      deleted), changing 'parent'
    - delete the snapshot metadata file of this snapshot
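
The chain handling in the delete steps above could look roughly like
this. It is a sketch over in-memory metadata only (a dict keyed by
snapshot name); removing the memory state file and the disk snapshot
is left out. The function names are hypothetical, and "plus children"
is assumed to mean the whole subtree below the snapshot.

```python
from typing import Dict, List

Snaps = Dict[str, Dict[str, object]]


def descendants(snaps: Snaps, name: str) -> List[str]:
    """Return all snapshots below 'name' in the chain (children, grandchildren, ...)."""
    found: List[str] = []
    stack = [name]
    while stack:
        cur = stack.pop()
        for child, meta in snaps.items():
            if meta.get("parent") == cur:
                found.append(child)
                stack.append(child)
    return found


def delete_snapshot(snaps: Snaps, name: str, with_children: bool = False) -> List[str]:
    """Remove 'name' (and optionally its subtree); return the removed names."""
    if name not in snaps:
        raise KeyError(name)
    doomed = [name]
    if with_children:
        doomed += descendants(snaps, name)
    else:
        # Surviving children are re-parented to the deleted snapshot's parent.
        new_parent = snaps[name].get("parent")
        for meta in snaps.values():
            if meta.get("parent") == name:
                meta["parent"] = new_parent
    for victim in doomed:
        del snaps[victim]
    return doomed
```

For a chain a -> b -> c, deleting b alone leaves c with parent a, while
deleting a with its children removes all three.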

Revert:
  * get snapshot info (retrieve corresponding snapshot metadata
    file and parse into snapshot info)
  * destroy the domain
  * create a new domain from the snapshot info
    - apply the disk snapshot (qemu-img)
    - then follow a process like domain restore
  * update the snapshot metadata, setting 'current'
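
As a sketch of the revert ordering above, with the real operations
passed in as callables (destroy_domain, apply_disk_snapshot and
restore_domain are placeholders, not libxl functions), and metadata
again held as plain dicts keyed by snapshot name:

```python
from typing import Callable, Dict

Snaps = Dict[str, Dict[str, object]]


def revert_to_snapshot(
    snaps: Snaps,
    name: str,
    destroy_domain: Callable[[], None],
    apply_disk_snapshot: Callable[[str, str], None],  # (disk, snapshot name)
    restore_domain: Callable[[str], None],            # (memory state file)
) -> None:
    if name not in snaps:
        raise KeyError(name)
    meta = snaps[name]
    destroy_domain()                          # the running domain goes away first
    for disk in meta.get("disks", []):        # disks are rolled back while no
        apply_disk_snapshot(str(disk), name)  # domain is using them
    restore_domain(str(meta["memory_state_file"]))
    # Exactly one snapshot is 'current' after a successful revert.
    for other, other_meta in snaps.items():
        other_meta["current"] = (other == name)
```

The metadata is only updated after the restore step returns, so a
failed revert does not leave 'current' pointing at the wrong snapshot.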

List:
  * get snapshot info list (retrieves all snapshot metadata files
    and returns a list)
  * print the info list in a certain format
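
One possible output format for the list step, as a sketch. The column
set and the function name are assumptions; the overview does not fix a
format. Times are printed in UTC so the output does not depend on the
local timezone.

```python
import time
from typing import Dict, List


def format_snapshot_list(snaps: List[Dict[str, object]]) -> str:
    """Render snapshot metadata as an aligned text table, oldest first."""
    rows = [("Name", "Creation Time (UTC)", "Parent", "Current")]
    for meta in sorted(snaps, key=lambda m: m["create_time"]):
        created = time.strftime("%Y-%m-%d %H:%M:%S",
                                time.gmtime(meta["create_time"]))
        rows.append((
            str(meta["name"]),
            created,
            str(meta.get("parent") or "-"),   # "-" marks a root snapshot
            "*" if meta.get("current") else "",
        ))
    widths = [max(len(row[i]) for row in rows) for i in range(4)]
    lines = []
    for row in rows:
        cells = (cell.ljust(width) for cell, width in zip(row, widths))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)
```

A caller would print the returned string after loading all metadata
files for the domain.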

>>> On 11/13/2014 at 07:41 PM, in message <1415878862.21321.9.camel@xxxxxxxxxx>,
Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote: 
> On Wed, 2014-11-12 at 20:07 -0700, Chun Yan Liu wrote: 
> > > > By "active" here, do you you mean "live" (vs paused)?  
> > > Means the domain is started (no matter is running or paused).  
> > > vs (libvirt defines a domain but not started).  
> > > Here,  I should update this to:  
> > > 3). take disk snapshot by qmp command  
> > > libxl only handles active domain.  
>  
> I think the problem here is that different components in the system use 
> different terminology for things or even different concepts (e.g. libxl 
> has no inherent concept of inactive vs active domains, because it only 
> concerns itself with active domains). 
>  
> Perhaps a glossary defining these things would help (also see below). 
>  
> > > > >    libxl_domain_snapshot_delete:   
> > > > >        1). check args validation   
> > > > >        2). remove memory state file.   
> > > > >        3). delete disk snapshot. (for internal disk snapshot, through qmp
> > > > >            command or qemu-img command)
> > > >    
> > > > Out of curiosity, why is this necessary?  Is libxl keeping track of   
> > > > the snapshots somewhere?  Or qemu?   
> > > >    
> > > > Or to put it a different way, since the caller knows the filenames,   
> > > > why can't the caller just erase the files themselves?  
> > >   
> > > Ian asks the same question. The only reason I propose an API is:  
> > > xl and libvirt can share the code. And in future, when support many other disk
> > > backend types, there is much repeated code. But as Ian mentioned in
> > > last version, for handling many disk backend types, maybe better placed in
> > > libxlu. Well, if both of you object, I'll remove this API.
>  
> I think the reason we are having these same discussions over again is 
> that this proposal is focusing on the libxl API (e.g. the details of 
> what functions exist and what parameters they take) without an 
> introductory section which provides a broad overview of the 
> architecture, containing e.g. things like: 
>  
>       * What the general requirements for domain snapshotting are; 
>       * What are the constraints which we are operating under; e.g. 
>         libvirt or xl design requirements 
>       * What the various components are (and which, possibly multiple, 
>         entities provide them) and where the various responsibilities 
>         lie. 
>  
> I think we've teased a lot of this sort of thing out in past iterations 
> but without having it written down here I think we are all having 
> trouble agreeing (or remembering that we've agreed) that the API makes 
> sense because we all have different ideas about what the higher level 
> architecture/abstraction should look like. 
>  
> See for example 
> http://xenbits.xen.org/people/dvrabel/event-channels-H.pdf or 
> http://lists.xen.org/archives/html/xen-devel/2014-10/msg03235.html (you 
> don't necessarily need to go all out on that level of formality, but 
> they provide some examples of the sorts of higher level design I'm 
> talking about) 
>  
> I think it would also help with the glossary question above since it 
> would help define the terms. 
>  
> I'm sorry for not observing this sooner. 
>  
> Ian. 
>  
>  
> _______________________________________________ 
> Xen-devel mailing list 
> Xen-devel@xxxxxxxxxxxxx 
> http://lists.xen.org/xen-devel 
>  
>  




 

