
Re: [Xen-devel] [RFC V9 2/4] domain snapshot overview



On Mon, 2015-01-12 at 00:01 -0700, Chun Yan Liu wrote:
> 
> >>> On 1/8/2015 at 08:26 PM, in message 
> >>> <1420719995.19787.62.camel@xxxxxxxxxx>, Ian
> Campbell <Ian.Campbell@xxxxxxxxxx> wrote: 
> > On Mon, 2014-12-22 at 20:42 -0700, Chun Yan Liu wrote: 
> > >  
> > > >>> On 12/19/2014 at 06:25 PM, in message  
> > <1418984720.20028.15.camel@xxxxxxxxxx>, 
> > > Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:  
> > > > On Thu, 2014-12-18 at 22:45 -0700, Chun Yan Liu wrote:  
> > > > >   
> > > > > >>> On 12/18/2014 at 11:10 PM, in message   
> > > > <1418915443.11882.86.camel@xxxxxxxxxx>,  
> > > > > Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:   
> > > > > > On Tue, 2014-12-16 at 14:32 +0800, Chunyan Liu wrote:   
> > > > > > > Changes to V8:   
> > > > > > > >   * add an overview document, so that one can have an overall look
> > > > > > > >     at the whole domain snapshot work, limits, requirements,
> > > > > > > >     how to do, etc.
> > > > > > >    
> > > > > > > =====================================================================
> > > > > > >    
> > > > > > > Domain snapshot overview   
> > > > > >    
> > > > > > > I don't see a similar section for disk snapshots, are you not
> > > > > > > considering those here except as a part of a domain snapshot or is
> > > > > > > this an oversight?
> > > > > >    
> > > > > > There are three main use cases (that I know of at least) for   
> > > > > > snapshotting like behaviour.   
> > > > > >    
> > > > > > > One is as you've mentioned below for "backup", i.e. to preserve the
> > > > > > > VM at a certain point in time in order to be able to roll back to it.
> > > > > > > Is this the only usecase you are considering?
> > > > >   
> > > > > Yes. I didn't take disk snapshot thing into the scope.  
> > > > >   
> > > > > >    
> > > > > > > A second use case is to support "gold image" type deployments, i.e.
> > > > > > > where you create one baseline single disk image and then clone it
> > > > > > > multiple times to deploy lots of guests. I think this is usually a
> > > > > > > "disk snapshot" type thing, but maybe it can be implemented as
> > > > > > > restoring a gold domain snapshot multiple times (e.g. for start of
> > > > > > > day performance reasons).
> > > > >   
> > > > > > As we initially discussed, disk snapshots can be done directly
> > > > > > by existing tools like qemu-img, vhd-util.
> > > >   
> > > > I was reading this section as a more generic overview of snapshotting,  
> > > > without reference to where/how things might ultimately be implemented.  
> > > >   
> > > > > From a design point of view it would be useful to cover the various use
> > > > cases, even if the solution is that the user implements them using CLI  
> > > > tools by hand (xl) or the toolstack does it for them internally  
> > > > (libvirt).  
> > > >   
> > > > This way we can more clearly see the full picture, which allows us to  
> > > > validate that we are making the right choices about what goes where.  
> > >  
> > > > OK. I see. I think this use case is more about how to use the snapshot,
> > > > rather than how to implement the snapshot. Right?
> >  
> > Correct, what the user is actually trying to achieve with the 
> > functionality. 
> >  
> > > 'Gold image' or 'Gold domain', the needed work is more like cloning 
> > > disks. 
> >  
> > Yes, or resuming multiple times. 
> 
> I see. But IMO it doesn't need any change in the snapshot design and
> implementation. Even when resuming multiple times, they couldn't share the
> same image; they would have to duplicate the image for each resume.

Perhaps, but the use case should be included so that this rationale for
not worrying about it can be written down (so that people like me don't
keep asking...) 

> 
> >  
> > > > > > > The third case, (which is similar to the first), is taking a disk
> > > > > > > snapshot in order to be able to run your usual backup software on
> > > > > > > the snapshot (which is now unchanging, which is handy) and then
> > > > > > > deleting the disk snapshot (this differs from the first case in
> > > > > > > which disk is active after the snapshot, and due to the lack of the
> > > > > > > memory part).
> > > > >   
> > > > > > Sorry, I'm still not quite clear about what this use case wants to
> > > > > > do.
> > > >   
> > > > The user has an active domain which they want to backup, but backup  
> > > > software often does not cope well if the data is changing under its  
> > > > feet.  
> > > >   
> > > > > So the user wants to take a snapshot of the domain's disks while
> > > > > leaving the domain running, so they can backup that static version of
> > > > > the disk out of band from the VM itself (e.g. by attaching it to a
> > > > > separate backup VM).
> > >  
> > > > Got it. So that's simply a disk-only snapshot while the domain is active.
> > > > As you mentioned below, that needs a guest agent to quiesce the disks.
> > > > But currently the Xen hypervisor can't support that, right?
> >  
> > I don't think that's relevant right now, let me explain: 
> >  
> > I think it's important to consider all the use cases for snapshotting, 
> > not because I think they need to be implemented now but to make sure 
> > that we don't make any design decisions now which would make it 
> > *impossible* to implement it in the future (at least without API 
> > changes). 
> >  
> > As a random example, we would want to avoid designing a libxl API where 
> > it is impossible to send the quiesce request at the right point for some 
> > reason. 
> >  
> > So we need to consider these use cases now and have the design, but not 
> > necessarily the implementation, be able to deal with them, or at least 
> > to convince ourselves we most likely aren't tying our hands for future 
> > work.
> 
> Understood. If this use case is included in the design, then I think
> libxl_disk_snapshot_create is not enough; better to have
> libxl_domain_snapshot_create. In future, if a guest agent is implemented,
> the process would be:
> if 'disk-only':
>    pause domain;
>    drain cache data to disk;
>    take disk snapshot;
>    resume domain;
> else:
>    save memory;
>    take disk snapshot;
>    resume domain.

You don't mention the poking of the agent here. I think it comes before
the if, or maybe just after it in the disk-only case (it can't come after
the pause)?

It might be that a mechanism for quiescing all disks + pausing would be
sufficient, meaning you could leave libxl_disk_snapshot_create as it is
and implement the snapshot for backup as quiesce+pause +
libxl_disk_snapshot_create.

Either way, by considering this use case now we can decide whether
libxl_disk_snapshot_create is sufficient or whether we should go with
libxl_domain_snapshot_create from day one.
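
To make the ordering concrete, here is a rough sketch of what I have in
mind. Python is used purely as executable pseudocode, and every step name
below is a placeholder I made up for illustration; none of them are real
libxl calls. It assumes the agent is poked before the pause in the
disk-only case and keeps the full-snapshot path as in your outline.

```python
from typing import List


def quiesce_and_pause() -> List[str]:
    # Hypothetical combined mechanism: the agent request has to go out
    # while the guest is still running, so it precedes the pause.
    return ["quiesce_disks_via_agent", "pause_domain"]


def snapshot_steps(disk_only: bool) -> List[str]:
    """Ordered steps for one snapshot; all names are placeholders."""
    if disk_only:
        # Snapshot for backup: quiesce+pause, then the existing disk snapshot.
        steps = quiesce_and_pause() + ["disk_snapshot_create"]
    else:
        # Full domain snapshot, as in the quoted outline.
        steps = ["save_memory", "disk_snapshot_create"]
    return steps + ["resume_domain"]


if __name__ == "__main__":
    for mode in (True, False):
        label = "disk-only" if mode else "full"
        print(label, "->", ", ".join(snapshot_steps(mode)))
```

If something shaped like quiesce_and_pause turns out to be enough, the
disk-only path needs nothing beyond the existing disk snapshot call, which
is the argument for not needing a domain-level entry point on day one.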

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

