[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 1/2] libxc: introduce XC_SAVE_ID_TOOLSTACK



On Mon, 30 Jan 2012, Shriram Rajagopalan wrote:
> On 2012-01-30, at 6:16 AM, Stefano Stabellini 
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> 
> > On Fri, 27 Jan 2012, Shriram Rajagopalan wrote:
> >> If there is an error in applying the toolstack state, then pagebuf_get 
> >> returns -1 and
> >> the rest of the code would still attempt to resume the domain, with 
> >> possibly
> >> inconsistent device model state.
> > 
> > This sounds like an unhandled error in the caller of pagebuf_get.
> 
> Nope. pagebuf_get is used only when Remus is on. 
> It is called after applying the last memory checkpoint.
> So when it returns an error, we simply discard whatever we buffered
> and wrap up. For migration we
> have a special xc_save_id_last_checkpoint
> that terminates the code after 
> buffering the first pagebuf.
> 
> > I think that if pagebuf_get returns an error, the error should be
> > propagated and the migration should be aborted, right?
> > After all I don't think we can successfully resume the domain if we fail
> > to read XC_SAVE_ID_TSC_INFO, for example.
> > I think we need something like this:
> > 
> > 
> > @@ -1589,9 +1616,10 @@ int xc_domain_restore(xc_interface *xch, int io_fd, 
> > uint32_t dom,
> > 
> >     // DPRINTF("Buffered checkpoint\n");
> > 
> > -    if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) {
> > +    if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) < 0 ) {
> >         PERROR("error when buffering batch, finishing");
> > -        goto finish;
> > +        goto out;
> >     }
> >     memset(&tmptail, 0, sizeof(tmptail));
> >     tmptail.ishvm = hvm;
> > 
> > 
> >> Also, lets say there was no error in applying the toolstack state. If a 
> >> network
> >> error occurs while receiving the next XC_SAVE_ID or so, then again, the 
> >> code following
> >> the above snippet would attempt to resume the domain, with a memory state 
> >> inconsistent
> >> with the device state.
> > 
> > This seems wrong to me, but I am not very familiar with this protocol.
> > As I wrote above, why should we continue if pagebuf_get returns an
> > error?
> > 
> > 
> >> The right place to call the callbacks->toolstack_restore would be after 
> >> the finish_hvm:
> >> label, where currently the old QEMU device state is restored.
> > 
> > Are you suggesting that we should read the toolstack data from
> > pagebuf_get but only call the callback after finish_hvm?
> 
> Yep. I hope the above explanation made some sense.

All right, I'll do that.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.