[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v2 1/2] libxc: introduce XC_SAVE_ID_TOOLSTACK
On Mon, 30 Jan 2012, Shriram Rajagopalan wrote: > On 2012-01-30, at 6:16 AM, Stefano Stabellini > <stefano.stabellini@xxxxxxxxxxxxx> wrote: > > > On Fri, 27 Jan 2012, Shriram Rajagopalan wrote: > >> If there is an error in applying the toolstack state, then pagebuf_get > >> returns -1 and > >> the rest of the code would still attempt to resume the domain, with > >> possibly > >> inconsistent device model state. > > > > This sounds like an unhandled error in the caller of pagebuf_get. > > Nope. pagebuf_get is used only when Remus is on. > It is called after applying the last memory checkpoint. > So when it returns an error, we simply discard whatever we buffered > and wrap up. For migration we > have a special xc_save_id_last_checkpoint > that terminates the code after > buffering the first pagebuf. > > > I think that if pagebuf_get returns an error, the error should be > > propagated and the migration should be aborted, right? > > After all I don't think we can successfully resume the domain if we fail > > to read XC_SAVE_ID_TSC_INFO, for example. > > I think we need something like this: > > > > > > @@ -1589,9 +1616,10 @@ int xc_domain_restore(xc_interface *xch, int io_fd, > > uint32_t dom, > > > > // DPRINTF("Buffered checkpoint\n"); > > > > - if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) ) { > > + if ( pagebuf_get(xch, ctx, &pagebuf, io_fd, dom) < 0 ) { > > PERROR("error when buffering batch, finishing"); > > - goto finish; > > + goto out; > > } > > memset(&tmptail, 0, sizeof(tmptail)); > > tmptail.ishvm = hvm; > > > > > >> Also, lets say there was no error in applying the toolstack state. If a > >> network > >> error occurs while receiving the next XC_SAVE_ID or so, then again, the > >> code following > >> the above snippet would attempt to resume the domain, with a memory state > >> inconsistent > >> with the device state. > > > > This seems wrong to me, but I am not very familiar with this protocol. > > As I wrote above, why should we continue if pagebuf_get returns an > > error? > > > > > >> The right place to call the callbacks->toolstack_restore would be after > >> the finish_hvm: > >> label, where currently the old QEMU device state is restored. > > > > Are you suggesting that we should read the toolstack data from > > pagebuf_get but only call the callback after finish_hvm? > > Yep. I hope the above explanation made some sense. All right, I'll do that. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |